Day 32: Multinomial Logistic Regression in SPSS – Predicting Unordered Categorical Outcomes
Welcome to Day 32 of your 50-day SPSS learning journey! Today, we’ll explore Multinomial Logistic Regression (MLR), a technique for modeling outcomes with more than two unordered categories. Unlike Ordinal Logistic Regression, which assumes a ranking among categories, MLR is used when the categories have no natural order.
What is Multinomial Logistic Regression?
Multinomial Logistic Regression predicts the probability of an observation falling into one of several nominal (unordered) categories based on independent variables.
For example:
- Predicting political party affiliation (Democrat, Republican, Independent) based on income, education, and age.
- Identifying preferred mode of transport (Car, Bus, Train, Bike) based on distance, cost, and convenience.
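To make "predicts the probability of falling into one of several categories" concrete, here is a minimal Python sketch of how a multinomial logit model turns linear predictors into category probabilities. The function name and the two linear predictor values are hypothetical, chosen only for illustration; the reference category is assumed to have a linear predictor of 0, which is the usual parameterization.

```python
import math

def category_probabilities(linear_predictors):
    """Convert logits (relative to a reference category) into probabilities.

    linear_predictors maps each non-reference category to its linear
    predictor; the reference category implicitly has a value of 0.
    """
    exp_terms = {cat: math.exp(eta) for cat, eta in linear_predictors.items()}
    denominator = 1.0 + sum(exp_terms.values())  # the 1.0 is exp(0) for the reference
    probs = {cat: value / denominator for cat, value in exp_terms.items()}
    probs["reference"] = 1.0 / denominator
    return probs

# Hypothetical linear predictors for one respondent (Democrat as the reference).
probs = category_probabilities({"Republican": 0.4, "Independent": -0.2})
for category, p in probs.items():
    print(f"{category}: {p:.3f}")
```

The probabilities always sum to 1, and a category with a positive linear predictor is more likely than the reference category for that respondent.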
When to Use Multinomial Logistic Regression?
Use Multinomial Logistic Regression when:
- The dependent variable is categorical with three or more unordered groups.
- The independent variables are continuous, categorical, or both.
- The assumption of proportional odds is violated, making Ordinal Logistic Regression unsuitable.
Key Assumptions of Multinomial Logistic Regression
- Independence of Irrelevant Alternatives (IIA): The odds of choosing one category over another should not depend on which other categories are available.
- No Multicollinearity: Independent variables should not be highly correlated with one another.
- Independence of Observations: Each case should be independent.
How to Perform Multinomial Logistic Regression in SPSS
Step 1: Open Your Dataset
For this example, use the following dataset:
ID | Income | Education | Age | Political_Party (1=Democrat, 2=Republican, 3=Independent) |
---|---|---|---|---|
1 | 40000 | 16 | 30 | 1 |
2 | 50000 | 18 | 45 | 2 |
3 | 45000 | 16 | 35 | 3 |
4 | 60000 | 20 | 50 | 2 |
5 | 35000 | 14 | 28 | 1 |
6 | 55000 | 19 | 40 | 3 |
- Political_Party: Dependent variable (unordered: 1 = Democrat, 2 = Republican, 3 = Independent).
- Income, Education, Age: Independent variables.
Step 2: Access the Multinomial Logistic Regression Tool
- Go to Analyze > Regression > Multinomial Logistic.
- A dialog box will appear.
Step 3: Define Variables
- Move Political_Party to the Dependent box.
- Move Income, Education, and Age to the Covariate(s) box, because they are continuous. Categorical predictors would go in the Factor(s) box instead.
- Click Reference Category, and choose either First (e.g., Democrat) or Last (e.g., Independent) as the baseline category.
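The choice of reference category changes how the coefficients are expressed, not the model itself. As a rough sketch, the Python snippet below re-expresses the Age coefficients from the example output table in the Parameter Estimates section (baseline = Democrat) relative to Independent instead. The helper name `rebase` is hypothetical, and the sketch assumes both sets of coefficients come from the same fitted model.

```python
import math

# Logit coefficients (B) for Age from the example output, baseline = Democrat.
b_vs_democrat = {"Republican": 0.08, "Independent": 0.12}

def rebase(coefs, new_reference, old_reference):
    """Re-express coefficients relative to a different reference category."""
    full = dict(coefs)
    full[old_reference] = 0.0  # the old baseline has B = 0 by construction
    shift = full[new_reference]
    return {cat: b - shift for cat, b in full.items() if cat != new_reference}

b_vs_independent = rebase(b_vs_democrat, "Independent", "Democrat")
odds_ratios = {cat: math.exp(b) for cat, b in b_vs_independent.items()}

for cat in b_vs_independent:
    print(f"{cat} vs Independent: B = {b_vs_independent[cat]:.2f}, "
          f"Exp(B) = {odds_ratios[cat]:.3f}")
```

Each new coefficient is the old coefficient minus the old coefficient of the new baseline, so picking First or Last mainly decides which comparisons SPSS reports directly.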
Step 4: Customize Options
- Click Statistics:
  - Check Estimates (under Parameters) to view the coefficients and odds ratios.
  - Check Goodness-of-fit (Pearson, Deviance).
  - Check Classification table to see model prediction accuracy.
- Click Continue.
Step 5: Run the Analysis
Click OK to generate the output.
Interpreting the Output
1. Model Fitting Information
- Compares the final model with the null model (intercept-only):
- If p < 0.05, the model significantly improves prediction.
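The comparison behind this table is a likelihood ratio test: the chi-square statistic is the drop in -2 log-likelihood from the intercept-only model to the final model. The Python sketch below shows the arithmetic under stated assumptions. The two -2 log-likelihood values are hypothetical (not taken from any real output), and the p-value helper uses a closed form that is exact only for even degrees of freedom, which fits this example (3 predictors × 2 non-reference categories = 6).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; exact for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical -2 log-likelihood values, for illustration only.
neg2ll_intercept_only = 210.0
neg2ll_final = 190.0

chi_square = neg2ll_intercept_only - neg2ll_final
df = 3 * (3 - 1)  # 3 predictors x (3 outcome categories - 1) logits
p_value = chi2_sf_even_df(chi_square, df)

print(f"Chi-square = {chi_square:.1f}, df = {df}, p = {p_value:.4f}")
```

With these made-up values the p-value falls below 0.05, which is the situation the rule above describes.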
2. Goodness-of-Fit Tests
- Pearson and Deviance Tests: Assess how well the model fits the data.
- If p > 0.05, the model fits well.
3. Classification Table
- Shows how well the model predicts each category.
- Example: 75% of cases were correctly classified.
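The percentages in the classification table come from simple counts. The following Python sketch uses hypothetical counts (rows are observed categories, columns are predicted categories) picked so that the overall figure matches the 75% example above; the function name is also just an illustration.

```python
# Hypothetical classification counts: rows = observed, columns = predicted.
categories = ["Democrat", "Republican", "Independent"]
table = [
    [30, 5, 5],   # observed Democrat
    [4, 32, 4],   # observed Republican
    [6, 6, 28],   # observed Independent
]

def percent_correct(table):
    """Return per-category and overall percent correctly classified."""
    per_category = [100.0 * row[i] / sum(row) for i, row in enumerate(table)]
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return per_category, 100.0 * correct / total

per_category, overall = percent_correct(table)
for name, pct in zip(categories, per_category):
    print(f"{name}: {pct:.1f}% correct")
print(f"Overall: {overall:.1f}% correct")
```

Looking at the per-category rows as well as the overall figure helps reveal a model that predicts one category well and another poorly.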
4. Parameter Estimates (Odds Ratios, Exp(B))
- Displays logit coefficients (B) and odds ratios (Exp(B)) for each category compared to the reference category.
Example output (baseline = Democrat):
Predictor | Republican (B) | Exp(B) | Independent (B) | Exp(B) | p-value |
---|---|---|---|---|---|
Income | 0.0002 | 1.0002 | 0.0003 | 1.0003 | 0.02 |
Education | 0.15 | 1.16 | -0.05 | 0.95 | 0.05 |
Age | 0.08 | 1.08 | 0.12 | 1.12 | 0.01 |
Interpretation:
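- Income (Exp(B) = 1.0003, p = 0.02): Each additional $1 of income multiplies the odds of being Independent rather than Democrat by 1.0003 (a 0.03% increase). For a $1,000 increase, the odds are multiplied by exp(0.0003 × 1000) ≈ 1.35, an increase of about 35%.
- Education (Exp(B) = 1.16, p = 0.05): Each additional year of education increases the odds of being Republican rather than Democrat by 16%, holding the other predictors constant. With p = 0.05, this effect sits right at the conventional significance threshold.
- Age (Exp(B) = 1.12, p = 0.01): Each additional year of age increases the odds of being Independent rather than Democrat by 12%.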
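Because Exp(B) describes a one-unit change in the predictor, the effect of a larger change is found by multiplying B by the size of the change before exponentiating. Here is a small Python sketch using the Income coefficient for Independent vs. Democrat from the example table; the function name `odds_ratio` is hypothetical.

```python
import math

def odds_ratio(b, change=1.0):
    """Odds ratio implied by logit coefficient b for a given change in the predictor."""
    return math.exp(b * change)

b_income_independent = 0.0003  # B for Income, Independent vs. Democrat

per_dollar = odds_ratio(b_income_independent)
per_thousand = odds_ratio(b_income_independent, 1000)
percent_change = (per_thousand - 1) * 100

print(f"Odds ratio per $1:     {per_dollar:.4f}")
print(f"Odds ratio per $1,000: {per_thousand:.2f}")
print(f"Change in odds per $1,000: {percent_change:.0f}%")
```

An alternative is to rescale the variable before fitting (for example, income in thousands of dollars), so that Exp(B) in the SPSS output is already on a meaningful scale.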
Practice Example: Perform Multinomial Logistic Regression
Use the following dataset:
ID | Distance | Cost | Convenience | Transport_Mode (1=Car, 2=Bus, 3=Train, 4=Bike) |
---|---|---|---|---|
1 | 5 | 20 | 8 | 1 |
2 | 15 | 5 | 5 | 2 |
3 | 10 | 10 | 6 | 3 |
4 | 3 | 25 | 9 | 1 |
5 | 20 | 3 | 4 | 4 |
- Perform Multinomial Logistic Regression with Transport_Mode as the dependent variable and Distance, Cost, and Convenience as predictors.
- Interpret the odds ratios (Exp(B)) to determine which factors influence transport mode choice.
- Evaluate model fit using Goodness-of-Fit Tests and the Classification Table.
Common Mistakes to Avoid
- Choosing the Wrong Model: If the dependent variable is ordered, use Ordinal Logistic Regression instead.
- Ignoring Multicollinearity: Check correlations between predictors to avoid redundancy.
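- Misreading Odds Ratios Near 1: Exp(B) describes a one-unit change in the predictor, so a value close to 1 suggests a minimal effect only when one unit is a meaningful amount. For a variable measured in small units, such as income in dollars, evaluate a larger change or rescale the variable.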
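As a quick illustration of the multicollinearity check mentioned above, the Python sketch below computes pairwise Pearson correlations for the three predictors in the six-row example dataset from Step 1. The 0.8 cutoff is an assumption (a common rule of thumb, not an SPSS default), and the function name is hypothetical.

```python
import math
from itertools import combinations

# Predictor values from the six-row example dataset in Step 1.
predictors = {
    "Income": [40000, 50000, 45000, 60000, 35000, 55000],
    "Education": [16, 18, 16, 20, 14, 19],
    "Age": [30, 45, 35, 50, 28, 40],
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

THRESHOLD = 0.8  # assumed rule-of-thumb cutoff for "highly correlated"

flagged = []
for name_a, name_b in combinations(predictors, 2):
    r = pearson(predictors[name_a], predictors[name_b])
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
    if abs(r) > THRESHOLD:
        flagged.append((name_a, name_b))
```

In SPSS itself, the same check can be run from Analyze > Correlate > Bivariate. Any pair that ends up in `flagged` is a candidate for dropping or combining predictors.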
Key Takeaways
- Multinomial Logistic Regression predicts categorical outcomes with more than two unordered categories.
- Odds Ratios (Exp(B)) indicate how each predictor affects the likelihood of choosing one category over another.
- Goodness-of-Fit Tests (Pearson, Deviance) assess how well the model fits the data, and the Classification Table shows how accurately it predicts each category.
What’s Next?
In Day 33 of your 50-day SPSS learning journey, we’ll explore Principal Component Analysis (PCA) in SPSS. You’ll learn how to reduce dimensionality while retaining the most important information in your dataset. Stay tuned for another essential multivariate technique!