Henry's EdTech: Day 32: Multinomial Logistic Regression in SPSS – Predicting Unordered Categorical Outcomes

Welcome to Day 32 of your 50-day SPSS learning journey! Today, we’ll explore Multinomial Logistic Regression (MLR), a technique for modeling outcomes with more than two unordered categories. Unlike Ordinal Logistic Regression, which assumes a ranking among categories, MLR is used when the categories have no natural order.

What is Multinomial Logistic Regression?

Multinomial Logistic Regression predicts the probability of an observation falling into one of several nominal (unordered) categories based on independent variables.

For example:

Predicting political party affiliation (Democrat, Republican, Independent) based on income, education, and age.
Identifying preferred mode of transport (Car, Bus, Train, Bike) based on distance, cost, and convenience.
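
To make the idea of "probability of falling into each category" concrete, here is a minimal Python sketch of how a multinomial logit model turns linear predictors into probabilities. The function name and the two linear-predictor values are made up for illustration; in a fitted model each value would be the intercept plus the coefficients times the predictors for one non-reference category, and the reference category's value is fixed at 0.

```python
import math

def category_probabilities(linear_predictors):
    """Turn log-odds (each category vs. the reference) into probabilities.

    linear_predictors maps each non-reference category to its linear
    predictor; the reference category's linear predictor is fixed at 0.
    """
    exp_terms = {cat: math.exp(eta) for cat, eta in linear_predictors.items()}
    denominator = 1.0 + sum(exp_terms.values())  # 1.0 is exp(0) for the reference
    probs = {cat: term / denominator for cat, term in exp_terms.items()}
    probs["reference"] = 1.0 / denominator
    return probs

# Hypothetical linear predictors for one person (reference = Democrat)
probs = category_probabilities({"Republican": 0.4, "Independent": -0.2})
for category, p in probs.items():
    print(category, round(p, 3))
```

The probabilities always sum to 1, and a category with a positive linear predictor is more likely than the reference category.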

When to Use Multinomial Logistic Regression?

Use Multinomial Logistic Regression when:

The dependent variable is categorical with three or more unordered groups.
The independent variables are continuous, categorical, or both.
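Alternatively, the outcome is ordered but the proportional odds assumption is violated, making Ordinal Logistic Regression unsuitable. MLR can then be used as a fallback that ignores the ordering.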

Key Assumptions of Multinomial Logistic Regression

Independence of Irrelevant Alternatives (IIA): The odds of choosing one category over another should not depend on which other categories are available.
No Perfect Multicollinearity: Independent variables should not be perfectly (or very highly) correlated with one another.
Independence of Observations: Each case should be independent.

How to Perform Multinomial Logistic Regression in SPSS

Step 1: Open Your Dataset

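For this example, use a dataset laid out like the one below. The six rows only illustrate the structure; a real analysis needs many more cases in each outcome category for the model to be estimated reliably.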

ID	Income	Education	Age	Political_Party (1=Democrat, 2=Republican, 3=Independent)
1	40000	16	30	1
2	50000	18	45	2
3	45000	16	35	3
4	60000	20	50	2
5	35000	14	28	1
6	55000	19	40	3

Political_Party: Dependent variable (unordered: 1 = Democrat, 2 = Republican, 3 = Independent).
Income, Education, Age: Independent variables.
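
Before fitting the model, it can help to check the multicollinearity assumption on the predictors. The sketch below is one simple way to do that outside SPSS: it computes pairwise Pearson correlations for the six example rows in plain Python. The helper name pearson_r is my own, and pairwise correlations are only a rough screen, not a full collinearity diagnostic.

```python
import math
from itertools import combinations

# The six example rows from the Step 1 dataset
predictors = {
    "Income": [40000, 50000, 45000, 60000, 35000, 55000],
    "Education": [16, 18, 16, 20, 14, 19],
    "Age": [30, 45, 35, 50, 28, 40],
}

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Correlation for every pair of predictors
correlations = {
    (a, b): pearson_r(predictors[a], predictors[b])
    for a, b in combinations(predictors, 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

In these six illustrative rows every pair of predictors correlates above 0.9, which is exactly the kind of pattern that would call for caution (or dropping a predictor) in a real analysis.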

Step 2: Access the Multinomial Logistic Regression Tool

Go to Analyze > Regression > Multinomial Logistic.
A dialog box will appear.

Step 3: Define Variables

Move Political_Party to the Dependent box.
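Move Income, Education, and Age to the Covariate(s) box, since all three are continuous. Categorical predictors, if you had any, would go in the Factor(s) box instead.
Click Reference Category, and choose First Category (here Democrat) or Last Category (here Independent) as the baseline. SPSS uses the last category by default.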

Step 4: Customize Options

Click Statistics:
- Under Parameters, check Estimates to display the coefficients (B) and odds ratios (Exp(B)).
- Check Goodness-of-fit (Pearson, Deviance).
- Check Classification table to see model prediction accuracy.
- Click Continue.

Step 5: Run the Analysis

Click OK to generate the output.

Interpreting the Output

1. Model Fitting Information

Compares the final model with the null (intercept-only) model using a likelihood ratio chi-square test:
- If p < 0.05, the final model fits significantly better than the intercept-only model.
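
The likelihood ratio test behind this table can be reproduced by hand. The sketch below assumes made-up -2 Log Likelihood values (210.5 for the intercept-only model, 185.2 for the final model); they are not SPSS output. The degrees of freedom follow from the example setup: 3 predictors times 2 non-reference categories. The chi2_sf helper is a small hand-written series approximation so the example needs nothing beyond the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, half_x)
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Hypothetical -2 Log Likelihood values (illustrative, not real SPSS output)
neg2ll_null = 210.5    # intercept-only model
neg2ll_final = 185.2   # model with Income, Education, Age

lr_chi_square = neg2ll_null - neg2ll_final
df = 3 * (3 - 1)       # 3 predictors x (3 categories - 1 reference)
p_value = chi2_sf(lr_chi_square, df)

print("Chi-square:", round(lr_chi_square, 1), "df:", df, "p:", p_value)
```

With these illustrative numbers the p-value falls well below 0.05, so the predictors would be judged to improve on the intercept-only model.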

2. Goodness-of-Fit Tests

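Pearson and Deviance Tests: Assess how well the model fits the data.
- If p > 0.05, there is no evidence of poor fit. With continuous predictors many covariate patterns contain only a few cases, so treat these tests with caution.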

3. Classification Table

Shows how well the model predicts each category.
- Example: 75% of cases were correctly classified.
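
To see where a figure like 75% comes from, here is a small sketch that computes overall and per-category accuracy from a classification table. The counts are hypothetical, chosen so the overall rate matches the 75% example; rows are observed categories and columns are predicted categories.

```python
# Hypothetical classification table: rows = observed, columns = predicted
categories = ["Democrat", "Republican", "Independent"]
table = [
    [30, 6, 4],   # observed Democrat
    [5, 32, 3],   # observed Republican
    [4, 8, 28],   # observed Independent
]

total_cases = sum(sum(row) for row in table)
correct_cases = sum(table[i][i] for i in range(len(table)))  # diagonal = correct
overall_accuracy = correct_cases / total_cases

# Share of each observed category that was predicted correctly
per_category = {
    cat: table[i][i] / sum(table[i]) for i, cat in enumerate(categories)
}

print("Overall:", overall_accuracy)
for cat, acc in per_category.items():
    print(cat, acc)
```

The per-category rates are worth checking alongside the overall figure, since a model can look accurate overall while predicting one category poorly.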

4. Parameter Estimates (Odds Ratios, Exp(B))

Displays logit coefficients (B) and odds ratios (Exp(B)) for each category compared to the reference category.

Example output (baseline = Democrat):

Predictor	B (Republican)	Exp(B) (Republican)	B (Independent)	Exp(B) (Independent)	p-value
Income	0.0002	1.0002	0.0003	1.0003	0.02
Education	0.15	1.16	-0.05	0.95	0.05
Age	0.08	1.08	0.12	1.12	0.01

Interpretation:

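Income (Exp(B) = 1.0003, p = 0.02): Each additional $1 of income increases the odds of being Independent rather than Democrat by 0.03%. Because the coefficient is per dollar, a $1,000 increase multiplies the odds by exp(0.0003 × 1000) ≈ 1.35, an increase of about 35%.
Education (Exp(B) = 1.16, p = 0.05): Each additional year of education increases the odds of being Republican rather than Democrat by about 16%, holding the other predictors constant. With p = 0.05, this effect sits right at the conventional significance threshold.
Age (Exp(B) = 1.12, p = 0.01): Each additional year of age increases the odds of being Independent rather than Democrat by about 12%, holding the other predictors constant.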
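
Since Exp(B) always refers to a one-unit change, it is often useful to rescale it to a more meaningful step. The sketch below does this for the Income coefficient from the example table (B = 0.0003 for Independent vs. Democrat); the function name odds_ratio is my own, not an SPSS term.

```python
import math

def odds_ratio(b, change=1.0):
    """Odds ratio for a given change in the predictor: exp(B * change)."""
    return math.exp(b * change)

b_income_independent = 0.0003  # B for Income, Independent vs. Democrat (example table)

per_dollar = odds_ratio(b_income_independent)            # one-unit ($1) change
per_thousand = odds_ratio(b_income_independent, 1000)    # $1,000 change
percent_change = (per_thousand - 1) * 100

print("Per $1:", round(per_dollar, 4))
print("Per $1,000:", round(per_thousand, 2))
print("Percent change in odds per $1,000:", round(percent_change, 1))
```

The same function works for any predictor, for example odds_ratio(0.12, 10) for a ten-year difference in age.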

Practice Example: Perform Multinomial Logistic Regression

Use the following dataset:

ID	Distance	Cost	Convenience	Transport_Mode (1=Car, 2=Bus, 3=Train, 4=Bike)
1	5	20	8	1
2	15	5	5	2
3	10	10	6	3
4	3	25	9	1
5	20	3	4	4

Perform Multinomial Logistic Regression with Transport_Mode as the dependent variable and Distance, Cost, and Convenience as predictors.
Interpret the odds ratios (Exp(B)) to determine which factors influence transport mode choice.
Evaluate model fit using Goodness-of-Fit Tests and the Classification Table.

Common Mistakes to Avoid

Choosing the Wrong Model: If the dependent variable is ordered, use Ordinal Logistic Regression instead.
Ignoring Multicollinearity: Check correlations between predictors to avoid redundancy.
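Misreading Odds Ratios Near 1: Exp(B) describes a one-unit change, so a value close to 1 suggests a minimal effect only when one unit is a meaningful step. For predictors measured in small units (e.g., income in dollars), rescale before judging the effect size.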

Key Takeaways

Multinomial Logistic Regression predicts categorical outcomes with more than two unordered categories.
Odds Ratios (Exp(B)) indicate how each predictor changes the odds of one category relative to the reference category.
Goodness-of-Fit Tests (Pearson, Deviance) assess how well the model fits, and the Classification Table shows how accurately it predicts.

What’s Next?

In Day 33 of your 50-day SPSS learning journey, we’ll explore Principal Component Analysis (PCA) in SPSS. You’ll learn how to reduce dimensionality while retaining the most important information in your dataset. Stay tuned for another essential multivariate technique!