Day 32: Multinomial Logistic Regression in SPSS – Predicting Unordered Categorical Outcomes

Welcome to Day 32 of your 50-day SPSS learning journey! Today, we’ll explore Multinomial Logistic Regression (MLR), a technique for modeling outcomes with more than two unordered categories. Unlike Ordinal Logistic Regression, which assumes a ranking among categories, MLR is used when the categories have no natural order.


What is Multinomial Logistic Regression?

Multinomial Logistic Regression predicts the probability of an observation falling into one of several nominal (unordered) categories based on independent variables.

For example:

  • Predicting political party affiliation (Democrat, Republican, Independent) based on income, education, and age.
  • Identifying preferred mode of transport (Car, Bus, Train, Bike) based on distance, cost, and convenience.
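
One common way to write the model, given here as a sketch rather than as SPSS's own output notation, treats one category K as the reference and fixes its coefficients at zero. The symbols are notation introduced for this illustration: x is the vector of predictors, and alpha_k and beta_k are the intercept and coefficients for category k.

```latex
\begin{align}
\ln\frac{P(Y = k \mid x)}{P(Y = K \mid x)} &= \alpha_k + \beta_k^{\top} x, \qquad k = 1, \dots, K-1 \\
P(Y = k \mid x) &= \frac{\exp(\alpha_k + \beta_k^{\top} x)}{1 + \sum_{j=1}^{K-1} \exp(\alpha_j + \beta_j^{\top} x)} \\
P(Y = K \mid x) &= \frac{1}{1 + \sum_{j=1}^{K-1} \exp(\alpha_j + \beta_j^{\top} x)}
\end{align}
```

With three categories, such as the political party example, this gives two equations, one for each non-reference category.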

When to Use Multinomial Logistic Regression?

Use Multinomial Logistic Regression when:

  1. The dependent variable is categorical with three or more unordered groups.
  2. The independent variables are continuous, categorical, or both.
  3. Alternatively, the dependent variable is ordered but violates the proportional odds assumption, making Ordinal Logistic Regression unsuitable.

Key Assumptions of Multinomial Logistic Regression

  1. Independence of Irrelevant Alternatives (IIA): The odds of choosing one category over another should not change when other categories are added or removed.
  2. No Perfect Multicollinearity: Independent variables should not be perfectly or very highly correlated with one another.
  3. Independence of Observations: Each case should be independent.

How to Perform Multinomial Logistic Regression in SPSS

Step 1: Open Your Dataset

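For this example, use the following dataset (six illustrative cases; a real analysis needs many more cases in each outcome category for the model to be estimated reliably):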

ID   Income   Education   Age   Political_Party (1=Democrat, 2=Republican, 3=Independent)
1    40000    16          30    1
2    50000    18          45    2
3    45000    16          35    3
4    60000    20          50    2
5    35000    14          28    1
6    55000    19          40    3

  • Political_Party: Dependent variable (unordered: 1 = Democrat, 2 = Republican, 3 = Independent).
  • Income, Education, Age: Independent variables.

Step 2: Access the Multinomial Logistic Regression Tool

  1. Go to Analyze > Regression > Multinomial Logistic.
  2. A dialog box will appear.

Step 3: Define Variables

  1. Move Political_Party to the Dependent box.
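  2. Move Income, Education, and Age to the Covariate(s) box, because they are continuous. (Categorical predictors would go in the Factor(s) box.)
  3. Click Reference Category and choose First Category (Democrat) or Last Category (Independent, the SPSS default) as the baseline. The example output later in this lesson uses Democrat as the baseline, so select First Category to match it.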

Step 4: Customize Options

  1. Click Statistics:
    • Under Parameters, check Estimates to view the B coefficients and odds ratios (Exp(B)).
    • Check Goodness-of-fit (Pearson, Deviance).
    • Check Classification table to see model prediction accuracy.
    • Click Continue.

Step 5: Run the Analysis

Click OK to generate the output.


Interpreting the Output

1. Model Fitting Information

  • Compares the final model with the null (intercept-only) model using a likelihood ratio chi-square test:
    • If p < 0.05, the predictors as a set significantly improve prediction over the intercept-only model.
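
The comparison above can be sketched outside SPSS in a few lines of Python. This is a minimal illustration, not SPSS's implementation: the function names are hypothetical, the -2 log-likelihood values are invented, and the p-value formula used here is a closed form that only holds when the degrees of freedom are even (here, 3 predictors × 2 non-reference categories = 6).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid only for even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

def likelihood_ratio_test(neg2ll_null, neg2ll_final, df):
    """Return (chi-square, p-value) for the final model vs. the intercept-only model."""
    chi_square = neg2ll_null - neg2ll_final
    return chi_square, chi2_sf_even_df(chi_square, df)

# Hypothetical -2 log-likelihood values, NOT taken from real SPSS output.
chi_square, p_value = likelihood_ratio_test(neg2ll_null=120.0, neg2ll_final=100.0, df=6)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

The chi-square statistic is simply the drop in -2 log-likelihood from the null model to the final model, which is the quantity SPSS reports in the Model Fitting Information table.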

2. Goodness-of-Fit Tests

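  • Pearson and Deviance Tests: Assess how well the model fits the data.
    • If p > 0.05, there is no evidence of poor fit. With continuous predictors, many covariate patterns contain only a few cases, so treat these tests with caution.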

3. Classification Table

  • Shows how well the model predicts each category.
    • Example: 75% of cases were correctly classified.
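
To see where a figure like 75% comes from, here is a small Python sketch that computes the overall and per-category percent correct from a classification table. The counts are invented for illustration (chosen so the overall rate matches the 75% example), and the function name is hypothetical.

```python
def percent_correct(table):
    """Overall and per-category percent correct from a square classification table.

    table[i][j] = number of cases observed in category i and predicted as category j.
    """
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    overall = 100.0 * correct / total
    per_category = [100.0 * row[i] / sum(row) for i, row in enumerate(table)]
    return overall, per_category

# Hypothetical counts (rows = observed, columns = predicted), invented for illustration.
# Category order: Democrat, Republican, Independent.
table = [
    [12, 2, 1],
    [3, 10, 2],
    [1, 1, 8],
]

overall, per_category = percent_correct(table)
print(f"Overall: {overall:.1f}%")
for name, pct in zip(["Democrat", "Republican", "Independent"], per_category):
    print(f"{name}: {pct:.1f}%")
```

Looking at the per-category rates as well as the overall rate helps reveal whether the model predicts one category much worse than the others.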

4. Parameter Estimates (Odds Ratios, Exp(B))

  • Displays logit coefficients (B) and odds ratios (Exp(B)) for each category compared to the reference category.

Example output (baseline = Democrat):

Predictor   Republican (B)   Exp(B)   Independent (B)   Exp(B)   p-value
Income      0.0002           1.0002   0.0003            1.0003   0.02
Education   0.15             1.16     -0.05             0.95     0.05
Age         0.08             1.08     0.12              1.12     0.01

Interpretation:

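  • Income (Exp(B) = 1.0003, p = 0.02): Each additional $1 of income multiplies the odds of being Independent rather than Democrat by 1.0003 (a 0.03% increase). Over a $1,000 increase, this compounds to exp(0.0003 × 1000) ≈ 1.35, or about 35% higher odds.
  • Education (Exp(B) = 1.16, p = 0.05): Each additional year of education increases the odds of being Republican rather than Democrat by 16%, holding the other predictors constant. At p = 0.05, this effect sits right at the conventional significance threshold.
  • Age (Exp(B) = 1.12, p = 0.01): Each additional year of age increases the odds of being Independent rather than Democrat by 12%.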
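
Because Exp(B) is the multiplicative change in odds for a one-unit change in the predictor, the effect of a larger change is exp(B × units), not B × units. The short Python sketch below works through that arithmetic with B values from the example table; the function names are hypothetical and the code is only an illustration of the calculation, not part of SPSS.

```python
import math

def odds_ratio(b, units=1.0):
    """Multiplicative change in odds for a `units` change in the predictor."""
    return math.exp(b * units)

def percent_change_in_odds(b, units=1.0):
    """The same effect expressed as a percentage change in the odds."""
    return (odds_ratio(b, units) - 1.0) * 100.0

# B coefficients from the example table (baseline = Democrat).
education_b = 0.15   # Republican vs. Democrat, per year of education
income_b = 0.0003    # Independent vs. Democrat, per $1 of income

print(round(odds_ratio(education_b), 2))              # per additional year
print(round(odds_ratio(income_b), 4))                 # per additional $1
print(round(odds_ratio(income_b, units=1000), 2))     # per additional $1,000
print(round(percent_change_in_odds(income_b, 1000)))  # same effect, in percent
```

Rescaling a predictor before the analysis (for example, entering income in thousands of dollars) produces the per-$1,000 odds ratio directly in the SPSS output.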

Practice Example: Perform Multinomial Logistic Regression

Use the following dataset:

ID   Distance   Cost   Convenience   Transport_Mode (1=Car, 2=Bus, 3=Train, 4=Bike)
1    5          20     8             1
2    15         5      5             2
3    10         10     6             3
4    3          25     9             1
5    20         3      4             4

  1. Perform Multinomial Logistic Regression with Transport_Mode as the dependent variable and Distance, Cost, and Convenience as predictors.
  2. Interpret the odds ratios (Exp(B)) to determine which factors influence transport mode choice.
  3. Evaluate model fit using Goodness-of-Fit Tests and the Classification Table.

Common Mistakes to Avoid

  1. Choosing the Wrong Model: If the dependent variable is ordered, use Ordinal Logistic Regression instead.
  2. Ignoring Multicollinearity: Check correlations between predictors to avoid redundancy.
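  3. Misreading Odds Ratios Near 1: Exp(B) values close to 1 suggest a minimal effect per one-unit change, but check the predictor's scale. When the unit is small (e.g., income in dollars), a value near 1 can still add up to a substantial effect over a realistic range.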
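
The multicollinearity check in item 2 can be sketched with pairwise Pearson correlations. The Python example below uses the six cases from the political party dataset earlier in this lesson; the function name is hypothetical, and the 0.8 cutoff is an assumed rule of thumb for this illustration, not an SPSS setting.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Predictor values from the six-case example dataset.
predictors = {
    "Income": [40000, 50000, 45000, 60000, 35000, 55000],
    "Education": [16, 18, 16, 20, 14, 19],
    "Age": [30, 45, 35, 50, 28, 40],
}

THRESHOLD = 0.8  # assumed rule-of-thumb cutoff, chosen for illustration only

for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
    r = pearson_r(a, b)
    flag = "  <- check for redundancy" if abs(r) > THRESHOLD else ""
    print(f"{name_a} vs {name_b}: r = {r:.2f}{flag}")
```

In SPSS itself, the same correlations are available through Analyze > Correlate > Bivariate.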

Key Takeaways

  • Multinomial Logistic Regression predicts categorical outcomes with more than two unordered categories.
  • Odds Ratios (Exp(B)) indicate how each predictor affects the likelihood of choosing one category over another.
  • Goodness-of-fit tests (Pearson, Deviance) and the Classification Table assess how well the model performs.

What’s Next?

In Day 33 of your 50-day SPSS learning journey, we’ll explore Principal Component Analysis (PCA) in SPSS. You’ll learn how to reduce dimensionality while retaining the most important information in your dataset. Stay tuned for another essential multivariate technique!