Day 39: Discriminant Analysis in SPSS – Predicting Group Membership

Welcome to Day 39 of your 50-day SPSS learning journey! Today, we’ll explore Discriminant Analysis, a powerful technique for classifying cases into predefined groups based on multiple independent variables. This method is widely used in marketing, finance, healthcare, and social sciences.


What is Discriminant Analysis?

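Discriminant Analysis predicts which of several predefined groups an observation belongs to, based on a set of predictor variables. It builds one or more discriminant functions: weighted linear combinations of the predictors. The weights are chosen so that the resulting scores vary as much as possible between groups relative to the variation within groups.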
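In symbols (standard textbook notation, not SPSS output), a discriminant function for p predictors and the criterion its weights maximize can be written as below. Here B and W stand for the between-group and within-group sums-of-squares-and-cross-products matrices of the predictors:

```latex
D = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p
\qquad
\lambda = \frac{\mathbf{b}^{\top}\mathbf{B}\,\mathbf{b}}{\mathbf{b}^{\top}\mathbf{W}\,\mathbf{b}},
\quad \mathbf{b} = (b_1, \ldots, b_p)^{\top}
```

The weights that maximize λ solve the eigenproblem W⁻¹B b = λ b, and each eigenvalue λ is the between-to-within ratio for its function. With g groups and p predictors there are at most min(g - 1, p) such functions, so a three-group problem with three predictors yields two.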

For example:

  • Marketing: Classifying customers as low, medium, or high-value based on income, spending, and engagement.
  • Education: Predicting whether students will pass or fail based on attendance, study hours, and past performance.
  • Healthcare: Categorizing patients into high-risk or low-risk groups based on health indicators.

Types of Discriminant Analysis

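  1. Linear Discriminant Analysis (LDA): Assumes all groups share the same covariance matrix (equal variances and covariances of the predictors), which gives linear boundaries between groups. This is what SPSS's Discriminant procedure fits by default.
  2. Quadratic Discriminant Analysis (QDA): Allows each group its own covariance matrix, which gives curved (quadratic) boundaries; appropriate when the groups' covariance matrices clearly differ.
  3. Stepwise Discriminant Analysis: Enters (and can remove) predictors one at a time according to a criterion such as Wilks' lambda, keeping only those that improve group separation.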
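To see why the covariance assumption matters, here is a one-predictor sketch in Python (our own illustration, not SPSS output). Each group is summarized by a hypothetical mean and standard deviation. With equal priors, the LDA rule uses one pooled variance while the QDA rule uses each group's own variance, so the two can disagree for a case far from both means:

```python
import math

# Hypothetical one-predictor groups: (mean, standard deviation), invented for illustration.
GROUPS = {"A": (0.0, 1.0), "B": (3.0, 3.0)}


def lda_classify(x, groups):
    """Equal priors and one pooled variance: highest linear score wins."""
    # Simple average of the variances, i.e. assumes equal group sizes.
    pooled_var = sum(sd ** 2 for _, sd in groups.values()) / len(groups)
    scores = {
        name: x * mean / pooled_var - mean ** 2 / (2 * pooled_var)
        for name, (mean, _) in groups.items()
    }
    return max(scores, key=scores.get)


def qda_classify(x, groups):
    """Equal priors and separate variances: highest normal log-density wins."""
    scores = {
        name: -math.log(sd) - (x - mean) ** 2 / (2 * sd ** 2)
        for name, (mean, sd) in groups.items()
    }
    return max(scores, key=scores.get)


print(lda_classify(-4.0, GROUPS))  # A (closer to A's mean)
print(qda_classify(-4.0, GROUPS))  # B (far outside A's narrow spread)
```

A value of -4 is nearer to group A's mean, but once A's small spread is taken into account it is far more plausible under the wide group B. LDA cannot see this because it pools the variances.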

When Should You Use Discriminant Analysis?

Use Discriminant Analysis when:
✔ You have a categorical dependent variable (e.g., pass/fail, customer segments).
✔ Your independent variables are continuous (e.g., age, income, scores).
✔ You want to predict group membership based on predictor variables.


How to Perform Discriminant Analysis in SPSS

Step 1: Open Your Dataset

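For this example, use the following dataset. It is deliberately tiny (two cases per group) so the steps are easy to follow; a real analysis needs many more cases per group than there are predictors.

ID  Income  Spending_Score  Age  Customer_Type
1   30000   70              25   Low Value
2   50000   80              30   High Value
3   40000   75              28   Medium Value
4   60000   85              35   High Value
5   35000   65              22   Low Value
6   45000   78              32   Medium Value

  • Customer_Type: Dependent (grouping) variable with three categories. The Discriminant procedure needs a numeric grouping variable, so store it as codes 1 = Low Value, 2 = Medium Value, 3 = High Value (for example with Transform > Recode into Different Variables).
  • Income, Spending_Score, Age: Predictor variables (continuous).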
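The coding step matters because the codes must match the range you enter later. The short Python sketch below (an illustration outside SPSS) applies an explicit mapping. It then shows that assigning codes in alphabetical order, as any sort-based automatic coding would, puts High Value first rather than last:

```python
# Customer_Type labels from the example dataset, in ID order.
customer_type = ["Low Value", "High Value", "Medium Value",
                 "High Value", "Low Value", "Medium Value"]

# Explicit mapping that keeps the groups in their natural order (1 = Low ... 3 = High).
CODES = {"Low Value": 1, "Medium Value": 2, "High Value": 3}
group_code = [CODES[label] for label in customer_type]
print(group_code)  # [1, 3, 2, 3, 1, 2]

# For contrast: codes assigned in alphabetical order of the labels.
alphabetical = {label: i for i, label in enumerate(sorted(set(customer_type)), start=1)}
print(alphabetical)  # {'High Value': 1, 'Low Value': 2, 'Medium Value': 3}
```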

Step 2: Access the Discriminant Analysis Tool

  1. Go to Analyze > Classify > Discriminant.
  2. A dialog box will appear.

Step 3: Define Variables

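  1. Move Customer_Type (numerically coded) to the Grouping Variable box.
    • Click Define Range and enter Minimum = 1 and Maximum = 3 (1 = Low, 2 = Medium, 3 = High).
  2. Move Income, Spending_Score, and Age to the Independents box.
  3. Leave Enter independents together selected for a standard analysis, or choose Use stepwise method for stepwise selection.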

Step 4: Customize Options

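  1. Click Statistics:
    • Under Descriptives, check Means (group means and standard deviations), Univariate ANOVAs (the Tests of Equality of Group Means table), and Box's M (tests whether the group covariance matrices are equal).
    • Under Matrices, optionally check Within-groups correlation.
    • Click Continue.
  2. Click Classify:
    • Keep Prior Probabilities at All groups equal, or choose Compute from group sizes if your sample's group proportions reflect the population.
    • Under Display, check Summary table (the classification results) and Leave-one-out classification (a cross-validated accuracy estimate).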
  3. Click Continue, then OK.
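The within-groups correlation matrix that SPSS can print is the correlation computed after subtracting each group's own mean from every case. The plain-Python sketch below applies that definition to Income and Spending_Score in the example dataset; it is a hand cross-check, not SPSS code:

```python
import math
from collections import defaultdict

# Example dataset: (Income, Spending_Score, group code 1=Low, 2=Medium, 3=High)
rows = [(30000, 70, 1), (50000, 80, 3), (40000, 75, 2),
        (60000, 85, 3), (35000, 65, 1), (45000, 78, 2)]


def within_groups_correlation(rows):
    """Pooled within-group correlation of the first two columns."""
    by_group = defaultdict(list)
    for x, y, g in rows:
        by_group[g].append((x, y))
    sxx = syy = sxy = 0.0
    for cases in by_group.values():
        mx = sum(x for x, _ in cases) / len(cases)
        my = sum(y for _, y in cases) / len(cases)
        for x, y in cases:
            # Deviations from the case's own group mean, pooled across groups.
            sxx += (x - mx) ** 2
            syy += (y - my) ** 2
            sxy += (x - mx) * (y - my)
    return sxy / math.sqrt(sxx * syy)


print(round(within_groups_correlation(rows), 3))  # 0.425
```

High values (close to 1 in absolute size) warn that two predictors carry overlapping information, which makes their individual coefficients hard to interpret.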

Interpreting the Output

1. Group Statistics Table

  • Displays the mean and standard deviation of each predictor for each group.
    • Example: High-value customers may have higher income and spending scores.
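As a cross-check outside SPSS, the Group Statistics table is simply a per-group mean and sample standard deviation (n - 1 denominator) for each predictor. A minimal Python sketch on the example dataset:

```python
import statistics
from collections import defaultdict

# Example dataset: (Customer_Type, Income, Spending_Score, Age)
data = [("Low Value", 30000, 70, 25), ("High Value", 50000, 80, 30),
        ("Medium Value", 40000, 75, 28), ("High Value", 60000, 85, 35),
        ("Low Value", 35000, 65, 22), ("Medium Value", 45000, 78, 32)]
PREDICTORS = ["Income", "Spending_Score", "Age"]


def group_statistics(data):
    """Mean and sample standard deviation per (group, predictor) pair."""
    by_group = defaultdict(list)
    for label, *values in data:
        by_group[label].append(values)
    table = {}
    for label, cases in by_group.items():
        for j, name in enumerate(PREDICTORS):
            column = [case[j] for case in cases]
            table[(label, name)] = (statistics.mean(column), statistics.stdev(column))
    return table


for (label, name), (mean, sd) in sorted(group_statistics(data).items()):
    print(f"{label:13} {name:15} mean={mean:9.2f} sd={sd:8.2f}")
```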

2. Tests of Equality of Group Means

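  • Runs a separate one-way ANOVA for each predictor, reporting a univariate Wilks' lambda and F statistic, to show whether that predictor on its own differs across groups.
    • If p < 0.05, the group means on that predictor differ significantly. Because each test looks at one predictor in isolation, a predictor's contribution can change once the others are in the function.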
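Each row of that table can be reproduced by hand: the univariate Wilks' lambda is the within-group sum of squares divided by the total sum of squares, and F is the usual one-way ANOVA ratio. A stdlib-only Python sketch for Income in the example dataset (p-values need the F distribution, for example scipy.stats.f.sf, so they are left out here):

```python
from collections import defaultdict

# Income and group code for the six example customers (1=Low, 2=Medium, 3=High).
income = [30000, 50000, 40000, 60000, 35000, 45000]
group = [1, 3, 2, 3, 1, 2]


def univariate_test(values, groups):
    """Return (Wilks' lambda, F, df1, df2) for a one-way ANOVA on one predictor."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    n, k = len(values), len(by_group)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Squared deviations of each value from its own group mean.
    ss_within = sum(
        sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in by_group.values()
    )
    ss_between = ss_total - ss_within
    df1, df2 = k - 1, n - k
    wilks = ss_within / ss_total
    f = (ss_between / df1) / (ss_within / df2)
    return wilks, f, df1, df2


wilks, f, df1, df2 = univariate_test(income, group)
print(f"Wilks' lambda = {wilks:.3f}, F({df1}, {df2}) = {f:.2f}")
# Wilks' lambda = 0.129, F(2, 3) = 10.17
```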

3. Discriminant Function Coefficients

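  • Shows the weight of each predictor in each discriminant function. SPSS prints the Standardized Canonical Discriminant Function Coefficients and a Structure Matrix (correlations between each predictor and each function); unstandardized coefficients are available as an option.
    • Compare predictors using the absolute values of the standardized coefficients or the structure-matrix correlations: larger absolute values indicate a stronger contribution. Unstandardized coefficients depend on each variable's units, so their sizes cannot be compared directly.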
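Standardized coefficients are conventionally obtained by multiplying each unstandardized coefficient by that predictor's pooled within-group standard deviation, which removes the units. The Python sketch below computes that standard deviation for Income in the example data; the Income coefficient itself is a hypothetical value chosen only to show the arithmetic:

```python
import math
from collections import defaultdict

# Income and group code for the six example customers (1=Low, 2=Medium, 3=High).
income = [30000, 50000, 40000, 60000, 35000, 45000]
group = [1, 3, 2, 3, 1, 2]


def pooled_within_sd(values, groups):
    """Square root of the pooled within-group variance (df = n - number of groups)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    ss_within = sum(
        sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in by_group.values()
    )
    return math.sqrt(ss_within / (len(values) - len(by_group)))


# Hypothetical unstandardized Income coefficient, invented for illustration.
b_income = 0.0001
sd_income = pooled_within_sd(income, group)
print(sd_income)             # 5000.0
print(b_income * sd_income)  # about 0.5
```

A raw weight of 0.0001 looks negligible only because income is measured in currency units; on the standardized scale it is 0.5.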

4. Classification Results

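  • Displays the percentage of correctly classified cases (the hit ratio), overall and for each group.
    • Example: 85% of cases correctly classified into their respective groups. Judge this against chance: with three equally sized groups, random assignment would be right about 33% of the time.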
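A minimal Python sketch of the two numbers worth comparing: the hit ratio and a chance baseline. The baseline used here is the proportional chance criterion (the sum of squared group proportions), one common benchmark. The actual and predicted codes are hypothetical:

```python
from collections import Counter


def hit_ratio(actual, predicted):
    """Share of cases whose predicted group equals their actual group."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def proportional_chance(actual):
    """Accuracy expected by chance: sum of squared group proportions."""
    n = len(actual)
    return sum((count / n) ** 2 for count in Counter(actual).values())


# Hypothetical actual and predicted codes (1=Low, 2=Medium, 3=High) for six cases.
actual = [1, 3, 2, 3, 1, 2]
predicted = [1, 3, 2, 2, 1, 2]
print(hit_ratio(actual, predicted))  # 5 of 6 correct, about 0.833
print(proportional_chance(actual))   # about 0.333 with three equal groups
```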

5. Canonical Discriminant Functions

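  • Eigenvalues: For each function, the ratio of the between-group to the within-group sum of squares of the discriminant scores; larger values mean better separation. SPSS also reports each function's % of variance and canonical correlation.
  • Wilks’ Lambda: Tests whether the functions separate the groups beyond chance. Values near 0 indicate strong separation and values near 1 indicate little; p < 0.05 on the accompanying chi-square test means the separation is significant.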
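These quantities are linked by standard formulas. Wilks' lambda for the test of all functions is the product of 1 / (1 + eigenvalue) over the functions, and each canonical correlation is sqrt(eigenvalue / (1 + eigenvalue)). Bartlett's chi-square approximation is -[n - 1 - (p + g) / 2] ln(lambda) with p(g - 1) degrees of freedom. A Python sketch with hypothetical eigenvalues:

```python
import math


def canonical_summary(eigenvalues, n, p, g):
    """Wilks' lambda, Bartlett chi-square and df for the test of all functions,
    plus the canonical correlation of each function."""
    wilks = 1.0
    for lam in eigenvalues:
        wilks *= 1.0 / (1.0 + lam)
    chi_square = -(n - 1 - (p + g) / 2) * math.log(wilks)
    df = p * (g - 1)
    canonical_r = [math.sqrt(lam / (1.0 + lam)) for lam in eigenvalues]
    return wilks, chi_square, df, canonical_r


# Hypothetical eigenvalues for the two functions of a 3-group, 3-predictor model
# fitted to 100 cases (numbers invented for illustration).
wilks, chi2, df, r = canonical_summary([1.0, 0.25], n=100, p=3, g=3)
print(f"Wilks' lambda = {wilks:.3f}, chi-square({df}) = {chi2:.2f}")
# Wilks' lambda = 0.400, chi-square(6) = 87.96
print([round(x, 3) for x in r])  # [0.707, 0.447]
```

The formulas make the direction of each statistic explicit: larger eigenvalues push lambda toward 0 and the canonical correlations toward 1.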

Example Interpretation

Suppose you run the analysis and get the following results:

  1. Tests of Equality of Group Means:

    • Income: p = 0.01 (significant).
    • Spending_Score: p = 0.03 (significant).
    • Age: p = 0.08 (not significant).

    Interpretation: At the 0.05 level, mean Income and Spending Score differ significantly across customer types, but mean Age does not.

  2. Classification Results:

    • 88% of cases were correctly classified into their groups.

  3. Standardized Canonical Discriminant Function Coefficients (Function 1):

    • Income: 0.75.
    • Spending_Score: 0.65.
    • Age: 0.15.

    Interpretation: Income contributes most to separating the customer types on Function 1, followed by Spending Score; Age contributes little.


Practice Example: Perform Discriminant Analysis

Use the following dataset:

ID  Study_Hours  Test_Score  Attendance  Result
1   5            60          70          Fail
2   10           85          90          Pass
3   8            75          85          Pass
4   4            55          65          Fail
5   12           90          95          Pass

  1. Perform a Discriminant Analysis with Result as the grouping variable (coded numerically, e.g. 1 = Fail, 2 = Pass, with Define Range set to 1 and 2) and Study_Hours, Test_Score, and Attendance as predictors.
  2. Interpret the classification accuracy and identify the strongest predictor.
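To check your SPSS output by hand, here is a stdlib-only Python sketch of two-group Fisher discriminant analysis on the practice data. It uses the weight vector W⁻¹(mean_Pass - mean_Fail) and, assuming equal priors, a cutoff halfway between the two group mean scores. SPSS scales its coefficients differently, so expect its values to match these only up to a common factor (and possibly a sign flip):

```python
# Practice dataset: (Study_Hours, Test_Score, Attendance), split by Result.
FAIL = [(5, 60, 70), (4, 55, 65)]
PASS = [(10, 85, 90), (8, 75, 85), (12, 90, 95)]


def mean_vector(cases):
    return [sum(col) / len(cases) for col in zip(*cases)]


def within_scatter(groups):
    """Pooled within-group sums of squares and cross-products matrix W."""
    p = len(groups[0][0])
    w = [[0.0] * p for _ in range(p)]
    for cases in groups:
        m = mean_vector(cases)
        for case in cases:
            d = [x - mx for x, mx in zip(case, m)]
            for i in range(p):
                for j in range(p):
                    w[i][j] += d[i] * d[j]
    return w


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_lda(group_a, group_b):
    """Weights w = W^-1 (mean_b - mean_a) and an equal-priors midpoint cutoff."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    w = solve(within_scatter([group_a, group_b]), [b - a for a, b in zip(ma, mb)])

    def score(case):
        return sum(wi * xi for wi, xi in zip(w, case))

    cutoff = (score(ma) + score(mb)) / 2
    return w, score, cutoff


w, score, cutoff = fisher_lda(FAIL, PASS)
cases = [(c, "Fail") for c in FAIL] + [(c, "Pass") for c in PASS]
correct = sum(("Pass" if score(c) > cutoff else "Fail") == label for c, label in cases)
print(f"training-sample accuracy: {correct}/{len(cases)}")  # 5/5
print([round(x, 3) for x in w])  # [-7.0, 0.2, 2.6]
```

Note the negative Study_Hours weight even though passing students studied more. With five cases and strongly correlated predictors the weights are unstable, and a perfect score on the same five cases says little about how the model would classify new students.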

Common Mistakes to Avoid

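  1. Including Weak Predictors: Predictors that barely separate the groups add noise and make the function less stable. Drop them on substantive grounds or let the stepwise method decide, rather than relying only on each predictor's univariate p-value.
  2. Ignoring Assumptions: Before interpreting results, check for approximate multivariate normality within groups, equal covariance matrices across groups (Box's M), outliers, and very high correlations among predictors.
  3. Overfitting: Accuracy on the cases used to build the model is optimistic. Check it with Leave-one-out classification or by validating on a holdout sample of new data.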
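The gap between training and cross-validated accuracy is easy to demonstrate. This Python sketch is a simplification: it fits a one-predictor LDA (Income only, pooled variance, equal priors) to the six example customers. It then scores each case twice, once with a model that saw it and once with a model refitted without it (leave-one-out):

```python
from collections import defaultdict

# Income and group code for the six example customers (1=Low, 2=Medium, 3=High).
income = [30000, 50000, 40000, 60000, 35000, 45000]
group = [1, 3, 2, 3, 1, 2]


def fit(values, groups):
    """One-predictor LDA: group means plus a pooled within-group variance."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    ss_within = sum((v - means[g]) ** 2 for v, g in zip(values, groups))
    return means, ss_within / (len(values) - len(means))


def classify(x, means, pooled_var):
    """Equal priors: pick the group with the highest linear discriminant score."""
    def linear_score(g):
        return x * means[g] / pooled_var - means[g] ** 2 / (2 * pooled_var)

    return max(means, key=linear_score)


def accuracy(values, groups, leave_one_out=False):
    hits = 0
    for i, (x, g) in enumerate(zip(values, groups)):
        # Leave-one-out: refit without case i before classifying it.
        train_v = values[:i] + values[i + 1:] if leave_one_out else values
        train_g = groups[:i] + groups[i + 1:] if leave_one_out else groups
        means, pooled_var = fit(train_v, train_g)
        hits += classify(x, means, pooled_var) == g
    return hits / len(values)


print(accuracy(income, group))                      # 1.0
print(accuracy(income, group, leave_one_out=True))  # 0.8333333333333334
```

All six cases are classified correctly when the model has seen them, but only five of six when each case is held out. SPSS's Leave-one-out classification option reports the same kind of cross-validated figure for the full model.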

Key Takeaways

Discriminant Analysis is a powerful tool for predicting group membership.
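Wilks’ Lambda (smaller is better) and eigenvalues (larger is better) summarize how well the functions separate the groups.
Classification accuracy, ideally cross-validated and compared with chance, helps evaluate model effectiveness.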


What’s Next?

In Day 40, we’ll explore Survival Analysis in SPSS, a technique for analyzing time-to-event data (e.g., customer churn, medical survival rates). Stay tuned for more advanced statistical techniques! 🚀