Day 24: Discriminant Analysis in SPSS – Classifying Cases into Groups

Welcome to Day 24 of your 50-day SPSS learning journey! Today, we’ll explore Discriminant Analysis, a powerful statistical technique used to classify cases into groups based on predictor variables. This method is widely applied in marketing, medicine, and psychology for tasks like customer segmentation, diagnosing diseases, and group comparisons.


What is Discriminant Analysis?

Discriminant Analysis predicts group membership based on a set of continuous predictors. It creates a discriminant function that maximizes the differences between groups while minimizing variability within groups.
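In its simplest form, each discriminant function is a weighted sum of the predictors, D = b0 + b1X1 + b2X2 + ... + bkXk, where the weights (the b values) are estimated so that the groups’ average scores on D are as far apart as possible.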

For example:

  • Classifying customers as high-value or low-value based on their income, spending habits, and age.
  • Predicting whether students will pass or fail based on attendance and study hours.

Key Outputs of Discriminant Analysis

  1. Discriminant Function: A linear combination of predictors that best separates the groups.
  2. Classification Accuracy: The percentage of correctly classified cases.
  3. Group Centroids: The average value of the discriminant function for each group, indicating how distinct the groups are.

When to Use Discriminant Analysis?

Use Discriminant Analysis when:

  • The dependent variable is categorical (e.g., group membership).
  • The independent variables are continuous.
  • You want to classify cases into predefined groups or understand the factors that discriminate between groups.

Assumptions of Discriminant Analysis

  1. Normality: Predictors are normally distributed within each group.
  2. Homogeneity of Variance-Covariance: The variance-covariance matrices of the predictors are equal across groups.
  3. Independence: Observations are independent of each other.
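
You can screen the first two assumptions before running the model. As a rough sketch (using the predictor and group names from the example dataset introduced below), per-group normal Q-Q plots come from the Explore procedure, while Box’s M, the usual test of equal variance-covariance matrices, is requested from within the Discriminant procedure itself:

* Normality check: normal Q-Q plots of each predictor within each group.
EXAMINE VARIABLES=Income Spending_Score Age BY Group
  /PLOT NPPLOT
  /STATISTICS NONE.
* Homogeneity of variance-covariance: request Box's M on the Statistics
  sub-dialog of Discriminant (BOXM on /STATISTICS in syntax).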

How to Perform Discriminant Analysis in SPSS

Step 1: Open Your Dataset

For this example, use the following dataset:

ID   Income   Spending_Score   Age   Group
1    30000    70               25    Low Value
2    50000    80               30    High Value
3    40000    75               28    Medium Value
4    60000    85               35    High Value
5    35000    65               22    Low Value
6    45000    78               32    Medium Value

  • Group: Dependent variable (categorical: Low Value, Medium Value, High Value). The Discriminant procedure needs a numeric grouping variable, so code it as 1 = Low Value, 2 = Medium Value, 3 = High Value (with value labels).
  • Income, Spending_Score, Age: Predictors (continuous).
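
If you would rather create this dataset with syntax than type it into Data View, a minimal sketch is below (Group is entered as 1–3 and labelled, matching the coding described above):

* Example customer data; Group: 1 = Low Value, 2 = Medium Value, 3 = High Value.
DATA LIST FREE / ID Income Spending_Score Age Group.
BEGIN DATA
1 30000 70 25 1
2 50000 80 30 3
3 40000 75 28 2
4 60000 85 35 3
5 35000 65 22 1
6 45000 78 32 2
END DATA.
VALUE LABELS Group 1 'Low Value' 2 'Medium Value' 3 'High Value'.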

Step 2: Access the Discriminant Analysis Tool

  1. Go to Analyze > Classify > Discriminant.
  2. A dialog box will appear.

Step 3: Define Variables

  1. Move the categorical dependent variable (Group) to the Grouping Variable box.
    • Click Define Range and enter the minimum and maximum group codes (here 1 and 3, where 1 = Low Value, 2 = Medium Value, 3 = High Value).
  2. Move the continuous independent variables (Income, Spending_Score, Age) to the Independents box.

Step 4: Customize Options

  1. Click Statistics:
    • Under Descriptives, check Means (group means and standard deviations for each predictor), Univariate ANOVAs (the Tests of Equality of Group Means table), and Box’s M (a test of equal variance-covariance matrices).
    • Under Function Coefficients, check Fisher’s if you also want the classification function coefficients.
  2. Click Classify:
    • Check Summary table to obtain the classification accuracy table.
    • Optionally check Leave-one-out classification for a cross-validated estimate of accuracy.
  3. Click Continue, then OK to run the analysis (or Paste if you prefer to work from syntax; a sketch follows below).
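
For reference, clicking Paste instead of OK writes the equivalent syntax to a Syntax window. A sketch of what it looks like for the options above (the exact subcommands SPSS writes may differ slightly):

* In /STATISTICS: MEAN and STDDEV give the group statistics, UNIVF the tests of
  equality of group means, BOXM Box's M, COEFF Fisher's classification function
  coefficients, and TABLE the classification summary table.
DISCRIMINANT
  /GROUPS=Group(1 3)
  /VARIABLES=Income Spending_Score Age
  /ANALYSIS ALL
  /PRIORS EQUAL
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF TABLE
  /CLASSIFY=NONMISSING POOLED.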

Interpreting the Output

1. Group Statistics Table

  • Displays the mean and standard deviation of each predictor for each group.
    • Example: High-value customers may have higher incomes and spending scores than low-value customers.

2. Tests of Equality of Group Means

  • Tests whether the predictors differ significantly across groups.
    • If p < 0.05, that predictor’s group means differ significantly, suggesting it contributes to group separation.

3. Discriminant Function Coefficients

  • Shows the weight of each predictor in the discriminant function.
    • Compare the standardized coefficients: predictors with larger absolute values contribute more to group discrimination.

4. Classification Results

  • Displays the percentage of correctly classified cases.
    • Example: 85% of cases are correctly classified into their respective groups.
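    • This percentage is simply the number of correctly classified cases (the diagonal of the classification table) divided by the total number of cases; for instance, 51 correct out of 60 cases gives 51 / 60 = 85%.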

5. Canonical Discriminant Functions

  • Displays eigenvalues and Wilks’ Lambda:
    • Eigenvalues: Larger eigenvalues indicate discriminant functions that explain more of the between-group variance (SPSS also reports each function’s % of variance).
    • Wilks’ Lambda: Tests the overall significance of the discriminant function(s); values closer to 0 indicate better separation, and p < 0.05 indicates the model separates the groups significantly.
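    • The two are related: for a single discriminant function, Wilks’ Lambda = 1 / (1 + eigenvalue), so a large eigenvalue corresponds to a small (more significant) Lambda.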

Example Interpretation

Suppose you run the Discriminant Analysis and get the following results:

  1. Tests of Equality of Group Means:

    • Income: p = 0.01 (significant).
    • Spending_Score: p = 0.03 (significant).
    • Age: p = 0.08 (not significant).

    Interpretation: Income and Spending Score are significant predictors, but Age is not.

  2. Classification Results:

    • 88% of cases are correctly classified into their groups.
  3. Standardized Discriminant Function Coefficients:

    • Income: 0.75.
    • Spending_Score: 0.65.
    • Age: 0.15.

    Interpretation: Income is the strongest predictor of group membership, followed by Spending Score.


Practice Example: Perform Discriminant Analysis

Use the following dataset:

ID   Study_Hours   Test_Score   Attendance   Result
1    5             60           70           Fail
2    10            85           90           Pass
3    8             75           85           Pass
4    4             55           65           Fail
5    12            90           95           Pass
6    6             65           75           Fail

  1. Perform a Discriminant Analysis with Result (Pass/Fail) as the dependent variable and Study_Hours, Test_Score, and Attendance as predictors.
  2. Interpret the classification accuracy and identify the strongest predictor.
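
If you want to set the practice data up with syntax, one possible sketch (here Result is coded 0 = Fail and 1 = Pass, so Define Range runs from 0 to 1):

* Practice data; Result: 0 = Fail, 1 = Pass (coding chosen for this sketch).
DATA LIST FREE / ID Study_Hours Test_Score Attendance Result.
BEGIN DATA
1 5 60 70 0
2 10 85 90 1
3 8 75 85 1
4 4 55 65 0
5 12 90 95 1
6 6 65 75 0
END DATA.
VALUE LABELS Result 0 'Fail' 1 'Pass'.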

Common Mistakes to Avoid

  1. Including Irrelevant Predictors: Use only predictors that are theoretically or statistically meaningful.
  2. Ignoring Assumptions: Test for normality and homogeneity of variance before running the analysis.
  3. Overfitting: Ensure the model is generalizable by validating with new data.
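
On the last point, SPSS offers a quick built-in safeguard: the Leave-one-out classification option on the Classify sub-dialog (CROSSVALID in syntax) reclassifies each case using functions estimated without that case, which usually gives a slightly lower, more realistic accuracy figure than the ordinary summary table. A sketch using the customer example:

* Cross-validated (leave-one-out) classification for the customer example.
DISCRIMINANT
  /GROUPS=Group(1 3)
  /VARIABLES=Income Spending_Score Age
  /STATISTICS=TABLE CROSSVALID
  /CLASSIFY=NONMISSING POOLED.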

Key Takeaways

  • Discriminant Analysis is a useful tool for classifying cases into groups and understanding what separates them.
  • Evaluate predictors using tests of group means and discriminant function coefficients.
  • Always assess classification accuracy to validate your model.

What’s Next?

In Day 25 of your 50-day SPSS learning journey, we’ll explore Multidimensional Scaling (MDS) in SPSS. You’ll learn how to visualize relationships between objects or cases in a low-dimensional space. Stay tuned for another exciting multivariate technique!