Day 24: Discriminant Analysis in SPSS – Classifying Cases into Groups

Welcome to Day 24 of your 50-day SPSS learning journey! Today, we’ll explore Discriminant Analysis, a powerful statistical technique used to classify cases into groups based on predictor variables. This method is widely applied in marketing, medicine, and psychology for tasks like customer segmentation, diagnosing diseases, and group comparisons.


What is Discriminant Analysis?

Discriminant Analysis predicts group membership based on a set of continuous predictors. It creates a discriminant function that maximizes the differences between groups while minimizing variability within groups.
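In its simplest form, each discriminant function is a weighted sum of the predictors, D = b0 + b1X1 + b2X2 + ... + bkXk, where the weights (the b values) are estimated so that the groups’ average scores on D are as far apart as possible.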

For example:

  • Classifying customers as high-value or low-value based on their income, spending habits, and age.
  • Predicting whether students will pass or fail based on attendance and study hours.

Key Outputs of Discriminant Analysis

  1. Discriminant Function: A linear combination of predictors that best separates the groups.
  2. Classification Accuracy: The percentage of correctly classified cases.
  3. Group Centroids: The average value of the discriminant function for each group, indicating how distinct the groups are.

When to Use Discriminant Analysis?

Use Discriminant Analysis when:

  • The dependent variable is categorical (e.g., group membership).
  • The independent variables are continuous.
  • You want to classify cases into predefined groups or understand the factors that discriminate between groups.

Assumptions of Discriminant Analysis

  1. Normality: Predictors are normally distributed within each group.
  2. Homogeneity of Variance-Covariance: The variance-covariance matrices of the predictors are equal across groups.
  3. Independence: Observations are independent of each other.
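
You can screen the first two assumptions before running the model. As a rough sketch (using the predictor and group names from the example dataset introduced below), per-group normal Q-Q plots come from the Explore procedure, while Box’s M, the usual test of equal variance-covariance matrices, is requested from within the Discriminant procedure itself:

* Normality check: normal Q-Q plots of each predictor within each group.
EXAMINE VARIABLES=Income Spending_Score Age BY Group
  /PLOT NPPLOT
  /STATISTICS NONE.
* Homogeneity of variance-covariance: request Box's M on the Statistics
  sub-dialog of Discriminant (BOXM on /STATISTICS in syntax).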

How to Perform Discriminant Analysis in SPSS

Step 1: Open Your Dataset

For this example, use the following dataset:

ID   Income   Spending_Score   Age   Group
1    30000    70               25    Low Value
2    50000    80               30    High Value
3    40000    75               28    Medium Value
4    60000    85               35    High Value
5    35000    65               22    Low Value
6    45000    78               32    Medium Value

  • Group: Dependent variable (categorical: Low Value, Medium Value, High Value). The Discriminant procedure needs a numeric grouping variable, so code it as 1 = Low Value, 2 = Medium Value, 3 = High Value (with value labels).
  • Income, Spending_Score, Age: Predictors (continuous).
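
If you would rather create this dataset with syntax than type it into Data View, a minimal sketch is below (Group is entered as 1–3 and labelled, matching the coding described above):

* Example customer data; Group: 1 = Low Value, 2 = Medium Value, 3 = High Value.
DATA LIST FREE / ID Income Spending_Score Age Group.
BEGIN DATA
1 30000 70 25 1
2 50000 80 30 3
3 40000 75 28 2
4 60000 85 35 3
5 35000 65 22 1
6 45000 78 32 2
END DATA.
VALUE LABELS Group 1 'Low Value' 2 'Medium Value' 3 'High Value'.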

Step 2: Access the Discriminant Analysis Tool

  1. Go to Analyze > Classify > Discriminant.
  2. A dialog box will appear.

Step 3: Define Variables

  1. Move the categorical dependent variable (Group) to the Grouping Variable box.
    • Click Define Range and enter the minimum and maximum group codes (here 1 and 3, where 1 = Low Value, 2 = Medium Value, 3 = High Value).
  2. Move the continuous independent variables (Income, Spending_Score, Age) to the Independents box.

Step 4: Customize Options

  1. Click Statistics:
    • Under Descriptives, check Means (group means and standard deviations for each predictor), Univariate ANOVAs (the Tests of Equality of Group Means table), and Box’s M (a test of equal variance-covariance matrices).
    • Under Function Coefficients, check Fisher’s if you also want the classification function coefficients.
  2. Click Classify:
    • Check Summary table to obtain the classification accuracy table.
    • Optionally check Leave-one-out classification for a cross-validated estimate of accuracy.
  3. Click Continue, then OK to run the analysis (or Paste if you prefer to work from syntax; a sketch follows below).
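
For reference, clicking Paste instead of OK writes the equivalent syntax to a Syntax window. A sketch of what it looks like for the options above (the exact subcommands SPSS writes may differ slightly):

* In /STATISTICS: MEAN and STDDEV give the group statistics, UNIVF the tests of
  equality of group means, BOXM Box's M, COEFF Fisher's classification function
  coefficients, and TABLE the classification summary table.
DISCRIMINANT
  /GROUPS=Group(1 3)
  /VARIABLES=Income Spending_Score Age
  /ANALYSIS ALL
  /PRIORS EQUAL
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF TABLE
  /CLASSIFY=NONMISSING POOLED.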

Interpreting the Output

1. Group Statistics Table

  • Displays the mean and standard deviation of each predictor for each group.
    • Example: High-value customers may have higher incomes and spending scores than low-value customers.

2. Tests of Equality of Group Means

  • Tests whether the predictors differ significantly across groups.
    • If p < 0.05, that predictor’s group means differ significantly, suggesting it contributes to group separation.

3. Discriminant Function Coefficients

  • Shows the weight of each predictor in the discriminant function.
    • Compare the standardized coefficients: predictors with larger absolute values contribute more to group discrimination.

4. Classification Results

  • Displays the percentage of correctly classified cases.
    • Example: 85% of cases are correctly classified into their respective groups.
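    • This percentage is simply the number of correctly classified cases (the diagonal of the classification table) divided by the total number of cases; for instance, 51 correct out of 60 cases gives 51 / 60 = 85%.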

5. Canonical Discriminant Functions

  • Displays eigenvalues and Wilks’ Lambda:
    • Eigenvalues: Larger eigenvalues indicate discriminant functions that explain more of the between-group variance (SPSS also reports each function’s % of variance).
    • Wilks’ Lambda: Tests the overall significance of the discriminant function(s); values closer to 0 indicate better separation, and p < 0.05 indicates the model separates the groups significantly.
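    • The two are related: for a single discriminant function, Wilks’ Lambda = 1 / (1 + eigenvalue), so a large eigenvalue corresponds to a small (more significant) Lambda.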

Example Interpretation

Suppose you run the Discriminant Analysis and get the following results:

  1. Tests of Equality of Group Means:

    • Income: p = 0.01 (significant).
    • Spending_Score: p = 0.03 (significant).
    • Age: p = 0.08 (not significant).

    Interpretation: Income and Spending Score are significant predictors, but Age is not.

  2. Classification Results:

    • 88% of cases are correctly classified into their groups.
  3. Standardized Discriminant Function Coefficients:

    • Income: 0.75.
    • Spending_Score: 0.65.
    • Age: 0.15.

    Interpretation: Income is the strongest predictor of group membership, followed by Spending Score.


Practice Example: Perform Discriminant Analysis

Use the following dataset:

ID   Study_Hours   Test_Score   Attendance   Result
1    5             60           70           Fail
2    10            85           90           Pass
3    8             75           85           Pass
4    4             55           65           Fail
5    12            90           95           Pass
6    6             65           75           Fail

  1. Perform a Discriminant Analysis with Result (Pass/Fail) as the dependent variable and Study_Hours, Test_Score, and Attendance as predictors.
  2. Interpret the classification accuracy and identify the strongest predictor.
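
If you want to set the practice data up with syntax, one possible sketch (here Result is coded 0 = Fail and 1 = Pass, so Define Range runs from 0 to 1):

* Practice data; Result: 0 = Fail, 1 = Pass (coding chosen for this sketch).
DATA LIST FREE / ID Study_Hours Test_Score Attendance Result.
BEGIN DATA
1 5 60 70 0
2 10 85 90 1
3 8 75 85 1
4 4 55 65 0
5 12 90 95 1
6 6 65 75 0
END DATA.
VALUE LABELS Result 0 'Fail' 1 'Pass'.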

Common Mistakes to Avoid

  1. Including Irrelevant Predictors: Use only predictors that are theoretically or statistically meaningful.
  2. Ignoring Assumptions: Test for normality and homogeneity of variance before running the analysis.
  3. Overfitting: Ensure the model is generalizable by validating with new data.
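
On the last point, SPSS offers a quick built-in safeguard: the Leave-one-out classification option on the Classify sub-dialog (CROSSVALID in syntax) reclassifies each case using functions estimated without that case, which usually gives a slightly lower, more realistic accuracy figure than the ordinary summary table. A sketch using the customer example:

* Cross-validated (leave-one-out) classification for the customer example.
DISCRIMINANT
  /GROUPS=Group(1 3)
  /VARIABLES=Income Spending_Score Age
  /STATISTICS=TABLE CROSSVALID
  /CLASSIFY=NONMISSING POOLED.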

Key Takeaways

  • Discriminant Analysis is a useful tool for classifying cases into groups and understanding what separates them.
  • Evaluate predictors using tests of group means and discriminant function coefficients.
  • Always assess classification accuracy to validate your model.

What’s Next?

In Day 25 of your 50-day SPSS learning journey, we’ll explore Multidimensional Scaling (MDS) in SPSS. You’ll learn how to visualize relationships between objects or cases in a low-dimensional space. Stay tuned for another exciting multivariate technique!