Henry's EdTech: Day 39: Discriminant Analysis in SPSS

Day 39: Discriminant Analysis in SPSS – Predicting Group Membership

Welcome to Day 39 of your 50-day SPSS learning journey! Today, we’ll explore Discriminant Analysis, a powerful technique for classifying cases into predefined groups based on multiple independent variables. This method is widely used in marketing, finance, healthcare, and social sciences.

What is Discriminant Analysis?

Discriminant Analysis predicts which category an observation belongs to based on a set of predictor variables. It finds one or more discriminant functions, each a weighted combination of the predictors, chosen so that the variation between groups is as large as possible relative to the variation within groups.

For example:

Marketing: Classifying customers as low, medium, or high-value based on income, spending, and engagement.
Education: Predicting whether students will pass or fail based on attendance, study hours, and past performance.
Healthcare: Categorizing patients into high-risk or low-risk groups based on health indicators.

Types of Discriminant Analysis

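Linear Discriminant Analysis (LDA): Used when the groups can be assumed to share the same covariance matrix (equal variances and covariances). This is what the SPSS Discriminant procedure fits by default.
Quadratic Discriminant Analysis (QDA): Used when the groups have unequal covariance matrices. In SPSS, the closest option is classifying with Separate-groups covariance matrices in the Classify dialog.
Stepwise Discriminant Analysis: Enters and removes predictors one at a time using a statistical criterion such as Wilks’ Lambda, keeping those that add the most to group separation.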

When to Use Discriminant Analysis?

Use Discriminant Analysis when:
✔ You have a categorical dependent variable (e.g., pass/fail, customer segments).
✔ Your independent variables are continuous (e.g., age, income, scores).
✔ You want to predict group membership based on predictor variables.

How to Perform Discriminant Analysis in SPSS

Step 1: Open Your Dataset

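For this example, use the following dataset. Six cases are enough to follow the steps, but far too few for a trustworthy analysis; real studies need many more cases per group.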

ID	Income	Spending_Score	Age	Customer_Type
1	30000	70	25	Low Value
2	50000	80	30	High Value
3	40000	75	28	Medium Value
4	60000	85	35	High Value
5	35000	65	22	Low Value
6	45000	78	32	Medium Value

Customer_Type: Dependent variable (categorical: Low, Medium, High).
Income, Spending_Score, Age: Predictor variables (continuous).

Step 2: Access the Discriminant Analysis Tool

Go to Analyze > Classify > Discriminant.
A dialog box will appear.

Step 3: Define Variables

Move Customer_Type to the Grouping Variable box.
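- Click Define Range and enter the Minimum and Maximum group codes (e.g., Minimum = 1 and Maximum = 3, where 1 = Low, 2 = Medium, 3 = High). The grouping variable must be numeric, so recode text labels such as "Low Value" into numbers first.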
Move Income, Spending_Score, Age to the Independents box.
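The example dataset stores Customer_Type as text, while the grouping variable in SPSS is expected to be coded as integers. As a rough sketch of that recoding step outside SPSS (Python is used here only for illustration; the CODES mapping and the variable names are assumptions, not SPSS features):

```python
# Hypothetical recoding of the Customer_Type labels into numeric codes.
# The code values (1, 2, 3) follow the example given in Step 3.
CODES = {"Low Value": 1, "Medium Value": 2, "High Value": 3}

# Customer_Type column of the example dataset, in ID order.
customer_type = [
    "Low Value", "High Value", "Medium Value",
    "High Value", "Low Value", "Medium Value",
]

group_code = [CODES[label] for label in customer_type]

# Define Range asks for the smallest and largest code in use.
range_min, range_max = min(group_code), max(group_code)
print(group_code, range_min, range_max)
```

Inside SPSS itself, the same result can be reached with Transform > Recode into Different Variables or Transform > Automatic Recode.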

Step 4: Customize Options

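Click Statistics:
- Check Means (to compare group means).
- Check Univariate ANOVAs (to produce the Tests of Equality of Group Means table).
- Check Box’s M (to test whether the group covariance matrices are equal).
- Check Within-groups correlation under Matrices (to inspect correlations among the predictors).
Click Continue, then click Classify:
- Check Summary table (to see the classification results and prediction accuracy).
- Optionally check Leave-one-out classification (for a cross-validated accuracy estimate).
Click Continue, then OK.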

Interpreting the Output

1. Group Statistics Table

Displays the mean and standard deviation of each predictor for each group.
- Example: High-value customers may have higher income and spending scores.
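To make the Group Statistics table concrete, here is a minimal sketch that computes the same kind of per-group means and standard deviations for the Step 1 dataset. It uses only Python's standard library; the function name group_statistics is an assumption made for this illustration.

```python
from statistics import mean, stdev

# Example dataset from Step 1: (Income, Spending_Score, Age, Customer_Type)
rows = [
    (30000, 70, 25, "Low Value"),
    (50000, 80, 30, "High Value"),
    (40000, 75, 28, "Medium Value"),
    (60000, 85, 35, "High Value"),
    (35000, 65, 22, "Low Value"),
    (45000, 78, 32, "Medium Value"),
]
predictors = ["Income", "Spending_Score", "Age"]

def group_statistics(rows):
    """Return {group: {predictor: (mean, sample standard deviation)}}."""
    groups = {}
    for *values, label in rows:
        groups.setdefault(label, []).append(values)
    stats = {}
    for label, cases in groups.items():
        columns = list(zip(*cases))  # one tuple of values per predictor
        stats[label] = {
            name: (mean(col), stdev(col))
            for name, col in zip(predictors, columns)
        }
    return stats

stats = group_statistics(rows)
for label, by_predictor in stats.items():
    print(label, by_predictor)
```

With two cases per group the standard deviations are very rough, but the pattern in the means (High Value customers earn and spend more) is the kind of thing to look for in this table.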

2. Tests of Equality of Group Means

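Tests, one predictor at a time, whether the group means differ (a one-way ANOVA per predictor, reported with Wilks’ Lambda and F).
- If p < 0.05, the group means differ significantly on that predictor, so it is likely to help separate the groups.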
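The statistics behind this table can be sketched by hand. For a single predictor, Wilks' lambda is the within-groups sum of squares divided by the total sum of squares, and F compares between-groups to within-groups variation. The sketch below applies this to the Step 1 dataset; univariate_test is a name invented for this example, and no p-value is computed because Python's standard library has no F distribution (SPSS reports it for you).

```python
# Example dataset from Step 1, keyed by predictor, with each case's group.
data = {
    "Income": [30000, 50000, 40000, 60000, 35000, 45000],
    "Spending_Score": [70, 80, 75, 85, 65, 78],
    "Age": [25, 30, 28, 35, 22, 32],
}
groups = ["Low", "High", "Medium", "High", "Low", "Medium"]

def univariate_test(values, groups):
    """Return (Wilks' lambda, F) for a one-way comparison of group means."""
    n = len(values)
    grand_mean = sum(values) / n
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    g = len(by_group)
    # Within-groups sum of squares: spread of cases around their group mean.
    ss_within = sum(
        sum((v - sum(vals) / len(vals)) ** 2 for v in vals)
        for vals in by_group.values()
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = ss_total - ss_within
    wilks_lambda = ss_within / ss_total
    f_stat = (ss_between / (g - 1)) / (ss_within / (n - g))
    return wilks_lambda, f_stat

for name, values in data.items():
    lam, f = univariate_test(values, groups)
    print(f"{name}: Wilks' lambda = {lam:.3f}, F = {f:.2f}")
```

A lambda close to 0 means most of the variation in that predictor lies between the groups; a lambda close to 1 means the groups barely differ on it.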

3. Discriminant Function Coefficients

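Shows the weight of each predictor in the discriminant function.
- Compare predictors using the standardized coefficients: a larger absolute value indicates a stronger unique contribution. Unstandardized coefficients depend on each variable's units and are used to compute discriminant scores, not to rank predictors.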

4. Classification Results

Displays the percentage of correctly classified cases.
- Example: 85% of cases correctly classified into their respective groups.
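The percentage in this table is simply the share of cases on the diagonal of the actual-by-predicted table. A small sketch follows, using a hypothetical classification table; the counts are invented for illustration and are not SPSS output.

```python
# Hypothetical classification table (rows = actual group, columns = predicted
# group). The counts are made up for illustration; they are not SPSS output.
labels = ["Low", "Medium", "High"]
table = [
    [18, 2, 0],   # actual Low
    [3, 15, 2],   # actual Medium
    [0, 1, 19],   # actual High
]

def hit_rates(table):
    """Return (overall percent correct, per-group percent correct)."""
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    per_group = [100 * row[i] / sum(row) for i, row in enumerate(table)]
    return 100 * correct / total, per_group

overall, per_group = hit_rates(table)
print(f"Overall: {overall:.1f}% correctly classified")
for label, pct in zip(labels, per_group):
    print(f"{label}: {pct:.1f}%")
```

Looking at the per-group rates as well as the overall figure shows whether one group is being misclassified more often than the others.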

5. Canonical Discriminant Functions

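Eigenvalues: The ratio of between-groups to within-groups variation on each discriminant function. Larger eigenvalues indicate stronger separation.
Wilks’ Lambda: Tests whether the discriminant functions separate the groups better than chance. Values range from 0 to 1, with smaller values indicating better separation; a significant chi-square test (p < 0.05) means the functions discriminate between the groups.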
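Eigenvalues, canonical correlations, and Wilks' lambda are linked by simple formulas, which the sketch below works through. The two eigenvalues are hypothetical numbers chosen for illustration (with three groups and three predictors there are at most two functions); they are not results from the example dataset.

```python
from math import prod, sqrt

# Hypothetical eigenvalues for two discriminant functions.
# These numbers are illustrative, not real SPSS output.
eigenvalues = [4.0, 0.25]

# Share of the total discriminating power carried by each function.
percent_of_variance = [100 * ev / sum(eigenvalues) for ev in eigenvalues]

# Canonical correlation between each function and group membership.
canonical_correlations = [sqrt(ev / (1 + ev)) for ev in eigenvalues]

# Wilks' lambda for all functions together: product of 1 / (1 + eigenvalue).
wilks_lambda = prod(1 / (1 + ev) for ev in eigenvalues)

print(percent_of_variance)
print(canonical_correlations)
print(wilks_lambda)
```

Here the first function carries most of the discriminating power, and the overall lambda of about 0.16 would indicate strong separation if the accompanying chi-square test is significant.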

Example Interpretation

Suppose you run the analysis and get the following results:

Tests of Equality of Group Means:
- Income: p = 0.01 (significant).
- Spending_Score: p = 0.03 (significant).
- Age: p = 0.08 (not significant).
Interpretation: Income and Spending_Score differ significantly across customer types at the 0.05 level, but Age does not.
Classification Results:
- 88% of cases were correctly classified into their groups.
Discriminant Function Coefficients:
- Income: 0.75.
- Spending_Score: 0.65.
- Age: 0.15.
Interpretation: Assuming these are standardized coefficients, Income is the strongest predictor, followed by Spending_Score; Age contributes little.
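The reading of these example results can be written out as two small rules: flag predictors whose p-value falls below 0.05, and rank predictors by the absolute size of their coefficients. A minimal sketch using the numbers from the example above (the variable names and the ALPHA constant are assumptions for this illustration):

```python
# Values copied from the example results above.
p_values = {"Income": 0.01, "Spending_Score": 0.03, "Age": 0.08}
coefficients = {"Income": 0.75, "Spending_Score": 0.65, "Age": 0.15}
ALPHA = 0.05  # conventional significance threshold

# Predictors whose group means differ significantly.
significant = [name for name, p in p_values.items() if p < ALPHA]

# Rank by absolute size, since the sign of a coefficient only gives direction.
ranking = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)

print("Significant predictors:", significant)
print("Ranking by coefficient size:", ranking)
```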

Practice Example: Perform Discriminant Analysis

Use the following dataset:

ID	Study_Hours	Test_Score	Attendance	Result
1	5	60	70	Fail
2	10	85	90	Pass
3	8	75	85	Pass
4	4	55	65	Fail
5	12	90	95	Pass

Perform a Discriminant Analysis with Result (Pass/Fail) as the dependent variable and Study_Hours, Test_Score, and Attendance as predictors.
Interpret the classification accuracy and identify the strongest predictor.
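To check your SPSS output against a hand calculation, the two-group case can be sketched with Fisher's linear discriminant: the weights solve (pooled within-groups covariance) x (weights) = (difference in group means). The code below is an illustration in plain Python, not what SPSS runs internally; the helper names (column_means, solve, score) are invented for this sketch, and equal prior probabilities are assumed.

```python
# Practice dataset: (Study_Hours, Test_Score, Attendance, Result)
rows = [
    (5, 60, 70, "Fail"),
    (10, 85, 90, "Pass"),
    (8, 75, 85, "Pass"),
    (4, 55, 65, "Fail"),
    (12, 90, 95, "Pass"),
]

def column_means(cases):
    return [sum(col) / len(col) for col in zip(*cases)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

passed = [r[:3] for r in rows if r[3] == "Pass"]
failed = [r[:3] for r in rows if r[3] == "Fail"]
mean_pass, mean_fail = column_means(passed), column_means(failed)

# Pooled within-groups covariance matrix (divisor: cases minus groups).
p = 3
pooled = [[0.0] * p for _ in range(p)]
for cases, centre in ((passed, mean_pass), (failed, mean_fail)):
    for case in cases:
        dev = [x - m for x, m in zip(case, centre)]
        for i in range(p):
            for j in range(p):
                pooled[i][j] += dev[i] * dev[j] / (len(rows) - 2)

# Fisher's direction: weights that solve pooled * weights = mean difference.
diff = [a - b for a, b in zip(mean_pass, mean_fail)]
weights = solve(pooled, diff)

def score(case):
    return sum(w * x for w, x in zip(weights, case))

# With equal priors, the cut-off lies midway between the two group centroids.
cutoff = (score(mean_pass) + score(mean_fail)) / 2
predicted = ["Pass" if score(r[:3]) > cutoff else "Fail" for r in rows]
accuracy = sum(pred == r[3] for pred, r in zip(predicted, rows)) / len(rows)
print(weights, predicted, accuracy)
```

On this tiny dataset every case is classified correctly, which says more about the sample size than about the model. The raw weights depend on each predictor's units, so they should not be read as a ranking of predictor strength; use the standardized coefficients in the SPSS output for that.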

Common Mistakes to Avoid

Including Weak Predictors: Prefer variables that show meaningful group differences; predictors that add no separation only add noise.
Ignoring Assumptions: Check for multivariate normality and equal group covariance matrices (Box’s M) before running the analysis.
Overfitting: Ensure the model generalizes well by validating with new data or with leave-one-out classification.

Key Takeaways

✔ Discriminant Analysis is a powerful tool for predicting group membership.
✔ Eigenvalues show how strongly each function separates the groups; Wilks’ Lambda tests whether that separation is significant.
✔ Classification Accuracy helps evaluate model effectiveness.

What’s Next?

In Day 40, we’ll explore Survival Analysis in SPSS, a technique for analyzing time-to-event data (e.g., customer churn, medical survival rates). Stay tuned for more advanced statistical techniques! 🚀