Henry's EdTech: Day 44: Canonical Correlation Analysis (CCA) in SPSS – Examining Relationships Between Two Variable Sets

Welcome to Day 44 of your 50-day SPSS learning journey! Today, we’ll explore Canonical Correlation Analysis (CCA), an advanced multivariate technique used to examine relationships between two sets of variables. This method is widely used in psychology, finance, education, and marketing.

What is Canonical Correlation Analysis (CCA)?

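Canonical Correlation Analysis (CCA) identifies relationships between two sets of continuous variables by finding canonical variates: pairs of linear combinations, one from each set, that are maximally correlated with each other. Each successive pair is uncorrelated with the pairs before it.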

For example:
✔ Education Research: Examining how study habits (Set 1: study time, attendance, note-taking) relate to academic performance (Set 2: test scores, GPA, assignments).
✔ Marketing: Understanding how customer demographics (Set 1: age, income, location) relate to shopping behavior (Set 2: spending, purchase frequency, brand preference).
✔ Health Science: Investigating how lifestyle factors (Set 1: diet, exercise, sleep) relate to health outcomes (Set 2: BMI, cholesterol, blood pressure).

When to Use Canonical Correlation Analysis?

Use Canonical Correlation Analysis (CCA) when:
✔ You have two sets of continuous variables and want to explore their relationship.
✔ You need to identify underlying patterns linking the two variable sets.
✔ Multiple dependent and independent variables exist without a clear cause-effect relationship.

How to Perform Canonical Correlation Analysis in SPSS

Step 1: Open Your Dataset

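For this example, use the following employee productivity dataset. Only five rows are shown for illustration; a real analysis needs many more cases than variables to give stable results: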

ID	Training_Hours	Experience	Motivation	Job_Satisfaction	Performance	Productivity
1	10	2	8	7	85	80
2	15	3	7	6	78	75
3	20	5	9	8	90	88
4	12	4	6	5	72	70
5	18	6	8	7	88	85

Set 1 (Predictor Variables): Training_Hours, Experience, Motivation.
Set 2 (Outcome Variables): Job_Satisfaction, Performance, Productivity.

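Step 2: Set Up a Multivariate Model from the Menus

Go to Analyze > General Linear Model > Multivariate.
Move Training_Hours, Experience, Motivation to the Covariates box.
Move Job_Satisfaction, Performance, Productivity to the Dependent Variables box.
Click Options, then select:
- Estimates of Effect Size
- Residuals

Note that this dialog fits a multivariate linear model rather than a dedicated CCA. Its output includes multivariate tests such as Wilks’ Lambda for each covariate, but not the canonical correlations or loadings.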

Step 3: Run the Canonical Correlation Using Syntax

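Older versions of SPSS have no CCA dialog in the menus, so the analysis is usually run from syntax. (Recent releases may offer Analyze > Correlate > Canonical Correlation, which relies on the STATS CANCORR extension.) A useful first step is the correlation matrix of all six variables: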

Open the Syntax Editor (File > New > Syntax).
Paste the following syntax:

* Pearson correlations among all six variables (both sets together).
CORRELATIONS
  /VARIABLES=Training_Hours Experience Motivation Job_Satisfaction Performance Productivity
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

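Click Run to generate the correlation matrix. This is not yet a CCA, but the canonical correlations are derived from its three blocks: the correlations within Set 1, within Set 2, and between the two sets.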
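For readers who want to check the matrix outside SPSS, here is a minimal Python sketch (assuming NumPy is installed) that rebuilds the correlation matrix from the five example rows and splits it into within-set and between-set blocks. The names R11, R22, and R12 are illustrative labels, not SPSS output names.

```python
import numpy as np

# The five example rows from Step 1 (ID column omitted).
names = ["Training_Hours", "Experience", "Motivation",
         "Job_Satisfaction", "Performance", "Productivity"]
data = np.array([
    [10, 2, 8, 7, 85, 80],
    [15, 3, 7, 6, 78, 75],
    [20, 5, 9, 8, 90, 88],
    [12, 4, 6, 5, 72, 70],
    [18, 6, 8, 7, 88, 85],
], dtype=float)

# Pearson correlations between all six variables (columns are variables).
R = np.corrcoef(data, rowvar=False)

p = 3              # number of Set 1 variables
R11 = R[:p, :p]    # correlations within Set 1
R22 = R[p:, p:]    # correlations within Set 2
R12 = R[:p, p:]    # correlations between Set 1 and Set 2

print(np.round(R, 2))
```

In these five rows Job_Satisfaction is always exactly one point below Motivation, so that pair correlates perfectly. With so few cases the matrix is only a demonstration of the layout.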

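To run a full Canonical Correlation Analysis, you can use the MANOVA syntax command with its DISCRIM subcommand (listing one set before the WITH keyword and the other set after it), or the Python or R extensions within SPSS.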
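As one illustration of the Python route, the sketch below computes canonical correlations directly with NumPy instead of going through an SPSS extension. It runs on synthetic data (200 simulated cases driven by one shared latent factor), because the five example rows are too few for a stable solution. The function name cca and the simulated weights are hypothetical choices made for this sketch.

```python
import numpy as np

def cca(X, Y):
    """Return canonical correlations and the paired canonical variates."""
    Xc = X - X.mean(axis=0)          # center each variable
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)         # orthonormal basis for Set 1
    Qy, _ = np.linalg.qr(Yc)         # orthonormal basis for Set 2
    # Singular values of Qx'Qy are the canonical correlations.
    left, s, right_t = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    U = Qx @ left                    # canonical variates for Set 1
    V = Qy @ right_t.T               # canonical variates for Set 2
    return np.clip(s, 0.0, 1.0), U, V

# Synthetic stand-in data: both sets share one latent factor plus noise.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 1))
X = latent @ np.array([[0.8, 0.6, 0.7]]) + rng.normal(size=(n, 3))
Y = latent @ np.array([[0.7, 0.9, 0.8]]) + rng.normal(size=(n, 3))

r, U, V = cca(X, Y)
print("Canonical correlations:", np.round(r, 3))
```

The QR-plus-SVD route is used here because it avoids explicitly inverting the within-set covariance matrices, which can be numerically fragile when variables in a set are highly correlated.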

Interpreting the Canonical Correlation Output

1. Canonical Correlations

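Show the strength of the relationship between each pair of canonical variates. The number of canonical correlations equals the number of variables in the smaller set (three in this example), and they are reported from largest to smallest.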
Example output:
- First Canonical Correlation = 0.85 (strong relationship).
- Second Canonical Correlation = 0.45 (weak relationship).
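
A quick way to read these values is to square them: the squared canonical correlation is the share of variance that the two canonical variates in a pair have in common. A small Python sketch using the two example values above:

```python
# Example canonical correlations from the output above.
canonical_correlations = [0.85, 0.45]

# Squared canonical correlation = variance shared by a pair of variates.
shared_variance = [r ** 2 for r in canonical_correlations]

for i, (r, r2) in enumerate(zip(canonical_correlations, shared_variance), start=1):
    print(f"Canonical function {i}: r = {r:.2f}, shared variance = {r2:.1%}")
```

So the first pair of variates shares roughly 72% of its variance, while the second shares only about 20%.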

2. Wilks’ Lambda

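Tests whether the canonical correlations, taken together, differ from zero. Lambda ranges from 0 to 1, and smaller values indicate a stronger relationship between the two sets.
p < 0.05 means at least one pair of canonical variates is significantly related.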
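
Wilks’ Lambda can be computed from the canonical correlations as the product of (1 - r²) over all of them, and Bartlett's chi-square approximation is one common way to test it. The Python sketch below is illustrative only: the third correlation (0.10) and the sample size (n = 100) are made-up values, since the example output lists just two correlations and no sample size.

```python
import math

def wilks_lambda(canonical_correlations):
    """Product of (1 - r^2) over all canonical correlations."""
    lam = 1.0
    for r in canonical_correlations:
        lam *= 1.0 - r ** 2
    return lam

def bartlett_chi_square(canonical_correlations, n, p, q):
    """Bartlett's approximation: chi-square statistic and degrees of freedom."""
    lam = wilks_lambda(canonical_correlations)
    chi2 = -(n - 1 - (p + q + 1) / 2) * math.log(lam)
    return chi2, p * q

rs = [0.85, 0.45, 0.10]   # third value is hypothetical
n, p, q = 100, 3, 3       # hypothetical sample size; 3 variables per set

lam = wilks_lambda(rs)
chi2, df = bartlett_chi_square(rs, n, p, q)
print(f"Wilks' Lambda = {lam:.4f}, chi-square = {chi2:.2f}, df = {df}")
```

The p-value would then come from the chi-square distribution with those degrees of freedom. SPSS reports an F approximation instead, so its numbers will not match this sketch exactly.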

3. Canonical Loadings

Correlations between original variables and their canonical variates.

Example output:

Variable	Loading on Canonical Variate 1
Training_Hours	0.80
Experience	0.75
Motivation	0.85
Job_Satisfaction	0.70
Performance	0.88
Productivity	0.83

Interpretation:

Training Hours and Motivation are strongly associated with Job Satisfaction, Performance, and Productivity.
Experience contributes slightly less to the relationship.
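
To show where loadings come from, the Python sketch below correlates each original variable with the first canonical variate of its own set. It uses synthetic data (not the five example rows, which are too few), and the function names and simulated weights are hypothetical.

```python
import numpy as np

def first_canonical_pair(X, Y):
    """Return the first canonical correlation and its pair of variates."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    left, s, right_t = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    return s[0], Qx @ left[:, 0], Qy @ right_t[0, :]

def loadings(data, variate):
    """Correlate each original variable (column) with a canonical variate."""
    return np.array([np.corrcoef(col, variate)[0, 1] for col in data.T])

# Synthetic stand-in data: both sets share one latent factor plus noise.
rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=(n, 1))
set1 = latent @ np.array([[0.8, 0.5, 0.9]]) + rng.normal(size=(n, 3))
set2 = latent @ np.array([[0.6, 0.9, 0.8]]) + rng.normal(size=(n, 3))

r1, u1, v1 = first_canonical_pair(set1, set2)
set1_loadings = loadings(set1, u1)
set2_loadings = loadings(set2, v1)

print("First canonical correlation:", round(float(r1), 3))
print("Set 1 loadings:", np.round(set1_loadings, 2))
print("Set 2 loadings:", np.round(set2_loadings, 2))
```

The sign of a canonical variate is arbitrary, so all loadings on one variate can flip sign together between programs; it is the relative sizes that carry the interpretation.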

4. Redundancy Index

Measures the proportion of variance in one set's variables that is explained by a canonical variate of the other set.
A higher redundancy index suggests a stronger association.
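
One common definition (the Stewart-Love index) multiplies the average squared loading of a set by the squared canonical correlation. The Python sketch below applies that definition to the Set 2 loadings and the first canonical correlation from the example output above. It illustrates the arithmetic and is not SPSS output.

```python
# Set 2 loadings on the first canonical variate (from the example table).
set2_loadings = {
    "Job_Satisfaction": 0.70,
    "Performance": 0.88,
    "Productivity": 0.83,
}
canonical_r = 0.85  # first canonical correlation from the example

# Average squared loading: variance in Set 2 captured by its own variate.
variance_extracted = sum(l ** 2 for l in set2_loadings.values()) / len(set2_loadings)

# Redundancy: share of Set 2 variance explained by the Set 1 variate.
redundancy = variance_extracted * canonical_r ** 2

print(f"Variance extracted = {variance_extracted:.4f}")
print(f"Redundancy index   = {redundancy:.4f}")
```

With these example numbers, the first Set 1 variate accounts for roughly 47% of the variance in the three outcome variables.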

Example Interpretation

Suppose the first canonical correlation is 0.85 (p < 0.01):
✔ Training_Hours, Experience, and Motivation, taken together, are significantly related to Job_Satisfaction, Performance, and Productivity.
✔ Motivation (0.85 loading) is the Set 1 variable most strongly tied to the first canonical variate.

This suggests that employee motivation is worth investigating as a way to improve job satisfaction and productivity, although CCA alone cannot establish cause and effect.

Practice Example: Perform CCA on Marketing Data

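Use the following dataset of customer demographics and buying behavior (three sample rows are shown; add more cases before running the analysis, since CCA needs many more cases than variables):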

ID	Age	Income	Education	Purchase_Frequency	Spending_Amount	Loyalty
1	25	40000	16	10	500	80
2	40	50000	18	8	450	75
3	30	45000	16	12	550	85

Perform Canonical Correlation Analysis with:
- Set 1: Age, Income, Education (Demographics).
- Set 2: Purchase_Frequency, Spending_Amount, Loyalty (Buying Behavior).
Interpret the canonical correlations and loadings to identify key influences.

Common Mistakes to Avoid

Ignoring Multicollinearity: Ensure variables within each set are not highly correlated.
Overinterpreting Weak Canonical Correlations: Interpret only the canonical correlations that are statistically significant and large enough to matter, which is usually the first one or two.
Skipping Significance Testing: Always check Wilks’ Lambda and p-values before drawing conclusions.
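
As a rough check for the first mistake, the Python sketch below flags pairs of variables within a set whose correlation exceeds a cutoff. The 0.8 cutoff and the function name are assumptions made for illustration, and the data are the five Set 1 rows from the employee example.

```python
from itertools import combinations

import numpy as np

def flag_high_correlations(data, names, threshold=0.8):
    """List variable pairs whose absolute correlation exceeds the threshold."""
    R = np.corrcoef(data, rowvar=False)
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        if abs(R[i, j]) > threshold:
            flagged.append((names[i], names[j], round(float(R[i, j]), 3)))
    return flagged

set1_names = ["Training_Hours", "Experience", "Motivation"]
set1 = np.array([
    [10, 2, 8],
    [15, 3, 7],
    [20, 5, 9],
    [12, 4, 6],
    [18, 6, 8],
], dtype=float)

flagged = flag_high_correlations(set1, set1_names)
for a, b, r in flagged:
    print(f"{a} and {b} are highly correlated (r = {r})")
```

A flagged pair does not automatically have to be dropped, but it is a signal that the canonical weights for those variables may be unstable.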

Key Takeaways

✔ Canonical Correlation Analysis (CCA) examines relationships between two sets of variables.
✔ Canonical Loadings identify which variables contribute most to the relationship.
✔ Wilks’ Lambda and Redundancy Index measure the significance and strength of associations.

What’s Next?

In Day 45, we’ll explore Structural Equation Modeling (SEM) in SPSS, a powerful extension of CCA that allows for testing complex causal relationships between multiple variables. Stay tuned! 🚀