Day 33: Principal Component Analysis (PCA) in SPSS – Reducing Dimensionality


Welcome to Day 33 of your 50-day SPSS learning journey! Today, we’ll explore Principal Component Analysis (PCA), a technique used for reducing the number of variables in a dataset while preserving as much information as possible. PCA is widely used in machine learning, psychology, marketing, and other fields that deal with high-dimensional data.


What is Principal Component Analysis (PCA)?

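PCA is a technique that transforms a set of correlated variables into a smaller set of uncorrelated components called principal components. Each component is a weighted linear combination of the original variables: the first captures the largest share of the total variance, and each later component captures the largest share of what remains.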

For example:

  • Reducing 10 survey questions on customer satisfaction to 2 or 3 key dimensions.
  • Condensing financial indicators (income, expenses, debt, assets) into a smaller set of financial health indicators.

PCA helps simplify complex datasets while retaining most of the information.
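To make the idea concrete outside SPSS, here is a minimal Python sketch of PCA computed from the correlation matrix. It is an illustration only, not SPSS's internal code: the function name pca_correlation and the simulated data are assumptions made up for this example.

```python
import numpy as np

def pca_correlation(data):
    """PCA on the correlation matrix: eigenvalues (descending), eigenvectors, scores."""
    # Standardize each column so every variable has mean 0 and variance 1
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    # eigh is suited to symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Component scores are the standardized data projected onto the eigenvectors
    scores = z @ eigvecs
    return eigvals, eigvecs, scores

# Simulated data (hypothetical): four variables that share one common source
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
data = latent + 0.5 * rng.normal(size=(200, 4))

eigvals, eigvecs, scores = pca_correlation(data)
score_corr = np.corrcoef(scores, rowvar=False)

print("Eigenvalues:", np.round(eigvals, 3))
print("Component score correlations:")
print(np.round(score_corr, 3))
```

The eigenvalues add up to the number of variables, and the component scores are uncorrelated with one another, which is what "uncorrelated components" means in practice.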


When to Use PCA?

Use Principal Component Analysis when:

  1. You have a dataset with many correlated variables and want to reduce redundancy.
  2. You need to simplify data visualization by representing high-dimensional data in 2D or 3D.
  3. You’re creating composite scores for factors such as intelligence, brand perception, or economic development.

Key Assumptions of PCA

  1. Linearity: The relationships between variables should be linear.
  2. Adequate Sample Size: You need many more cases than variables for stable results. A common rule of thumb is at least 5 to 10 cases per variable.
  3. No Perfect Multicollinearity: Variables should not be perfectly correlated (though some correlation is expected).
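  4. Sampling Adequacy: The Kaiser-Meyer-Olkin (KMO) measure indicates whether the data are suitable for PCA. It ranges from 0 to 1; values above 0.6 are generally considered acceptable, and values closer to 1 are better.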
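If you are curious what the KMO measure actually computes, the sketch below follows the standard definition: it compares the squared correlations with the squared partial correlations between every pair of variables. The function name kmo_overall and the simulated data are assumptions for illustration; SPSS reports this value for you.

```python
import numpy as np

def kmo_overall(data):
    """Overall KMO: squared correlations relative to squared partial correlations."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)  # fails if the correlation matrix is singular
    d = np.sqrt(np.diag(inv))
    # Partial correlation of each pair, controlling for all other variables
    partial = -inv / np.outer(d, d)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)

# Simulated data (hypothetical): five variables driven by one shared factor
rng = np.random.default_rng(1)
factor = rng.normal(size=(300, 1))
data = 0.8 * factor + 0.6 * rng.normal(size=(300, 5))

kmo = kmo_overall(data)
print("KMO:", round(float(kmo), 3))
```

When variables share a lot of common variance, the partial correlations are small and KMO moves toward 1. When they share little, KMO drops and PCA has less to work with.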

How to Perform PCA in SPSS

Step 1: Open Your Dataset

For this example, use the following dataset of students’ skills. The five rows are for illustration only; a real analysis needs far more cases than variables.

ID  Math  Reading  Writing  Logic  Creativity  Problem_Solving
1   85    78       80       90     75          88
2   70    65       68       75     80          72
3   90    85       88       95     70          92
4   65    60       62       70     85          68
5   88    82       85       92     78          90

  • The goal: Reduce these 6 skills into a few key components (e.g., analytical vs. creative skills).
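One practical caveat about a table this small: with only 5 cases, the 6 x 6 correlation matrix can have at most 4 non-zero eigenvalues, so it is singular and neither KMO nor Bartlett’s test can be computed from it. The short Python check below, using the five rows shown above, is a sketch to illustrate that point; the 1e-10 cutoff for "effectively zero" is an assumption.

```python
import numpy as np

# The five example rows: Math, Reading, Writing, Logic, Creativity, Problem_Solving
skills = np.array([
    [85, 78, 80, 90, 75, 88],
    [70, 65, 68, 75, 80, 72],
    [90, 85, 88, 95, 70, 92],
    [65, 60, 62, 70, 85, 68],
    [88, 82, 85, 92, 78, 90],
], dtype=float)

n_cases, n_vars = skills.shape
corr = np.corrcoef(skills, rowvar=False)

# Count eigenvalues that are meaningfully above zero (cutoff is an assumption)
eigenvalues = np.linalg.eigvalsh(corr)
rank = int(np.sum(eigenvalues > 1e-10))

print("Cases:", n_cases, "Variables:", n_vars)
print("Non-zero eigenvalues:", rank)
```

To follow the remaining steps with real output, use a dataset with many more rows than the six variables.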

Step 2: Access the PCA Tool

  1. Go to Analyze > Dimension Reduction > Factor.
  2. A dialog box will appear.

Step 3: Define Variables

  1. Move all six skill variables (Math, Reading, Writing, Logic, Creativity, Problem_Solving) to the Variables box.
  2. Click Descriptives, check KMO and Bartlett’s test of sphericity to assess suitability, then click Continue.

Step 4: Select Extraction Method

  1. Click Extraction:
    • Select Principal Components as the method.
    • Check Scree Plot to visualize component selection.
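    • Under Extract, keep Based on Eigenvalue with Eigenvalues greater than set to 1 (the Kaiser criterion), so that only components explaining more variance than a single standardized variable are retained.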
  2. Click Continue.

Step 5: Select Rotation Method

  1. Click Rotation:
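    • Choose Varimax (an orthogonal rotation that maximizes the variance of the squared loadings within each component, so each variable tends to load strongly on one component and weakly on the others).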
  2. Click Continue, then OK.
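For readers who want to see what a Varimax rotation does to a loading matrix, here is a commonly used iterative formulation in Python. Treat it as a sketch under stated assumptions: it skips Kaiser normalization, so it will not necessarily match SPSS output digit for digit, and the unrotated loadings below are invented for the example.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (no Kaiser normalization). Returns (rotated, rotation)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target matrix for the varimax criterion
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # closest orthogonal matrix to the target direction
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # stop once the criterion no longer improves
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings for six variables on two components
unrotated = np.array([
    [0.70, 0.50],
    [0.70, 0.40],
    [0.60, 0.50],
    [0.60, -0.50],
    [0.70, -0.40],
    [0.60, -0.60],
])

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Rotation only redistributes variance between components. Each variable's communality (the sum of its squared loadings) is the same before and after.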

Interpreting the Output

1. KMO and Bartlett’s Test

  • KMO value > 0.6: PCA is appropriate.
  • Bartlett’s Test p < 0.05: The correlation matrix differs significantly from an identity matrix, so the variables are correlated enough for PCA.
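The chi-square value SPSS prints for Bartlett’s test comes from the determinant of the correlation matrix. The sketch below computes the statistic and its degrees of freedom in Python; the function name and simulated data are assumptions, and the p-value step is left as a comment because it needs a chi-square distribution function (for example scipy.stats.chi2.sf, if SciPy is installed).

```python
import numpy as np

def bartlett_sphericity(data):
    """Bartlett's test of sphericity: returns (chi-square statistic, degrees of freedom)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(...)) for near-singular matrices
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return statistic, df

# Simulated data (hypothetical): five correlated variables, 300 cases
rng = np.random.default_rng(2)
factor = rng.normal(size=(300, 1))
data = 0.8 * factor + 0.6 * rng.normal(size=(300, 5))

statistic, df = bartlett_sphericity(data)
print("Chi-square:", round(float(statistic), 2), "df:", df)
# The p-value is the upper-tail chi-square probability for this statistic and df,
# e.g. scipy.stats.chi2.sf(statistic, df) if SciPy is available.
```

A large statistic relative to its degrees of freedom gives a small p-value, which is the "p < 0.05" result you look for in the SPSS table.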

2. Total Variance Explained Table

  • Eigenvalues > 1: Each of these components explains more variance than a single standardized variable, so it is worth retaining.
  • Example: If two components together explain 80% of the variance, then most of the information is captured by just two components.
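The percentages in this table are simple arithmetic on the eigenvalues: each eigenvalue divided by the number of variables. Here is a small Python sketch using made-up eigenvalues for six variables, chosen so that the first two components explain 80% as in the example above.

```python
import numpy as np

# Hypothetical eigenvalues for six standardized variables (they sum to 6)
eigenvalues = np.array([3.6, 1.2, 0.5, 0.4, 0.2, 0.1])

# Share of total variance per component, and the running total
percent = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(percent)

# Kaiser criterion: keep components with eigenvalue greater than 1
retained = int(np.sum(eigenvalues > 1))

for i, (ev, pct, cum) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"Component {i}: eigenvalue={ev:.2f}  % variance={pct:.1f}  cumulative={cum:.1f}")
print("Components retained:", retained)
```

With these numbers, two components pass the eigenvalue > 1 rule, matching the columns SPSS labels % of Variance and Cumulative %.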

3. Scree Plot

  • A graph showing eigenvalues of components.
  • Look for the elbow, the point where the eigenvalues level off, and retain the components that come before it.

4. Rotated Component Matrix

  • Shows which variables load onto which components.
  • Example output:
Variable         Component 1 (Analytical)  Component 2 (Creative)
Math             0.85                      0.20
Reading          0.80                      0.25
Writing          0.75                      0.30
Logic            0.88                      0.22
Creativity       0.10                      0.90
Problem_Solving  0.65                      0.50

Interpretation:

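  • Component 1 (Analytical Skills): High loadings for Math, Reading, Writing, and Logic.
  • Component 2 (Creative Skills): High loading for Creativity.
  • Problem_Solving loads on both components (0.65 and 0.50), with its stronger loading on Component 1. It is a cross-loading variable that draws on both analytical and creative skills.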
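Reading a loading matrix by eye gets harder as the number of variables grows, so it can help to script the same logic. The Python sketch below uses the example loadings from the table above. The 0.40 cutoff for a "salient" loading is a common rule of thumb and an assumption here, as is the helper name summarize_loadings.

```python
import numpy as np

variables = ["Math", "Reading", "Writing", "Logic", "Creativity", "Problem_Solving"]
loadings = np.array([
    [0.85, 0.20],
    [0.80, 0.25],
    [0.75, 0.30],
    [0.88, 0.22],
    [0.10, 0.90],
    [0.65, 0.50],
])

SALIENT_CUTOFF = 0.40  # assumption: rule-of-thumb threshold, adjust to your field

def summarize_loadings(names, matrix, cutoff):
    """For each variable: its strongest component and whether it cross-loads."""
    summary = {}
    for name, row in zip(names, matrix):
        primary = int(np.argmax(np.abs(row))) + 1     # 1-based component number
        salient = int(np.sum(np.abs(row) >= cutoff))  # components above the cutoff
        summary[name] = {"primary": primary, "cross_loads": salient > 1}
    return summary

summary = summarize_loadings(variables, loadings, SALIENT_CUTOFF)
for name, info in summary.items():
    flag = " (cross-loading)" if info["cross_loads"] else ""
    print(f"{name}: Component {info['primary']}{flag}")
```

Variables flagged as cross-loading deserve a second look before you name a component after them.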

Practice Example: Perform PCA

Use the following dataset of employee competencies:

ID  Communication  Leadership  Technical_Skills  Teamwork  Adaptability
1   8              9           7                 8         7
2   7              8           9                 6         8
3   9              10          6                 9         7
4   6              7           8                 7         9
5   8              9           7                 8         8

  1. Perform PCA to identify key competency factors.
  2. Use KMO and Bartlett’s Test to check suitability.
  3. Interpret the Rotated Component Matrix to define meaningful skill groups.

Common Mistakes to Avoid

  1. Using PCA on Non-Correlated Variables: If variables are uncorrelated, each component simply reproduces one original variable, so nothing is reduced.
  2. Retaining Too Many Components: Focus on components with eigenvalues > 1 or use the scree plot to determine the optimal number.
  3. Ignoring Rotation: Varimax rotation improves interpretability by making component loadings clearer.

Key Takeaways

  • Principal Component Analysis (PCA) reduces dimensionality while retaining key information.
  • Eigenvalues and scree plots help determine the optimal number of components.
  • The Rotated Component Matrix shows which variables group together into meaningful components.

What’s Next?

In Day 34 of your 50-day SPSS learning journey, we’ll explore Exploratory Factor Analysis (EFA) in SPSS. You’ll learn how to uncover latent constructs in survey and psychological data. Stay tuned for another powerful statistical technique!