Day 19: Logistic Regression in SPSS – Predicting Binary Outcomes

Welcome to Day 19 of your 50-day SPSS learning journey! Today, we’ll explore Logistic Regression, a statistical method used to predict binary outcomes (e.g., yes/no, success/failure, purchase/no purchase). It is a powerful tool for modeling the relationship between a binary dependent variable and one or more independent variables.


What is Logistic Regression?

Logistic Regression predicts the probability of an event occurring by modeling the relationship between a binary dependent variable and one or more independent variables.

For example:

  • Predicting whether a customer will make a purchase (yes = 1, no = 0) based on age, income, and previous purchases.

The model uses a logistic function to estimate probabilities:

P(Y = 1) = 1 / [1 + e^-(a + bX)]

Where:

  • P(Y = 1) = Probability of the event occurring.
  • a = Intercept.
  • b = Coefficient of the predictor (X).
  • e = Base of natural logarithms (~2.718).
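
To make the formula concrete, here is a small worked example with hypothetical values (chosen only to show the arithmetic, not taken from the dataset below). Suppose a = -5 and b = 0.0001, with X = income in dollars. For a customer earning 60,000:

P(Y = 1) = 1 / [1 + e^-(-5 + 0.0001 × 60000)] = 1 / (1 + e^-1) ≈ 0.73

So the model would predict roughly a 73% probability of purchase for that customer.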

When to Use Logistic Regression?

Use logistic regression when:

  1. The dependent variable is binary (e.g., yes/no, 0/1).
  2. The independent variables can be continuous, categorical, or both.
  3. You want to predict probabilities or classify cases into groups.

How to Perform Logistic Regression in SPSS

Step 1: Open Your Dataset

For this example, use the following dataset:

ID   Age   Income   Purchased
1    25    30000    0
2    35    40000    1
3    45    50000    1
4    30    35000    0
5    50    60000    1

  • Purchased: Dependent variable (binary: 1 = Yes, 0 = No).
  • Age and Income: Independent variables.
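
If you prefer to enter the data through the Syntax Editor instead of typing it into Data View, a minimal sketch (using the variable names above) looks like this:

DATA LIST FREE / ID Age Income Purchased.
BEGIN DATA
1 25 30000 0
2 35 40000 1
3 45 50000 1
4 30 35000 0
5 50 60000 1
END DATA.
VALUE LABELS Purchased 0 'No' 1 'Yes'.

Run the block (Run > All) and the five cases appear in Data View.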

Step 2: Access the Logistic Regression Tool

  1. Go to Analyze > Regression > Binary Logistic.
  2. A dialog box will appear.

Step 3: Select Variables

  1. Move the dependent variable (Purchased) to the Dependent box.
  2. Move the independent variables (Age and Income) to the Covariates box.

Step 4: Customize Options

  1. Click Categorical if any independent variable is categorical (e.g., gender or region), so SPSS creates the appropriate indicator contrasts.
  2. Click Options and check:
    • Hosmer-Lemeshow goodness-of-fit to assess model fit.
    • CI for exp(B) to get confidence intervals for the odds ratios.
    (The classification table used to evaluate prediction accuracy is part of the default output; you can also adjust its probability cutoff in this dialog.)
  3. Click OK to run the analysis, or Paste to generate the equivalent syntax (see the sketch below).
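
Clicking Paste instead of OK writes the equivalent command syntax to a Syntax window. For this example it should look roughly like the sketch below; the exact CRITERIA defaults may differ slightly by SPSS version:

LOGISTIC REGRESSION VARIABLES Purchased
  /METHOD=ENTER Age Income
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

GOODFIT requests the Hosmer-Lemeshow test and CI(95) adds 95% confidence intervals for Exp(B); the classification table and the Variables in the Equation table are part of the default output.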

Interpreting the Output

The SPSS output includes several key sections:

1. Variables in the Equation Table

  • Coefficients (B): Show the impact of each predictor on the log-odds of the event.

    • Example: A coefficient of 0.03 for Income means that for every $1 increase in income, the log-odds of purchasing increase by 0.03.
  • Odds Ratios (Exp(B)): The exponentiated coefficients. Each Exp(B) gives the multiplicative change in the odds of the event for a one-unit increase in that predictor.

    • Example: If Exp(B) = 1.03 for Income, a $1 increase in income increases the odds of purchasing by 3%.
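
A quick arithmetic check of the example above: Exp(B) = e^B, so e^0.03 ≈ 1.0305, and the percentage change in the odds is (Exp(B) - 1) × 100 ≈ 3%. The same conversion works for any predictor in the Variables in the Equation table.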

2. Model Summary Table

  • -2 Log-Likelihood: A measure of model fit; lower values indicate better fit, and it is most useful for comparing competing models on the same data.
  • Cox & Snell R-Square and Nagelkerke R-Square: Pseudo-R² measures of how much variation in the dependent variable the model explains (analogous to, but not directly comparable with, R² in linear regression).

3. Classification Table

  • Shows the percentage of cases the model classifies correctly (e.g., 80% accuracy), using a predicted-probability cutoff of 0.5 by default.

4. Hosmer-Lemeshow Test

  • Assesses model fit by comparing observed and expected outcomes across groups of cases ordered by predicted probability:
    • If p > 0.05, there is no significant evidence of poor fit, so the model is considered adequate; a p-value below 0.05 suggests the model does not fit the data well.

Practice Example: Logistic Regression in SPSS

Use the following dataset:

ID   Hours_Studied   Attendance   Passed
1    2               60           0
2    4               70           1
3    6               80           1
4    3               65           0
5    8               90           1

  1. Perform a logistic regression with Passed (binary: 1 = Yes, 0 = No) as the dependent variable.
  2. Use Hours_Studied and Attendance as predictors.
  3. Interpret the coefficients, odds ratios, and overall model fit.
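
If you want to check your menu work against syntax, a minimal sketch for this practice dataset (same caveats as the earlier syntax example) is:

DATA LIST FREE / ID Hours_Studied Attendance Passed.
BEGIN DATA
1 2 60 0
2 4 70 1
3 6 80 1
4 3 65 0
5 8 90 1
END DATA.

LOGISTIC REGRESSION VARIABLES Passed
  /METHOD=ENTER Hours_Studied Attendance
  /PRINT=GOODFIT CI(95).

With only five cases this is purely for practicing the mechanics; SPSS may warn about complete separation or unstable estimates with data this small, so treat the output as an exercise rather than a real analysis.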

Common Mistakes to Avoid

  1. Not Checking Model Fit: Always review the Hosmer-Lemeshow test and classification table.
  2. Misinterpreting Odds Ratios: Remember, odds ratios represent multiplicative changes in odds, not probabilities.
  3. Including Irrelevant Predictors: Adding unnecessary predictors can overfit the model and inflate standard errors, making the estimates less reliable.

Key Takeaways

  • Logistic regression predicts binary outcomes and estimates the odds of an event occurring.
  • Coefficients (B) and odds ratios (Exp(B)) help interpret the impact of predictors.
  • Always assess model fit and prediction accuracy before drawing conclusions.

What’s Next?

In Day 20 of your 50-day SPSS learning journey, we’ll explore Two-Way ANOVA in SPSS. You’ll learn how to analyze the interaction effects between two independent variables on a dependent variable. Stay tuned for another essential statistical technique!