Day 19: Logistic Regression in SPSS – Predicting Binary Outcomes
Welcome to Day 19 of your 50-day SPSS learning journey! Today, we’ll explore Logistic Regression, a statistical method used to predict binary outcomes (e.g., yes/no, success/failure, purchase/no purchase). Logistic regression is a powerful tool for understanding how one or more independent variables relate to a binary dependent variable.
What is Logistic Regression?
Logistic Regression predicts the probability of an event occurring by modeling the relationship between a binary dependent variable and one or more independent variables.
For example:
- Predicting whether a customer will make a purchase (yes = 1, no = 0) based on age, income, and previous purchases.
The model uses a logistic function to estimate probabilities:
P(Y = 1) = 1 / [1 + e^-(a + bX)]
Where:
- P(Y = 1) = Probability of the event occurring.
- a = Intercept.
- b = Coefficient of the predictor (X).
- e = Base of natural logarithms (~2.718).
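To see how the formula behaves, take some made-up numbers (illustrative only, not estimates from any dataset): suppose a = -2 and b = 0.05 for Age. For a 40-year-old, a + bX = -2 + 0.05 × 40 = 0, so P(Y = 1) = 1 / (1 + e^0) = 0.50. For a 60-year-old, a + bX = 1, so P(Y = 1) = 1 / (1 + e^-1) ≈ 0.73. The predicted probability rises with Age but always stays between 0 and 1.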
When to Use Logistic Regression?
Use logistic regression when:
- The dependent variable is binary (e.g., yes/no, 0/1).
- The independent variables can be continuous, categorical, or both.
- You want to predict probabilities or classify cases into groups.
How to Perform Logistic Regression in SPSS
Step 1: Open Your Dataset
For this example, use the following dataset:
| ID | Age | Income | Purchased |
|----|-----|--------|-----------|
| 1  | 25  | 30000  | 0         |
| 2  | 35  | 40000  | 1         |
| 3  | 45  | 50000  | 1         |
| 4  | 30  | 35000  | 0         |
| 5  | 50  | 60000  | 1         |
- Purchased: Dependent variable (binary: 1 = Yes, 0 = No).
- Age and Income: Independent variables.
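If you prefer syntax to typing the values into Data View, a minimal sketch like the following (using the variable names from the table above) creates the same dataset:

```
DATA LIST FREE / ID Age Income Purchased.
BEGIN DATA
1 25 30000 0
2 35 40000 1
3 45 50000 1
4 30 35000 0
5 50 60000 1
END DATA.
VARIABLE LABELS Purchased 'Made a purchase (1 = Yes, 0 = No)'.
```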
Step 2: Access the Logistic Regression Tool
- Go to Analyze > Regression > Binary Logistic.
- A dialog box will appear.
Step 3: Select Variables
- Move the dependent variable (Purchased) to the Dependent box.
- Move the independent variables (Age and Income) to the Covariates box.
Step 4: Customize Options
- Click Categorical if any independent variable is categorical (e.g., gender or region).
- Click Options and check:
  - Hosmer-Lemeshow goodness-of-fit test to assess model fit.
  - Classification table to evaluate prediction accuracy.
- Click OK to run the analysis (the equivalent syntax is sketched below).
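If you prefer working in a syntax window, the steps above correspond roughly to the command below. This is a sketch of what SPSS pastes; the exact subcommands can vary by version. /PRINT=GOODFIT requests the Hosmer-Lemeshow test, and the classification table is part of the default output:

```
LOGISTIC REGRESSION VARIABLES Purchased
  /METHOD=ENTER Age Income
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
```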
Interpreting the Output
The SPSS output includes several key sections:
1. Variables in the Equation Table
- Coefficients (B): Show the impact of each predictor on the log-odds of the event.
  - Example: A coefficient of 0.03 for Income means that each $1 increase in income raises the log-odds of purchasing by 0.03.
- Odds Ratios (Exp(B)): Exponentiate the coefficients to interpret them as odds ratios.
  - Example: If Exp(B) = 1.03 for Income, each $1 increase in income multiplies the odds of purchasing by 1.03 (a 3% increase).
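It helps to see how B and Exp(B) are connected. Using the illustrative coefficient above (not values from any actual output): Exp(B) = e^0.03 ≈ 1.03, so each $1 increase multiplies the odds by about 1.03. For a larger change, multiply the coefficient first: a $100 increase shifts the log-odds by 100 × 0.03 = 3, so the odds are multiplied by e^3 ≈ 20.1. Read the actual values from your own Variables in the Equation table.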
2. Model Summary Table
- -2 Log-Likelihood: A measure of model fit (lower is better).
- Cox & Snell R-Square and Nagelkerke R-Square: Pseudo-R² measures that indicate roughly how much of the variation in the dependent variable the model explains (analogous to, but not the same as, R² in linear regression).
3. Classification Table
- Shows the percentage of cases correctly predicted by the model (e.g., 80% accuracy).
4. Hosmer-Lemeshow Test
- Compares observed and predicted outcomes across groups of cases to assess model fit.
- If p > 0.05, there is no significant evidence of poor fit, so the model is considered to fit the data adequately.
Practice Example: Logistic Regression in SPSS
Use the following dataset:
| ID | Hours_Studied | Attendance | Passed |
|----|---------------|------------|--------|
| 1  | 2             | 60         | 0      |
| 2  | 4             | 70         | 1      |
| 3  | 6             | 80         | 1      |
| 4  | 3             | 65         | 0      |
| 5  | 8             | 90         | 1      |
- Perform a logistic regression with Passed (binary: 1 = Yes, 0 = No) as the dependent variable.
- Use Hours_Studied and Attendance as predictors.
- Interpret the coefficients, odds ratios, and overall model fit (a syntax sketch you can check your work against follows below).
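One way to check your work with syntax, a sketch assuming you enter the data with the variable names above:

```
DATA LIST FREE / ID Hours_Studied Attendance Passed.
BEGIN DATA
1 2 60 0
2 4 70 1
3 6 80 1
4 3 65 0
5 8 90 1
END DATA.

LOGISTIC REGRESSION VARIABLES Passed
  /METHOD=ENTER Hours_Studied Attendance
  /PRINT=GOODFIT CI(95).
```

Keep in mind that with only five cases the estimates will be unstable; the point of the exercise is practicing how to read the output, not the substantive result.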
Common Mistakes to Avoid
- Not Checking Model Fit: Always review the Hosmer-Lemeshow test and classification table.
- Misinterpreting Odds Ratios: Remember, odds ratios represent multiplicative changes in odds, not probabilities.
- Including Irrelevant Predictors: Adding unnecessary predictors can reduce model performance.
Key Takeaways
- Logistic regression predicts binary outcomes and estimates the odds of an event occurring.
- Coefficients (B) and odds ratios (Exp(B)) help interpret the impact of predictors.
- Always assess model fit and prediction accuracy before drawing conclusions.
What’s Next?
In Day 20 of your 50-day SPSS learning journey, we’ll explore Two-Way ANOVA in SPSS. You’ll learn how to analyze the interaction effects between two independent variables on a dependent variable. Stay tuned for another essential statistical technique!