Day 30: Binary Logistic Regression with Multiple Predictors in SPSS – Advanced Predictive Modeling
Welcome to Day 30 of your 50-day SPSS learning journey! Today, we’ll expand on Binary Logistic Regression by incorporating multiple predictors to improve predictive accuracy. This method is widely used in medicine, finance, marketing, and social sciences to model outcomes with two possible categories (e.g., pass/fail, buy/not buy, default/no default).
What is Binary Logistic Regression with Multiple Predictors?
Binary Logistic Regression estimates the probability of an event occurring (e.g., a customer making a purchase) from multiple independent variables. Unlike linear regression, logistic regression models the log odds of the outcome through the logit link, which keeps every predicted probability between 0 and 1.
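In symbols, with b0 as the intercept and b1 through bk as the coefficients for predictors x1 through xk, the model looks like this:

```latex
% Log odds are linear in the predictors
\mathrm{logit}(p) = \ln\frac{p}{1-p} = b_0 + b_1 x_1 + \dots + b_k x_k
% Equivalently, the predicted probability is
p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \dots + b_k x_k)}}
```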
For example:
- Predicting whether a student will pass or fail based on study hours, attendance, and prior grades.
- Identifying factors influencing loan default (e.g., income, debt-to-income ratio, and credit score).
When Should You Use Binary Logistic Regression with Multiple Predictors?
Use logistic regression when:
- The dependent variable is binary (e.g., 1 = Yes, 0 = No).
- The independent variables are continuous, categorical, or both.
- You want to analyze how multiple predictors influence the probability of an event.
Key Assumptions of Binary Logistic Regression
- No Perfect Multicollinearity: Predictors should not be highly correlated with each other.
- Linearity in the Logit: Each continuous independent variable should be linearly related to the log odds of the dependent variable (one common check is sketched after this list).
- Independence of Observations: Each case should be independent.
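The linearity-in-the-logit assumption is often checked with a Box-Tidwell style test: add each continuous predictor's interaction with its own natural logarithm and see whether that term is significant. Here is a minimal sketch for one predictor, using the Study_Hours and Passed variables from the example dataset in Step 1 below (the Hours_x_LnHours name is just an illustration):

```
* Box-Tidwell style check for linearity in the logit (one predictor shown).
COMPUTE Hours_x_LnHours = Study_Hours * LN(Study_Hours).
EXECUTE.
LOGISTIC REGRESSION VARIABLES Passed
  /METHOD=ENTER Study_Hours Hours_x_LnHours.
```

A significant coefficient on the interaction term suggests the linearity assumption does not hold for that predictor.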
How to Perform Binary Logistic Regression in SPSS
Step 1: Open Your Dataset
For this example, use the following dataset:
| ID | Study_Hours | Attendance (%) | Previous_Grade | Passed (1 = Yes, 0 = No) |
|----|-------------|----------------|----------------|--------------------------|
| 1  | 5           | 60             | 70             | 0                        |
| 2  | 10          | 80             | 85             | 1                        |
| 3  | 7           | 75             | 78             | 1                        |
| 4  | 3           | 50             | 65             | 0                        |
| 5  | 12          | 90             | 90             | 1                        |
| 6  | 6           | 65             | 72             | 0                        |
- Passed: Dependent variable (binary: 1 = Passed, 0 = Failed).
- Study_Hours, Attendance, Previous_Grade: Independent variables.
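If you prefer the syntax window to the Data Editor, here is a minimal sketch that enters the same dataset (all variables are numeric):

```
* Enter the example data via syntax (equivalent to typing it into the Data Editor).
DATA LIST FREE / ID Study_Hours Attendance Previous_Grade Passed.
BEGIN DATA
1 5 60 70 0
2 10 80 85 1
3 7 75 78 1
4 3 50 65 0
5 12 90 90 1
6 6 65 72 0
END DATA.
```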
Step 2: Access the Binary Logistic Regression Tool
- Go to Analyze > Regression > Binary Logistic.
- A dialog box will appear.
Step 3: Define Variables
- Move Passed to the Dependent box.
- Move Study_Hours, Attendance, and Previous_Grade to the Covariates box.
Step 4: Customize Options
- Click Categorical if any independent variable is categorical.
- Click Options and check:
- Hosmer-Lemeshow test (assesses model fit).
- Classification table (shows prediction accuracy).
- Confidence intervals for Exp(B) (provides odds ratios).
- Click Continue, then OK to run the analysis.
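For reference, the dialog steps above correspond to syntax along these lines (what SPSS pastes for you may include extra default /CRITERIA settings):

```
* Binary logistic regression with three predictors.
* GOODFIT requests the Hosmer-Lemeshow test; CI(95) adds confidence intervals for Exp(B).
LOGISTIC REGRESSION VARIABLES Passed
  /METHOD=ENTER Study_Hours Attendance Previous_Grade
  /PRINT=GOODFIT CI(95).
```

The classification table is part of the default output and uses a 0.5 cut value unless you change it under Options.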
Interpreting the Output
1. Model Summary
- -2 Log-Likelihood: Lower values indicate better model fit.
- Cox & Snell R² and Nagelkerke R²: Measures of how much variation in the dependent variable is explained by the model.
  - Example: Nagelkerke R² = 0.65 suggests the model explains 65% of the variation in Passed.
2. Hosmer-Lemeshow Test
- Tests model goodness of fit.
- p > 0.05 indicates a good fit.
3. Classification Table
- Shows how well the model predicts Passed (e.g., 85% of cases correctly classified).
4. Variables in the Equation (Regression Coefficients)
- B (Logit Coefficients): Influence of each predictor on the outcome.
- Exp(B) (Odds Ratio): Indicates how a one-unit change in a predictor affects the odds of the event occurring.
Example output:
| Predictor      | B    | Exp(B) | p-value |
|----------------|------|--------|---------|
| Study_Hours    | 0.50 | 1.65   | 0.02    |
| Attendance     | 0.03 | 1.03   | 0.05    |
| Previous_Grade | 0.07 | 1.07   | 0.01    |
Interpretation:
- Study_Hours (Exp(B) = 1.65, p = 0.02): Each additional study hour increases the odds of passing by 65%.
- Attendance (Exp(B) = 1.03, p = 0.05): Each one-percentage-point increase in attendance increases the odds of passing by 3%.
- Previous_Grade (Exp(B) = 1.07, p = 0.01): A one-point increase in prior grades increases the odds of passing by 7%.
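Exp(B) is simply e raised to the B coefficient, so you can sanity-check any row of the table yourself:

```latex
% Odds ratio = exponentiated logit coefficient (Study_Hours row above)
e^{B} = e^{0.50} \approx 1.65
% One extra study hour multiplies the odds of passing by about 1.65,
% i.e., a 65% increase in the odds -- not in the probability.
```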
Practice Example: Perform Logistic Regression with Multiple Predictors
Use the following dataset:
| ID | Age | Income | Debt-to-Income Ratio | Loan Approved (1 = Yes, 0 = No) |
|----|-----|--------|----------------------|---------------------------------|
| 1  | 25  | 30000  | 0.35                 | 0                               |
| 2  | 40  | 50000  | 0.20                 | 1                               |
| 3  | 30  | 40000  | 0.25                 | 1                               |
| 4  | 22  | 25000  | 0.40                 | 0                               |
| 5  | 45  | 60000  | 0.15                 | 1                               |
- Perform a Binary Logistic Regression with Loan Approved as the dependent variable and Age, Income, and Debt-to-Income Ratio as predictors (a syntax sketch follows this list).
- Interpret the odds ratios (Exp(B)) to determine the impact of each factor.
- Evaluate the model fit using the Hosmer-Lemeshow test and classification table.
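If you want to script the exercise, here is a sketch; it assumes the columns have been given SPSS-legal variable names such as Loan_Approved and DTI_Ratio, since spaces and hyphens are not allowed in SPSS variable names:

```
* Practice exercise: logistic regression on the loan dataset.
* Loan_Approved and DTI_Ratio are assumed names; rename to match your file.
LOGISTIC REGRESSION VARIABLES Loan_Approved
  /METHOD=ENTER Age Income DTI_Ratio
  /PRINT=GOODFIT CI(95).
```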
Common Mistakes to Avoid
- Ignoring Multicollinearity: Check for high correlations between predictors before fitting the model; a workaround for obtaining tolerance and VIF values is sketched after this list.
- Overfitting the Model: Avoid using too many predictors with small datasets.
- Interpreting Odds Ratios Incorrectly: Exp(B) represents multiplicative changes in odds, not probability changes.
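SPSS's Binary Logistic procedure does not report collinearity statistics directly; a common workaround is to run a linear regression with the same variables purely for its collinearity diagnostics, as sketched below:

```
* Collinearity check via linear regression.
* Ignore the coefficients; only the Tolerance and VIF columns matter here.
REGRESSION
  /STATISTICS COEFF TOL
  /DEPENDENT Passed
  /METHOD=ENTER Study_Hours Attendance Previous_Grade.
```

As a rule of thumb, VIF values above about 10 (or tolerance below 0.1) signal problematic multicollinearity.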
Key Takeaways
- Binary Logistic Regression with Multiple Predictors improves prediction accuracy by including multiple independent variables.
- Odds Ratios (Exp(B)) provide insight into how predictors affect the likelihood of an event occurring.
- Always check model fit (Hosmer-Lemeshow test, Nagelkerke R²) before interpreting results.
What’s Next?
In Day 31 of your 50-day SPSS learning journey, we’ll explore Ordinal Logistic Regression in SPSS. You’ll learn how to model ordered categorical outcomes, such as customer satisfaction levels (low, medium, high). Stay tuned for another powerful statistical technique!