
Machine Learning · Credit Risk · Python

Consumer Credit Default Prediction

Python · scikit-learn · XGBoost · SHAP · SQL

Feb -- Mar 2026 · View on GitHub
00

Abstract

Problem. Predicting consumer loan defaults is core to bank risk management. Poor models cost lenders billions annually in unanticipated charge-offs, while overly conservative thresholds reject creditworthy borrowers and compress margins.

Approach. This project builds a full ML pipeline on 2M+ Lending Club records, comparing logistic regression, random forest, and gradient boosting (XGBoost) with SMOTE resampling, stratified cross-validation, and SHAP-based interpretability.

Finding. XGBoost achieves 0.87 AUC, outperforming the baseline by 10%. Critically, behavioral features (debt-to-income, revolving utilization) outperform credit score alone in predicting defaults -- credit score explains only 31% of default variance.

01

ML Pipeline

02

Dataset Overview

The dataset comes from Lending Club, the largest peer-to-peer lending platform in the US. It contains loan-level data from 2007 to 2018, including borrower demographics, loan characteristics, and repayment outcomes.

2.26M records · 151 raw features · 28 selected features · ~20% default rate

| Feature | Type | Description |
| --- | --- | --- |
| loan_amnt | Continuous | Loan principal amount ($1K -- $40K) |
| int_rate | Continuous | Interest rate assigned to the loan |
| annual_inc | Continuous | Self-reported annual income |
| dti | Continuous | Debt-to-income ratio |
| revol_util | Continuous | Revolving credit utilization rate |
| grade | Ordinal | Lending Club assigned grade (A--G) |
| purpose | Categorical | Loan purpose (debt consolidation, credit card, etc.) |
| emp_length | Ordinal | Employment length in years |
| total_acc | Continuous | Total number of credit lines |
| home_ownership | Categorical | Home ownership status |

Target Variable

loan_status is binarized: loans marked as Charged Off, Default, or Late (31-120 days) are labeled 1 (default); Fully Paid loans are labeled 0. This produces an approximately 80/20 non-default/default split, motivating SMOTE oversampling to prevent the minority class from being underrepresented during training.
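The labeling step can be sketched in pandas; the status strings follow the Lending Club schema described above:

```python
import pandas as pd

# Illustrative slice of the loan_status column.
statuses = pd.Series([
    "Fully Paid", "Charged Off", "Default",
    "Late (31-120 days)", "Fully Paid",
], name="loan_status")

# Charged Off, Default, and Late (31-120 days) all count as default (1).
DEFAULT_STATUSES = {"Charged Off", "Default", "Late (31-120 days)"}
target = statuses.isin(DEFAULT_STATUSES).astype(int)
```

Loans with any other status (e.g. Current, In Grace Period) would typically be dropped before modeling, since their outcome is not yet known.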

03

Default Rate by Loan Grade

Lending Club assigns grades A through G based on borrower creditworthiness. Default rates increase monotonically from 5% (Grade A) to 47% (Grade G), confirming the grade system captures meaningful risk separation -- but leaves substantial room for ML improvement.

04

Feature Engineering

Raw features were transformed into 15+ engineered features capturing nonlinear relationships and domain-specific risk signals. Interaction terms between debt ratios and income proved particularly powerful, as they capture absolute debt burden rather than ratios alone.

| Feature | Formula / Logic | Rationale |
| --- | --- | --- |
| debt_to_income_ratio | installment / (annual_inc / 12) | Monthly payment burden relative to income |
| loan_to_income | loan_amnt / annual_inc | Loan size relative to annual earnings |
| credit_utilization | revol_bal / revol_util_limit | True credit utilization from balance and limit |
| log_annual_income | log(annual_inc + 1) | Normalizes right-skewed income distribution |
| emp_length_numeric | Parsed years from string | Converts categorical to ordinal numeric |
| dti_x_revol_util | dti * revol_util | Interaction: high debt + high utilization signals risk |
| income_x_dti | annual_inc * dti | Interaction: absolute debt burden |
| grade_encoded | Ordinal encode A=1...G=7 | Numeric risk tier for tree models |
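A few of the table's formulas, sketched as a pandas transformation (the sample row is illustrative):

```python
import numpy as np
import pandas as pd

def engineer(df: pd.DataFrame) -> pd.DataFrame:
    """Derive a subset of the engineered features from raw loan columns."""
    out = df.copy()
    out["loan_to_income"] = out["loan_amnt"] / out["annual_inc"]
    out["debt_to_income_ratio"] = out["installment"] / (out["annual_inc"] / 12)
    out["log_annual_income"] = np.log(out["annual_inc"] + 1)
    out["dti_x_revol_util"] = out["dti"] * out["revol_util"]
    out["income_x_dti"] = out["annual_inc"] * out["dti"]
    # Ordinal encode grades A..G as 1..7 for the tree models.
    out["grade_encoded"] = out["grade"].map({g: i + 1 for i, g in enumerate("ABCDEFG")})
    return out

sample = pd.DataFrame({
    "loan_amnt": [10000.0], "annual_inc": [50000.0], "installment": [320.0],
    "dti": [18.0], "revol_util": [0.45], "grade": ["B"],
})
feats = engineer(sample)
```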

Class Imbalance Strategy

SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic default-class samples by interpolating between nearest neighbors in feature space. Applied only to the training set after the 80/20 split to prevent data leakage. This brings the training distribution from 80/20 to 50/50, giving the models adequate exposure to default patterns without inflating test-set metrics.

05

Model Architecture Comparison

06

ROC Curves

Receiver Operating Characteristic curves plot the trade-off between true positive rate and false positive rate at every classification threshold. The area under each curve (AUC) provides a threshold-independent measure of discriminative ability. XGBoost's curve dominates at every operating point.
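Computing the curve and its AUC with scikit-learn, sketched on synthetic data (a plain logistic regression stands in for the tuned models):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
# Synthetic target correlated with the first feature.
y = (X[:, 0] + rng.normal(scale=1.0, size=2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
proba = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# One (fpr, tpr) pair per threshold; AUC summarizes the whole curve.
fpr, tpr, thresholds = roc_curve(y_te, proba)
auc = roc_auc_score(y_te, proba)
```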

07

Model Performance Summary

All metrics evaluated on the held-out 20% test set (no SMOTE applied). Precision and recall are reported for the default (positive) class. XGBoost leads across every metric.

| Model | AUC | Precision | Recall | F1 | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Logistic Regression | 0.79 | 0.64 | 0.58 | 0.61 | 0.78 |
| Random Forest | 0.84 | 0.71 | 0.68 | 0.69 | 0.83 |
| XGBoost | 0.87 | 0.75 | 0.72 | 0.73 | 0.85 |
08

Confusion Matrices

Confusion matrices on the 20,000-sample test set. XGBoost correctly identifies 5,760 defaults (true positives) while maintaining the lowest false positive count among all three models, meaning fewer creditworthy borrowers are incorrectly rejected.

Logistic Regression

|       | Pred + | Pred − |
| --- | --- | --- |
| Act + | 4,640 (TP) | 3,360 (FN) |
| Act − | 2,600 (FP) | 9,400 (TN) |

Random Forest

|       | Pred + | Pred − |
| --- | --- | --- |
| Act + | 5,440 (TP) | 2,560 (FN) |
| Act − | 2,220 (FP) | 9,780 (TN) |

XGBoost

|       | Pred + | Pred − |
| --- | --- | --- |
| Act + | 5,760 (TP) | 2,240 (FN) |
| Act − | 1,920 (FP) | 10,080 (TN) |
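As a sanity check, XGBoost's reported precision, recall, and F1 follow directly from its matrix counts:

```python
# XGBoost confusion matrix counts from the table above.
tp, fn, fp, tn = 5760, 2240, 1920, 10080

precision = tp / (tp + fp)                      # 5760 / 7680 = 0.75
recall = tp / (tp + fn)                         # 5760 / 8000 = 0.72
f1 = 2 * precision * recall / (precision + recall)
```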
09

SHAP Feature Importance

SHAP (SHapley Additive exPlanations) values quantify each feature's contribution to individual predictions. The chart below shows mean absolute SHAP values across the test set for the XGBoost model, ranking features by global importance.

10

SHAP Dependence Insights

SHAP dependence plots reveal how each feature drives predictions, not just that it matters. Three critical patterns emerge from the XGBoost model:

Debt-to-Income (DTI) Threshold Effect

DTI values below 20 show near-zero SHAP contribution. Above 25, SHAP values increase sharply and nonlinearly -- a DTI of 35 pushes default probability up by 8-12 percentage points. This threshold behavior is invisible to linear models, explaining part of XGBoost's edge.

Revolving Utilization Cliff

Revolving utilization below 60% has modest predictive power. Above 80%, it becomes one of the strongest default signals in the entire feature set. Borrowers at 90%+ utilization default at 3.2x the base rate. The interaction with DTI is multiplicative, not additive.

Income-DTI Interaction: The Riskiest Cohort

Annual income below $40K combined with DTI above 25 represents the highest-risk segment. This group defaults at 2.8x the overall rate. Importantly, neither feature alone flags this risk -- it is the combination that matters, which tree-based models capture naturally through splits.

11

Key Findings

  1. XGBoost achieves 0.87 AUC, outperforming logistic regression by 10% and random forest by 3.6% on the held-out test set. Gradient boosting's sequential error correction is well-suited to the heterogeneous risk signals in consumer credit data.

  2. Behavioral features outperform credit score. Debt-to-income ratio and revolving utilization are the top two SHAP features, with mean |SHAP| values of 0.089 and 0.076 respectively. Credit score (via grade) ranks only 9th, explaining just 31% of default variance in isolation.

  3. SMOTE improves minority-class recall by 12pp without meaningful precision degradation. The 50/50 resampled training set produces models that catch 72% of actual defaults (XGBoost) versus 60% without oversampling.

  4. Nonlinear threshold effects dominate. DTI above 25 and revolving utilization above 80% trigger sharp increases in predicted default probability -- effects that linear models fundamentally cannot capture, explaining the 8-point AUC gap between logistic regression and XGBoost.

  5. Feature interactions are critical. The income-DTI interaction identifies a high-risk cohort (income < $40K, DTI > 25) that defaults at 2.8x the base rate. No single feature captures this segment alone.

12

Practical Implications

Translating model output into bank-level decisions requires mapping predicted default probabilities to actionable risk tiers:

Risk Tiering

Segment borrowers into 5 tiers based on XGBoost predicted probability. Tier 1 (p < 0.10) receives streamlined approval; Tier 5 (p > 0.45) triggers automatic decline or manual review. Middle tiers receive graduated pricing.
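The tiering can be sketched with `pd.cut`. Only the Tier 1 (p < 0.10) and Tier 5 (p > 0.45) cutoffs come from the text; the middle tier edges below are illustrative assumptions:

```python
import pandas as pd

# Tier edges: only 0.10 and 0.45 are from the text; 0.20 and 0.30 are
# hypothetical boundaries for the graduated middle tiers.
edges = [0.0, 0.10, 0.20, 0.30, 0.45, 1.0]
labels = [1, 2, 3, 4, 5]

probs = pd.Series([0.05, 0.18, 0.33, 0.50])  # example predicted PDs
tiers = pd.cut(probs, bins=edges, labels=labels, include_lowest=True).astype(int)
```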

Risk-Based Pricing

Interest rates are calibrated to expected loss: rate = base_rate + PD x LGD x capital_charge. The 0.87 AUC model enables finer pricing granularity -- a 10% improvement in AUC translates to roughly 15-25 bps of additional risk-adjusted margin on a typical consumer portfolio.
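With illustrative inputs (all values below are hypothetical), the pricing rule reads:

```python
def loan_rate(base_rate: float, pd_: float, lgd: float, capital_charge: float) -> float:
    """Risk-based price: rate = base_rate + PD x LGD x capital_charge."""
    return base_rate + pd_ * lgd * capital_charge

# E.g. a 6% base rate, 10% predicted default probability, 45% loss given
# default, and a 1.5x capital charge multiplier.
rate = loan_rate(base_rate=0.06, pd_=0.10, lgd=0.45, capital_charge=1.5)
```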

Approval Thresholds

Setting the approval threshold at p = 0.35 (XGBoost) captures 85% of good borrowers while rejecting 72% of eventual defaulters. Banks can tune this threshold based on their risk appetite: a more conservative p = 0.25 cutoff increases rejection rate by 8% but reduces charge-off losses by 18%.
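The trade-off can be sketched by sweeping the cutoff over predicted probabilities; the synthetic scores below are calibrated by construction, whereas the real numbers come from the fitted model:

```python
import numpy as np

def approval_stats(proba: np.ndarray, y: np.ndarray, threshold: float):
    """Approve loans whose predicted default probability is below `threshold`."""
    approved = proba < threshold
    approval_rate = approved.mean()
    # Share of eventual defaulters that the cutoff rejects.
    defaulters_rejected = (proba[y == 1] >= threshold).mean()
    return approval_rate, defaulters_rejected

rng = np.random.default_rng(2)
proba = rng.random(5000)
y = (rng.random(5000) < proba).astype(int)  # outcomes drawn from the scores

loose = approval_stats(proba, y, threshold=0.35)
tight = approval_stats(proba, y, threshold=0.25)
```

Tightening the cutoff approves fewer borrowers but rejects a larger share of eventual defaulters, mirroring the p = 0.25 versus p = 0.35 trade-off above.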

13

Technical Notes

Hyperparameter Tuning

GridSearchCV with 5-fold stratified cross-validation on the SMOTE-augmented training set. XGBoost search space: learning_rate {0.01, 0.05, 0.1}, max_depth {5, 8, 12}, n_estimators {200, 500, 800}, subsample {0.7, 0.8, 0.9}, colsample_bytree {0.7, 0.8, 0.9}. Best config: learning_rate=0.1, max_depth=8, n_estimators=500, subsample=0.8, colsample_bytree=0.8.

Cross-Validation

Stratified K-Fold (K=5) preserves the class distribution in each fold. SMOTE is applied inside each fold to prevent information leakage from synthetic samples into the validation set. Mean CV AUC for XGBoost: 0.868 +/- 0.004, confirming stability.

Early Stopping

XGBoost uses early stopping with patience=20 on validation AUC. Training typically halts between rounds 350-420 (of 500 max), preventing overfitting while maintaining near-optimal performance.

14

Limitations & What I'd Do Differently

  1. The Lending Club dataset is well-studied, which means it's easy to get good results. Applying this to a bank's proprietary data with different feature distributions would be the real test.

  2. SMOTE helped with class imbalance but can create synthetic samples that don't represent real borrower profiles. Cost-sensitive learning might be a better approach.

  3. The 0.87 AUC sounds impressive, but the business question isn't just accuracy -- it's calibration. A model that's confident about the wrong predictions is worse than one that's uncertain.

  4. I'd want to add time-based validation (train on 2015-2018, test on 2019) rather than random splits, since credit conditions change.
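A time-ordered split like the one described reduces to filtering on the issue date; the frame below is an illustrative stand-in for the Lending Club data:

```python
import pandas as pd

# Hypothetical frame with an issue-date column, as in the Lending Club schema.
df = pd.DataFrame({
    "issue_d": pd.to_datetime(
        ["2015-06-01", "2016-03-01", "2018-11-01", "2019-02-01"]
    ),
    "default": [0, 1, 0, 1],
})

# Train on loans issued before 2019; test on loans issued from 2019 onward.
cutoff = pd.Timestamp("2019-01-01")
train = df[df["issue_d"] < cutoff]
test = df[df["issue_d"] >= cutoff]
```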