Research Essay
Credit Scores Are Broken. Here’s What Actually Predicts Default.
Your FICO score is a three-digit number that determines whether you can get a mortgage, what interest rate you pay on your car loan, and sometimes, via the underlying credit report, whether you get hired. The average score in the U.S. is 717. About 1.4% of people have a perfect 850. The system looks precise, objective, authoritative.
It’s also kind of garbage.
Not completely garbage—it’s better than what came before it. But it was built on assumptions about creditworthiness from the 1980s and hasn’t been fundamentally redesigned since. When I built an ML model to predict loan defaults using Lending Club data, the features most predictive of default weren’t the ones FICO weights most heavily. That gap is what this essay is about.
Quick History
Before FICO, loan officers made lending decisions based on vibes. And by vibes I mean they routinely denied loans based on race, neighborhood, gender, and whether they liked the look of you. The Fair Isaac Corporation introduced the FICO score in 1989 to replace that subjectivity with data. The score ranges from 300 to 850 and is calculated from five categories: payment history (35%), amounts owed (30%), length of credit history (15%), new credit (10%), and credit mix (10%).
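The real FICO formula is proprietary, but the published category weights are enough to sketch its basic shape. The snippet below is purely illustrative: it assumes each category has already been reduced to a subscore in [0, 1] (which is not how FICO actually computes them) and maps the weighted sum onto the familiar 300-850 range.

```python
# Illustrative only -- the actual FICO scoring function is proprietary.
# The five category weights are the published ones from the essay.
CATEGORY_WEIGHTS = {
    "payment_history": 0.35,
    "amounts_owed": 0.30,
    "length_of_history": 0.15,
    "new_credit": 0.10,
    "credit_mix": 0.10,
}

def fico_style_score(subscores: dict[str, float]) -> float:
    """Map weighted category subscores (each in [0, 1]) onto 300-850."""
    composite = sum(CATEGORY_WEIGHTS[k] * subscores[k] for k in CATEGORY_WEIGHTS)
    return 300 + composite * (850 - 300)

# A hypothetical borrower: perfect payment history, high utilization.
borrower = {
    "payment_history": 1.0,
    "amounts_owed": 0.4,
    "length_of_history": 0.7,
    "new_credit": 0.9,
    "credit_mix": 0.5,
}
```

Note what is *not* in that dictionary: income, savings, rent, employment. Everything the model sees comes from the credit file.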
It was a massive improvement. It expanded credit access to millions of people. But the core architecture hasn’t changed. It’s still fundamentally a linear model built on credit bureau data. It doesn’t incorporate income, employment, rent payments, utility payments, savings, or dozens of other things that obviously predict whether someone can repay a loan. It’s a system designed before the internet that we’re still using in 2026.
What My ML Model Actually Found
For my credit risk project, I trained three models (logistic regression, random forest, and XGBoost) on more than 2 million Lending Club loan records spanning 2007 to 2018. Each record included the borrower’s FICO score, income, debt-to-income ratio, loan amount, interest rate, and a bunch of other features, plus whether they eventually defaulted.
XGBoost hit an AUC of 0.87, beating logistic regression (0.79) and edging out random forest (0.84). But the interesting part wasn’t the model performance—it was the SHAP analysis. When I decomposed the predictions, the most important features for predicting default were: interest rate, debt-to-income ratio, Lending Club’s internal loan grade, revolving credit utilization, and loan-to-income ratio.
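For readers unfamiliar with the metric: AUC is the probability that a randomly chosen defaulter gets a higher risk score than a randomly chosen non-defaulter, so 0.5 is coin-flipping and 1.0 is perfect separation. A minimal from-scratch implementation (the pairwise definition, with ties counted as half):

```python
def auc(labels, scores):
    """AUC as the fraction of (defaulter, non-defaulter) pairs where the
    defaulter is scored higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

So an AUC of 0.87 means that 87% of the time, the model ranks an actual defaulter as riskier than an actual non-defaulter. (In practice you would use `sklearn.metrics.roc_auc_score`, which computes the same quantity efficiently.)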
FICO score showed up. But it wasn’t the top predictor. Debt-to-income ratio—a feature FICO explicitly doesn’t use—was consistently one of the strongest signals.
Think about why that makes sense. Someone earning $40k with $35k in debt is in a completely different situation than someone earning $200k with the same debt, even if they have identical FICO scores. FICO can’t tell the difference because it has no concept of income. A Procedia Computer Science study using random forests on the same Lending Club data found the same thing: “credit grade, debt-to-income ratio, FICO score, and revolving line utilization play an important role in loan defaults,” with DTI being a critical signal FICO misses entirely.
FICO tells you how someone managed credit in the past. It says nothing about whether they can afford the next payment.
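The arithmetic behind that example is trivial, which is exactly the point. A simplified DTI calculation (lenders typically use monthly debt payments over monthly income, but the ordering between borrowers is the same):

```python
def debt_to_income(total_debt: float, annual_income: float) -> float:
    """Simplified DTI: total outstanding debt over annual income.
    (Real underwriting uses monthly payments, but the ranking is identical.)"""
    return total_debt / annual_income

# Same $35k of debt, hypothetically identical FICO scores:
strained = debt_to_income(35_000, 40_000)     # 0.875 -- most of a year's income
comfortable = debt_to_income(35_000, 200_000) # 0.175 -- easily serviceable
```

One division separates a borrower drowning in debt from one barely noticing it, and FICO performs neither.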
The Bias Problem Is Worse Than You Think
Credit scoring was supposed to reduce discrimination. And compared to individual loan officers with unexamined prejudices, it did. But the data that feeds the model carries its own history, and that history is anything but neutral.
The numbers are rough. In 2021, the median credit score for Black consumers was 639—nearly 100 points below the median for white consumers at 730, according to U.S. News and Urban Institute data. About 15% of Black and Hispanic Americans are “credit invisible” (no credit file at all), compared to 9% of white and Asian Americans. The Urban Institute found that subprime score rates in majority-Black, Hispanic, and Native American communities are at least 1.5x higher than in majority-white ones.
These aren’t random disparities. They’re the downstream effects of redlining, discriminatory housing policy, wage gaps, and unequal access to banking. FICO doesn’t use race as an input. But it uses features that are deeply correlated with race through historical inequality. Length of credit history disadvantages communities excluded from formal banking until recently. Credit mix rewards having mortgages, which requires the kind of wealth accumulation that discriminatory policies systematically prevented for minority communities. The CFPB has documented that consumers in majority-Black and Hispanic neighborhoods have far higher rates of credit report disputes, suggesting more errors in the underlying data. The National Consumer Law Center’s 2024 “Past Imperfect” report laid this out explicitly: credit scores “bake in and perpetuate past discrimination,” creating a feedback loop where historical injustice generates lower scores, which generate higher borrowing costs, which generate more financial strain, which generate lower scores.
It’s a cycle that feeds itself.
The K-Shaped Credit Recovery
Here’s something I find really striking. Between 2021 and 2025, the middle score range (600-749) shrank from 38.1% to 33.8% of the population. More people moved into both the highest and lowest brackets simultaneously. FICO’s own research calls this a “K-shaped recovery”—some people came out of the pandemic in better shape, others deteriorated. Bankcard delinquencies rose 48%, auto loan delinquencies 24%, and mortgage delinquencies 58%.
The average national score held steady at 717. But that average is hiding a bimodal distribution where the middle is hollowing out. A 680 in 2019 meant something different from a 680 in 2025, because today that person is more likely to be on a downward trajectory. A single number becomes less informative when the population is splitting in two.
So Can We Fix It?
Better models exist. That’s not the hard part. ML models demonstrably outperform FICO at predicting default—the literature is clear on this, and my own project confirmed it. The hard part is everything else.
There’s a real tension between predictive power and interpretability. FICO’s formula, while flawed, is transparent enough that you can roughly understand how to improve your score: pay on time, keep utilization low, maintain long credit histories. An XGBoost model with 50 features and complex interaction effects might predict default more accurately, but it’d be nearly impossible for a consumer to understand why they were denied credit or what to do about it. The Equal Credit Opportunity Act requires lenders to give specific reasons for denials. That matters.
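That tension is not entirely unresolvable, though. One common approach is to translate per-feature attributions (SHAP values or similar) into ranked adverse-action reasons. The sketch below is hypothetical: the feature names and attribution values are invented, and real adverse-action notices map features to standardized reason codes rather than raw column names.

```python
# Hypothetical sketch: turning per-feature attributions (e.g. SHAP values,
# where positive means "pushed the prediction toward default") into an
# ECOA-style ranked list of denial reasons. All values here are invented.
def adverse_action_reasons(attributions: dict[str, float], k: int = 3) -> list[str]:
    """Return the k features that contributed most toward a denial."""
    risky = sorted(
        (item for item in attributions.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return [name for name, _ in risky[:k]]

attributions = {
    "interest_rate": 0.42,
    "dti": 0.31,
    "revol_util": 0.18,
    "fico": -0.05,      # actually pushed toward "repay"
    "loan_amount": 0.02,
}
```

This gives a denied applicant something actionable, but it only partially closes the gap: it explains which features mattered for one prediction, not how the model combines them, and regulators have not settled on whether that satisfies the spirit of the law.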
And then there’s the data question. Using utility payments, rent history, and bank transaction data could improve predictions and expand access for the 45 million Americans who are credit-invisible or have thin files. But it also creates new privacy concerns and new vectors for bias. If a model learns that people who shop at certain stores are higher risk, it might just be picking up socioeconomic proxies shaped by the same historical discrimination that’s already baked into FICO.
I don’t think the answer is keeping FICO because it’s familiar. It misses too much—no income, no savings, no DTI, and it carries forward biases from generations of unequal access. But replacing it with a black-box ML model isn’t obviously better if that model is more accurate but less accountable. Building the credit risk model taught me that the technical problem of predicting default is basically solved. The harder problem—designing a system that’s accurate, fair, and transparent at the same time—is the one we haven’t cracked yet.