Financial Risk Pipeline
XGBoost Loan Default Predictor — UCI German Credit · CalibratedClassifierCV · SMOTE
This model was built during my internship at Inditrade Capital, where I developed a risk-scoring system to flag high-risk loan accounts for the collections team — achieving 78% precision on the minority default class.
kbaylake/financial-risk-pipeline
How it works
01 / Input
Enter a borrower's financial profile — loan size, income, credit score, employment history, debt-to-income ratio, and loan purpose.
02 / Model
An XGBoost classifier trained on the real UCI German Credit dataset, with SMOTE for class balancing and CalibratedClassifierCV for reliable probabilities. Validated with stratified 5-fold cross-validation targeting AUC-ROC ≥ 0.72.
03 / Output
The API returns a default probability, a confidence score (how far from the 50% decision boundary), and a risk label — green for low, amber for medium, red for high.
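In outline, the training setup that step 02 describes looks something like the sketch below. This is illustrative, not the repo's actual train.py: the load_german_credit() loader is hypothetical, and the hyperparameters and column handling are placeholders.

```python
# Minimal training sketch: SMOTE rebalancing, XGBoost, probability
# calibration, and a stratified 5-fold AUC-ROC check.
from imblearn.over_sampling import SMOTE
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

X, y = load_german_credit()  # hypothetical loader: feature matrix + 0/1 default labels

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Oversample the minority (default) class. A stricter pipeline would
# resample inside each CV fold to keep synthetic rows out of validation data.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_scaled, y)

base = XGBClassifier(n_estimators=300, max_depth=4,
                     learning_rate=0.05, eval_metric="logloss")

# Wrap the booster so predict_proba returns calibrated probabilities.
model = CalibratedClassifierCV(base, method="isotonic", cv=5)
model.fit(X_bal, y_bal)

# Stratified 5-fold AUC-ROC on the original (unresampled) data,
# checked against the >= 0.72 target.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
aucs = cross_val_score(model, X_scaled, y, cv=cv, scoring="roc_auc")
print(f"AUC-ROC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```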
Borrower Profile
Adjust the borrower profile and click Run Risk Assessment. The demo sends the profile to the live FastAPI endpoint, which returns a risk label, probability, and confidence score.
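For reference, a call to the endpoint would look roughly like this. The field names mirror the sliders described above and the URL is a placeholder; check the FastAPI schema in the repo for the real contract.

```python
import requests

# Hypothetical payload shaped after the six sliders on this page.
payload = {
    "loan_amount": 12000,
    "income": 48000,
    "credit_score": 640,
    "employment_years": 3,
    "debt_to_income": 0.32,
    "loan_purpose": "car",
}

# Placeholder URL, not the live deployment.
resp = requests.post("https://financial-risk-pipeline.example.railway.app/predict",
                     json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())
# e.g. {"risk_label": "amber", "default_probability": 0.41, "confidence": 0.18}
```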
Architecture
Browser sliders → POST /predict → FastAPI (Railway)
        │
        StandardScaler.transform()
        │
        XGBClassifier (CalibratedClassifierCV)
        .predict_proba()
        │
        risk_label + default_probability + confidence
        │
◀────── JSON response ──────────

Model trained on UCI German Credit data and baked into the Docker image at build time via python train.py — zero cold-start latency on Railway.
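A minimal sketch of the /predict handler the diagram implies, assuming joblib-serialized artifacts, an integer-encoded loan purpose, and the 35% / 60% risk bands mentioned under Production gap. The real handler lives in the repo.

```python
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
scaler = joblib.load("scaler.joblib")  # fitted StandardScaler, baked into the image
model = joblib.load("model.joblib")    # CalibratedClassifierCV-wrapped XGBClassifier

class Borrower(BaseModel):
    loan_amount: float
    income: float
    credit_score: float
    employment_years: float
    debt_to_income: float
    loan_purpose: int  # assumed encoded category

@app.post("/predict")
def predict(b: Borrower):
    x = np.array([[b.loan_amount, b.income, b.credit_score,
                   b.employment_years, b.debt_to_income, b.loan_purpose]])
    p = float(model.predict_proba(scaler.transform(x))[0, 1])
    label = "green" if p < 0.35 else ("amber" if p < 0.60 else "red")
    return {
        "risk_label": label,
        "default_probability": round(p, 3),
        # Confidence: distance from the 50% decision boundary, scaled to [0, 1].
        "confidence": round(abs(p - 0.5) * 2, 3),
    }
```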
Project vs. Real Implementation
This is a working end-to-end ML pipeline — real data, a trained and calibrated model, and a live API. But building a portfolio project and deploying a credit scoring system in production are genuinely different problems. Here's where I know this falls short, and what I'd change if this were real.
Limitations
- Small dataset — 1,000 records. Real lenders train on millions of applications — enough to capture rare edge cases and long-tail borrower profiles.
- Proxy features — The 6 sliders don't map directly to German Credit columns — income becomes a savings proxy, credit score splits into two categorical fields. Signal gets lost in translation.
- Single data source — No bureau data, no transaction history, no behavioural signals. A real credit model pulls from multiple enriched sources at inference time.
- Static model — Once trained, the model doesn't update. Borrower behaviour shifts over time — especially around economic events — and a static model drifts silently.
What I'd add
- SHAP explainability — So the output shows not just the score, but which inputs drove it — critical for applicant-facing decisions and internal audits (see the first sketch after this list).
- Data drift monitoring — Track feature distributions in production against training baselines. Trigger retraining when the gap crosses a threshold (see the PSI sketch after this list).
- Feature store — A shared registry so training features and inference features come from the same source and can't silently diverge.
- Retraining pipeline — Automated retraining on a schedule or triggered by drift alerts, with rollback if the new model regresses on a held-out validation set.
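A rough sketch of how SHAP could plug in. TreeExplainer works on the raw fitted XGBClassifier (base from the training sketch), not the calibration wrapper, so the attributions explain the uncalibrated score rather than the calibrated probability; feature_names, scaler, and x are assumed from the earlier sketches.

```python
import shap

# Per-applicant attribution; treat values as a ranking of feature
# influence, not exact probability deltas, since calibration is bypassed.
explainer = shap.TreeExplainer(base)
contributions = explainer.shap_values(scaler.transform(x))[0]

# Print the drivers of this prediction, most influential first.
for name, value in sorted(zip(feature_names, contributions),
                          key=lambda pair: abs(pair[1]), reverse=True):
    print(f"{name:>20}: {value:+.3f}")
```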
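And a minimal version of the drift check: the population stability index (PSI) is a common heuristic for comparing a live feature distribution against its training baseline, with PSI above roughly 0.2 often treated as a retraining trigger. The function and threshold are industry conventions, not part of this project.

```python
import numpy as np

def psi(baseline: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a training baseline and live traffic."""
    # Bin edges come from the baseline's quantiles; dedupe in case of ties.
    edges = np.unique(np.percentile(baseline, np.linspace(0, 100, bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf        # catch out-of-range live values
    expected = np.histogram(baseline, edges)[0] / len(baseline)
    actual = np.histogram(live, edges)[0] / len(live)
    expected = np.clip(expected, 1e-6, None)     # avoid log(0) on empty bins
    actual = np.clip(actual, 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))
```

A scheduled job could run this per feature over a rolling window of logged requests and alert once any feature crosses the threshold.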
Production gap
- Regulatory compliance — Credit decisions fall under fair-lending law. Every output needs to be auditable, and declined applicants are entitled to an explanation.
- Fairness testing — Models trained on historical data can encode past discrimination. A real deployment requires bias evaluation across demographic groups before going live.
- Human-in-the-loop — Borderline predictions — those sitting near the 35% or 60% thresholds — should route to a human reviewer rather than resolve automatically.
- Prediction logging — Every inference needs to be stored with its inputs, outputs, and timestamp for model monitoring, compliance, and debugging production issues (a minimal sketch follows).
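As a sketch of the record shape, prediction logging can start as small as an append-only table. SQLite here is a stand-in for whatever managed store a real deployment would use, and the model_version string is hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("predictions.db")
conn.execute("""CREATE TABLE IF NOT EXISTS predictions (
    ts TEXT, model_version TEXT, inputs TEXT,
    default_probability REAL, risk_label TEXT)""")

def log_prediction(inputs: dict, probability: float, label: str,
                   model_version: str = "v1") -> None:
    # One immutable row per inference: timestamp, model version,
    # raw inputs as JSON, and the returned outputs.
    conn.execute("INSERT INTO predictions VALUES (?, ?, ?, ?, ?)",
                 (datetime.now(timezone.utc).isoformat(), model_version,
                  json.dumps(inputs), probability, label))
    conn.commit()
```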