ML · FinTech · Live Demo

Financial Risk Pipeline

XGBoost Loan Default Predictor — UCI German Credit · CalibratedClassifierCV · SMOTE

This model was built during my internship at Inditrade Capital, where I developed a risk-scoring system to flag high-risk loan accounts for the collections team — achieving 78% precision on the minority default class.

kbaylake/financial-risk-pipeline

How it works

01 / Input

Enter a borrower's financial profile — loan size, income, credit score, employment history, debt-to-income ratio, and loan purpose.

02 / Model

An XGBoost classifier trained on the real UCI German Credit dataset, with SMOTE for class balancing and CalibratedClassifierCV for reliable probabilities. Validated with stratified 5-fold cross-validation targeting AUC-ROC ≥ 0.72.

03 / Output

The API returns a default probability, a confidence score (how far from the 50% decision boundary), and a risk label — green for low, amber for medium, red for high.
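The mapping from a calibrated probability to the three output fields is simple enough to show directly. This is a minimal sketch: the green/amber/red cut-offs at 35% and 60% are taken from the thresholds mentioned later in the page, and the confidence formula is the "distance from the 50% boundary" definition given above.

```python
def risk_assessment(p_default: float) -> dict:
    """Map a calibrated default probability to the API's output fields.

    Cut-offs (35% / 60%) follow the amber/red bands described in the text.
    """
    if p_default < 0.35:
        label = "green"
    elif p_default < 0.60:
        label = "amber"
    else:
        label = "red"
    # Confidence: distance from the 50% decision boundary, scaled to [0, 1].
    confidence = abs(p_default - 0.5) * 2
    return {"risk_label": label,
            "default_probability": round(p_default, 4),
            "confidence": round(confidence, 4)}
```

For example, `risk_assessment(0.12)` yields a green label with confidence 0.76, while a borderline 0.48 yields amber with confidence only 0.04.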

Borrower Profile

Loan size: $25,000
Annual income: $65,000
Credit score: 680
Employment history: 4 yrs
Debt-to-income: 32%

Adjust the borrower profile and click Run Risk Assessment. The demo calls the
live FastAPI endpoint and returns a risk label, probability, and confidence score.

Architecture

Browser sliders  →  POST /predict  →  FastAPI (Railway)
                                           │
                              StandardScaler.transform()
                                           │
                      XGBClassifier (CalibratedClassifierCV)
                              .predict_proba()
                                           │
                     risk_label + default_probability + confidence
                                           │
                       ◀────── JSON response ──────────

Model trained on UCI German Credit data and baked into the Docker image at build time via python train.py — zero cold-start latency on Railway.

Project vs. Real Implementation

This is a working end-to-end ML pipeline — real data, a trained and calibrated model, and a live API. But building a portfolio project and deploying a credit scoring system in production are genuinely different problems. Here's where I know this falls short, and what I'd change if this were real.

Limitations

  • Small dataset: 1,000 records. Real lenders train on millions of applications, enough to capture rare edge cases and long-tail borrower profiles.
  • Proxy features: The 6 sliders don't map directly to German Credit columns; income becomes a savings proxy, and the credit score splits into two categorical fields. Signal gets lost in translation.
  • Single data source: No bureau data, no transaction history, no behavioural signals. A real credit model pulls from multiple enriched sources at inference time.
  • Static model: Once trained, the model doesn't update. Borrower behaviour shifts over time, especially around economic events, and a static model drifts silently.

What I'd add

  • SHAP explainability: Show not just the score but which inputs drove it — critical for applicant-facing decisions and internal audits.
  • Data drift monitoring: Track feature distributions in production against training baselines, and trigger retraining when the gap crosses a threshold.
  • Feature store: A shared registry so training features and inference features come from the same source and can't silently diverge.
  • Retraining pipeline: Automated retraining on a schedule or triggered by drift alerts, with rollback if the new model regresses on a held-out validation set.
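The drift-monitoring item above is often implemented with the Population Stability Index, a standard credit-risk drift metric. A minimal sketch, assuming per-feature comparison of live traffic against the training baseline (the 0.1 / 0.25 thresholds are a common industry rule of thumb, not something from this project):

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a training baseline and live data.

    Rule of thumb: PSI < 0.1 stable, 0.1-0.25 drifting, > 0.25 retrain.
    """
    # Bin edges from baseline quantiles; open the ends to catch outliers.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    # Clip to avoid log(0) on empty bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

Running this per feature on a daily batch of inference inputs, and alerting when any feature crosses the retrain threshold, is the cheapest version of the monitoring loop described above.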

Production gap

  • Regulatory compliance: Credit decisions fall under fair-lending law. Every output needs to be auditable, and declined applicants are entitled to an explanation.
  • Fairness testing: Models trained on historical data can encode past discrimination. A real deployment requires bias evaluation across demographic groups before going live.
  • Human-in-the-loop: Borderline predictions — those sitting near the 35% or 60% thresholds — should route to a human reviewer rather than resolve automatically.
  • Prediction logging: Every inference needs to be stored with its inputs, outputs, and timestamp for model monitoring, compliance, and debugging production issues.
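The prediction-logging item can be as simple as an append-only record per inference. A minimal sketch using a local JSON-lines file; the path, field names, and `model_version` tag are illustrative assumptions, and a production system would write to durable, queryable storage instead:

```python
import json
import time
import uuid
from pathlib import Path

# Hypothetical location; in production this would be a database or log stream.
LOG_PATH = Path("predictions.jsonl")

def log_prediction(features: dict, output: dict, model_version: str = "v1") -> str:
    """Append one inference record for monitoring, audit, and replay.

    Returns the record id so it can be attached to the API response
    and referenced later in compliance reviews or debugging.
    """
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "model_version": model_version,
        "features": features,
        "output": output,
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record["id"]
```

Logging the model version alongside each record is what makes retraining safe: when a new model regresses, the log shows exactly which decisions it made and on which inputs.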