Financial Risk Pipeline
XGBoost Loan Default Predictor — UCI German Credit · CalibratedClassifierCV · SMOTE
This model was built during my internship at Inditrade Capital, where I developed a risk-scoring system to flag high-risk loan accounts for the collections team — achieving 78% precision on the minority default class.
kbaylake/financial-risk-pipeline
How it works
01 / Input
Enter a borrower's financial profile — loan size, income, credit score, employment history, debt-to-income ratio, and loan purpose.
02 / Model
An XGBoost classifier trained on the real UCI German Credit dataset, with SMOTE for class balancing and CalibratedClassifierCV for reliable probabilities. Validated with stratified 5-fold cross-validation targeting AUC-ROC ≥ 0.72.
03 / Output
The API returns a default probability, a confidence score (how far from the 50% decision boundary), and a risk label — green for low, amber for medium, red for high.
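In outline, the training setup that step 02 describes looks something like the sketch below. This is illustrative, not the repo's actual train.py: the load_german_credit() loader is hypothetical, and the hyperparameters and column handling are placeholders.

```python
# Minimal training sketch: SMOTE rebalancing, XGBoost, probability
# calibration, and a stratified 5-fold AUC-ROC check.
from imblearn.over_sampling import SMOTE
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

X, y = load_german_credit()  # hypothetical loader: feature matrix + 0/1 default labels

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Oversample the minority (default) class. A stricter pipeline would
# resample inside each CV fold to keep synthetic rows out of validation data.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_scaled, y)

base = XGBClassifier(n_estimators=300, max_depth=4,
                     learning_rate=0.05, eval_metric="logloss")

# Wrap the booster so predict_proba returns calibrated probabilities.
model = CalibratedClassifierCV(base, method="isotonic", cv=5)
model.fit(X_bal, y_bal)

# Stratified 5-fold AUC-ROC on the original (unresampled) data,
# checked against the >= 0.72 target.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
aucs = cross_val_score(model, X_scaled, y, cv=cv, scoring="roc_auc")
print(f"AUC-ROC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```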
Borrower Profile
Adjust the borrower profile and click Run Risk Assessment. The demo sends the profile to the live FastAPI endpoint, which returns a risk label, probability, and confidence score.
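For reference, a call to the endpoint would look roughly like this. The field names mirror the sliders described above and the URL is a placeholder; check the FastAPI schema in the repo for the real contract.

```python
import requests

# Hypothetical payload shaped after the six sliders on this page.
payload = {
    "loan_amount": 12000,
    "income": 48000,
    "credit_score": 640,
    "employment_years": 3,
    "debt_to_income": 0.32,
    "loan_purpose": "car",
}

# Placeholder URL, not the live deployment.
resp = requests.post("https://financial-risk-pipeline.example.railway.app/predict",
                     json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())
# e.g. {"risk_label": "amber", "default_probability": 0.41, "confidence": 0.18}
```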
Architecture
Browser sliders → POST /predict → FastAPI (Railway)
        │
        StandardScaler.transform()
        │
        XGBClassifier (CalibratedClassifierCV)
        .predict_proba()
        │
        risk_label + default_probability + confidence
        │
◀────── JSON response ──────────

Model trained on UCI German Credit data and baked into the Docker image at build time via python train.py — zero cold-start latency on Railway.
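A minimal sketch of the /predict handler the diagram implies, assuming joblib-serialized artifacts, an integer-encoded loan purpose, and the 35% / 60% risk bands mentioned under Production gap. The real handler lives in the repo.

```python
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
scaler = joblib.load("scaler.joblib")  # fitted StandardScaler, baked into the image
model = joblib.load("model.joblib")    # CalibratedClassifierCV-wrapped XGBClassifier

class Borrower(BaseModel):
    loan_amount: float
    income: float
    credit_score: float
    employment_years: float
    debt_to_income: float
    loan_purpose: int  # assumed encoded category

@app.post("/predict")
def predict(b: Borrower):
    x = np.array([[b.loan_amount, b.income, b.credit_score,
                   b.employment_years, b.debt_to_income, b.loan_purpose]])
    p = float(model.predict_proba(scaler.transform(x))[0, 1])
    label = "green" if p < 0.35 else ("amber" if p < 0.60 else "red")
    return {
        "risk_label": label,
        "default_probability": round(p, 3),
        # Confidence: distance from the 50% decision boundary, scaled to [0, 1].
        "confidence": round(abs(p - 0.5) * 2, 3),
    }
```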
Project vs. Real Implementation
This is a working end-to-end ML pipeline — real data, a trained and calibrated model, and a live API. But building a portfolio project and deploying a credit scoring system in production are genuinely different problems. Here's where I know this falls short, and what I'd change if this were real.
Limitations
- Small dataset — 1,000 records. Real lenders train on millions of applications — enough to capture rare edge cases and long-tail borrower profiles.
- Proxy features — The 6 sliders don't map directly to German Credit columns — income becomes a savings proxy, credit score splits into two categorical fields. Signal gets lost in translation.
- Single data source — No bureau data, no transaction history, no behavioural signals. A real credit model pulls from multiple enriched sources at inference time.
- Static model — Once trained, the model doesn't update. Borrower behaviour shifts over time — especially around economic events — and a static model drifts silently.
What I'd add
- SHAP explainability — So the output shows not just the score, but which inputs drove it — critical for applicant-facing decisions and internal audits (see the first sketch after this list).
- Data drift monitoring — Track feature distributions in production against training baselines. Trigger retraining when the gap crosses a threshold (see the PSI sketch after this list).
- Feature store — A shared registry so training features and inference features come from the same source and can't silently diverge.
- Retraining pipeline — Automated retraining on a schedule or triggered by drift alerts, with rollback if the new model regresses on a held-out validation set.
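A rough sketch of how SHAP could plug in. TreeExplainer works on the raw fitted XGBClassifier (base from the training sketch), not the calibration wrapper, so the attributions explain the uncalibrated score rather than the calibrated probability; feature_names, scaler, and x are assumed from the earlier sketches.

```python
import shap

# Per-applicant attribution; treat values as a ranking of feature
# influence, not exact probability deltas, since calibration is bypassed.
explainer = shap.TreeExplainer(base)
contributions = explainer.shap_values(scaler.transform(x))[0]

# Print the drivers of this prediction, most influential first.
for name, value in sorted(zip(feature_names, contributions),
                          key=lambda pair: abs(pair[1]), reverse=True):
    print(f"{name:>20}: {value:+.3f}")
```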
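And a minimal version of the drift check: the population stability index (PSI) is a common heuristic for comparing a live feature distribution against its training baseline, with PSI above roughly 0.2 often treated as a retraining trigger. The function and threshold are industry conventions, not part of this project.

```python
import numpy as np

def psi(baseline: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a training baseline and live traffic."""
    # Bin edges come from the baseline's quantiles; dedupe in case of ties.
    edges = np.unique(np.percentile(baseline, np.linspace(0, 100, bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf        # catch out-of-range live values
    expected = np.histogram(baseline, edges)[0] / len(baseline)
    actual = np.histogram(live, edges)[0] / len(live)
    expected = np.clip(expected, 1e-6, None)     # avoid log(0) on empty bins
    actual = np.clip(actual, 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))
```

A scheduled job could run this per feature over a rolling window of logged requests and alert once any feature crosses the threshold.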
Production gap
- Regulatory compliance — Credit decisions fall under fair-lending law. Every output needs to be auditable, and declined applicants are entitled to an explanation.
- Fairness testing — Models trained on historical data can encode past discrimination. A real deployment requires bias evaluation across demographic groups before going live.
- Human-in-the-loop — Borderline predictions — those sitting near the 35% or 60% thresholds — should route to a human reviewer rather than resolve automatically.
- Prediction logging — Every inference needs to be stored with its inputs, outputs, and timestamp for model monitoring, compliance, and debugging production issues (a minimal sketch follows).
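As a sketch of the record shape, prediction logging can start as small as an append-only table. SQLite here is a stand-in for whatever managed store a real deployment would use, and the model_version string is hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("predictions.db")
conn.execute("""CREATE TABLE IF NOT EXISTS predictions (
    ts TEXT, model_version TEXT, inputs TEXT,
    default_probability REAL, risk_label TEXT)""")

def log_prediction(inputs: dict, probability: float, label: str,
                   model_version: str = "v1") -> None:
    # One immutable row per inference: timestamp, model version,
    # raw inputs as JSON, and the returned outputs.
    conn.execute("INSERT INTO predictions VALUES (?, ?, ?, ?, ?)",
                 (datetime.now(timezone.utc).isoformat(), model_version,
                  json.dumps(inputs), probability, label))
    conn.commit()
```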