Lasso — Volatility as Information, Not Risk

Inspiration

Traditional credit scoring fails gig workers. Their income volatility, a fundamental feature of platform work, gets read as risk rather than information. Banks can't see the difference between a DoorDash driver with predictable weekend surges and one facing platform deactivation.

Gig workers represent 36% of the U.S. workforce, yet they're systematically underserved by traditional lending. The core insight: volatility isn't noise; it's signal. A delivery driver's income variance reveals their platform diversification, metro market dynamics, and resilience to shocks. We built Lasso to close this information gap and make the invisible structure of gig work legible to lenders.


What It Does

Lasso is a Monte Carlo risk assessment platform that simulates 5,000 independent income paths over 24 months for gig workers. Instead of relying on credit scores, it analyzes income patterns, life event probabilities, and macro shocks to produce comprehensive risk profiles for banks.

Lasso is an information expansion tool, not a decision engine. It provides neutral risk data: no "Approve/Decline" language, only risk tiers (LOW, MODERATE, HIGH_RISK). Banks make the final lending decision.

Key outputs:

  • Risk metrics: Probability of default, expected loss, CVaR-95%, risk tier assignment
  • Loan structure guidance: Optimal loan amount and term based on income stability modeling
  • Granular distributions: Income percentiles (P10/P50/P90) by month, default timing histograms, survival curves
  • 5 visualizations per assessment: Income paths, life event timeline, default timing analysis, income parameter evolution, and a 3D risk surface across loan amount/term combinations
  • Natural language risk summaries: 300–500 word LLM-generated profiles plus a 2–3 sentence quick summary
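
To make the risk metrics concrete, here is a minimal sketch of how probability of default, expected loss, and CVaR-95% could be derived from per-path losses. The function names and the tier cutoffs (5% and 15%) are illustrative assumptions, not values taken from the Lasso codebase.

```python
import numpy as np

def risk_metrics(losses, alpha=0.95):
    """Summarize per-path loan losses (0.0 for paths that never default)."""
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    # CVaR here is the mean of the worst (1 - alpha) share of paths.
    k = max(1, int(round((1.0 - alpha) * n)))
    worst = np.sort(losses)[-k:]
    return {
        "p_default": float(np.mean(losses > 0)),
        "expected_loss": float(losses.mean()),
        "cvar": float(worst.mean()),
    }

def risk_tier(p_default, low=0.05, high=0.15):
    """Map a default probability to a tier. Cutoffs are hypothetical."""
    if p_default < low:
        return "LOW"
    if p_default < high:
        return "MODERATE"
    return "HIGH_RISK"
```

Taking the mean of the worst paths directly, rather than averaging losses above a quantile, keeps CVaR meaningful even when fewer than 5% of paths default.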

For example, a DoorDash + Uber driver requesting a $5,000 loan over 6 months might get:

  • P(default) = 3.48%
  • Risk tier: LOW
  • Key insight: 96.5% of simulated paths survive to month 24, with median income of $1,143/month
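
Figures like the survival share and the median income above can be read off the simulation arrays. The sketch below assumes a hypothetical `first_default` vector (1-indexed default month per path, 0 if the path never defaults) and an income matrix of shape (paths, months); neither name comes from the Lasso code.

```python
import numpy as np

def survival_curve(first_default, n_months=24):
    """Share of paths that have not defaulted by the end of each month."""
    first_default = np.asarray(first_default)[:, None]
    months = np.arange(1, n_months + 1)
    defaulted_by = (first_default > 0) & (first_default <= months)
    return 1.0 - defaulted_by.mean(axis=0)

def income_percentiles(paths):
    """P10/P50/P90 of income for each month; result has shape (3, n_months)."""
    return np.percentile(np.asarray(paths, dtype=float), [10, 50, 90], axis=0)
```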

How We Built It

Tech Stack:

  • Backend: Python, FastAPI, NumPy/SciPy (Monte Carlo engine), Matplotlib (visualization), Claude API (LLM integration)
  • Frontend: Next.js, React, TypeScript, Tailwind CSS — conversational UI for applicant data collection and real-time scenario stress testing
  • Data: FRED API for macroeconomic data, pre-cached JPMorgan Chase Institute and Gridwise Research datasets for platform-specific income/expense calibration

Three-Layer Monte Carlo Architecture:

Layer 1: Core Monte Carlo Engine (monte_carlo_sim/)

Vectorized jump-diffusion income model. Simulates all 5,000 paths in parallel using NumPy, with time-varying drift, volatility, and jump-intensity parameters (μ, σ, λ). Default detection uses rolling cash flow windows and buffer thresholds.
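
A vectorized jump-diffusion of this kind can be sketched as follows. This is an illustration under stated assumptions, not the engine itself: the function name, the log-normal form, and the jump size parameters (`jump_mean`, `jump_std`) are all hypothetical.

```python
import numpy as np

def simulate_income_paths(base_income, mu, sigma, lam,
                          n_paths=5000, n_months=24,
                          jump_mean=-0.30, jump_std=0.10, seed=0):
    """Simulate monthly income paths; returns an array (n_paths, n_months).

    mu, sigma, lam may be scalars or length-n_months arrays (time-varying).
    """
    rng = np.random.default_rng(seed)
    mu = np.broadcast_to(np.asarray(mu, dtype=float), (n_months,))
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), (n_months,))
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (n_months,))
    shape = (n_paths, n_months)

    # Diffusion part of the monthly log-return.
    diffusion = (mu - 0.5 * sigma**2) + sigma * rng.standard_normal(shape)

    # Jump part: Poisson number of jumps per month, each N(jump_mean, jump_std^2).
    # The sum of n such jumps is N(n * jump_mean, n * jump_std^2).
    n_jumps = rng.poisson(lam, size=shape)
    jumps = n_jumps * jump_mean + np.sqrt(n_jumps) * jump_std * rng.standard_normal(shape)

    log_income = np.log(base_income) + np.cumsum(diffusion + jumps, axis=1)
    return np.exp(log_income)
```

Working in log space keeps every simulated income strictly positive, and a single `cumsum` along the month axis replaces the inner time loop.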

Layer 2: Life Simulation Engine (life_simulation/)

Probabilistic life event sampling using Poisson/Bernoulli distributions. Each path independently samples its own sequence of events: vehicle repairs, health issues, platform deactivations, housing cost spikes, and macro shocks (recessions, gas price surges, regulatory changes). Paths are truly independent: no single event sequence is shared across the batch.
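
As a rough illustration of per-path sampling, the sketch below draws a Poisson count of vehicle repairs and a Bernoulli deactivation flag for every (path, month) cell. The rates, costs, and the income multiplier are made-up placeholders, and only two event types are shown.

```python
import numpy as np

def sample_life_events(n_paths=5000, n_months=24,
                       repair_rate=0.05, repair_cost=600.0,
                       deactivation_prob=0.01, deactivation_multiplier=0.5,
                       seed=0):
    """Return (extra_expense, income_multiplier), each of shape (n_paths, n_months)."""
    rng = np.random.default_rng(seed)
    shape = (n_paths, n_months)

    # Poisson: number of vehicle repairs in each month, sampled per path.
    repairs = rng.poisson(repair_rate, size=shape)
    extra_expense = repairs * repair_cost

    # Bernoulli: whether a platform deactivation hits in each month.
    deactivated = rng.random(shape) < deactivation_prob
    income_multiplier = np.where(deactivated, deactivation_multiplier, 1.0)

    return extra_expense, income_multiplier
```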

Layer 3: AI Scenario Generator (ai_model/)

Natural language scenario interface powered by Claude. Accepts inputs like "What if gas prices spike 40% in month 8?", interprets the scenario, and applies parameter shifts to the Monte Carlo engine. It also generates human-readable risk summaries from the simulation output.
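
The step from an interpreted scenario to a parameter shift might look like the sketch below. It deliberately leaves out the LLM call and assumes the model's interpretation has already been parsed into a small structured object; the `ScenarioShift` schema and the `gas_cost` parameter name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

import numpy as np

@dataclass
class ScenarioShift:
    parameter: str                  # key of a monthly parameter vector
    start_month: int                # 1-indexed month the shift begins
    pct_change: float               # 0.40 means +40%
    duration: Optional[int] = None  # months affected; None = through the horizon

def apply_shift(params: Dict[str, np.ndarray], shift: ScenarioShift) -> Dict[str, np.ndarray]:
    """Return a copy of params with the shift applied to one monthly vector."""
    out = {name: np.array(values, dtype=float) for name, values in params.items()}
    start = shift.start_month - 1
    end = None if shift.duration is None else start + shift.duration
    out[shift.parameter][start:end] *= 1.0 + shift.pct_change
    return out

# "What if gas prices spike 40% in month 8?"
baseline = {"gas_cost": np.full(24, 200.0)}
stressed = apply_shift(baseline, ScenarioShift("gas_cost", start_month=8, pct_change=0.40))
```

Returning a copy leaves the baseline parameters untouched, so a stressed run can be compared against the unstressed one.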

Data Pipeline:

Parameters are calibrated from published research: JPMorgan Chase Institute (36% coefficient of variation in gig worker income), Gridwise (platform-specific hourly rates), and FRED (macro shock magnitudes). They are grounded in actual gig economy data rather than guesses.


Challenges We Ran Into

Path Independence Crisis

Our initial life simulation generated one "story" and applied it to all 5,000 paths. If that story included an early car repair combined with a gas spike, 100% of paths defaulted, which is completely unrealistic. We refactored to sample events independently per path using vectorized NumPy operations, a significant architectural change that touched multiple modules.
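
The difference between the two designs comes down to the shape of the random draw. A toy comparison, with an arbitrary 5% monthly event probability and hypothetical function names:

```python
import numpy as np

def shared_story(n_paths, n_months, p, rng):
    """One event sequence, repeated for every path (the original bug)."""
    story = rng.random(n_months) < p
    return np.broadcast_to(story, (n_paths, n_months))

def independent_stories(n_paths, n_months, p, rng):
    """A separate event sequence for every path."""
    return rng.random((n_paths, n_months)) < p

rng = np.random.default_rng(0)
shared = shared_story(5000, 24, 0.05, rng)
indep = independent_stories(5000, 24, 0.05, rng)

# Every path in `shared` has the same event count, so outcomes move in lockstep.
shared_counts = np.unique(shared.sum(axis=1))
# Paths in `indep` have a spread of event counts, which yields a default distribution.
indep_counts = np.unique(indep.sum(axis=1))
```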

Per-Path Expenses

The default detection logic originally assumed scalar expenses (identical for all paths). Extending it to per-path expense matrices required modifying the defaults engine, the Monte Carlo loop, and the serialization layer: a small conceptual change with large ripple effects.
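
One way to accept both scalar and per-path expenses is to broadcast them to the income shape before detecting defaults. The rule used here (trailing-window net cash flow negative while the buffer is below a threshold) is an assumed stand-in for the actual default logic, and all names are illustrative.

```python
import numpy as np

def first_default_month(income, expenses, payment, buffer0,
                        window=3, min_buffer=0.0):
    """Return the 1-indexed first default month per path, or 0 if none.

    expenses and payment may be scalars, per-month vectors, or
    per-path matrices; they are broadcast to income's shape.
    """
    income = np.asarray(income, dtype=float)
    n_paths, n_months = income.shape
    expenses = np.broadcast_to(np.asarray(expenses, dtype=float), income.shape)
    payment = np.broadcast_to(np.asarray(payment, dtype=float), income.shape)

    net = income - expenses - payment
    csum = np.cumsum(net, axis=1)
    balance = buffer0 + csum  # emergency buffer after each month

    # Trailing-window net cash flow from differences of cumulative sums
    # (a shorter window is used for the first few months).
    padded = np.concatenate([np.zeros((n_paths, 1)), csum], axis=1)
    end = np.arange(1, n_months + 1)
    start = np.maximum(end - window, 0)
    rolling = padded[:, end] - padded[:, start]

    breach = (rolling < 0) & (balance < min_buffer)
    return np.where(breach.any(axis=1), breach.argmax(axis=1) + 1, 0)
```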

Statistical Edge Cases

When we reduced loan amounts from $5,000 to $1,000, the P90 time-to-default dropped from month 10 to month 5, which seemed counterintuitive. After hours of debugging, we realized that with tiny loans only catastrophic early failures lead to default, creating a selection effect. Time-to-default percentiles are conditional on default, not universal. Subtle but critical.
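
The selection effect is easy to reproduce with toy numbers (invented for illustration, not our simulation output). Both portfolios below share the same early shock-driven defaults; only the larger loan adds later defaults from repayment strain.

```python
import numpy as np

def time_to_default_summary(first_default, q=90):
    """Return (P(default), q-th percentile of default month among defaulters)."""
    first_default = np.asarray(first_default)
    defaulted = first_default > 0
    p_default = float(defaulted.mean())
    # The percentile is conditional: it only looks at paths that defaulted.
    cond = float(np.percentile(first_default[defaulted], q)) if defaulted.any() else float("nan")
    return p_default, cond

# 0 = never defaults; otherwise the 1-indexed default month.
large_loan = np.array([0] * 80 + [2] * 5 + [3] * 5 + [10] * 10)
small_loan = np.array([0] * 90 + [2] * 5 + [3] * 5)

large_pd, large_p90 = time_to_default_summary(large_loan)
small_pd, small_p90 = time_to_default_summary(small_loan)
```

The smaller loan has the lower default probability and also the earlier conditional P90, because the only defaults left are the early catastrophic ones.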

Performance Bottleneck

Naive Python loops over 5,000 paths × 24 months × multiple event types took 10+ minutes per simulation. NumPy broadcasting, cumulative sums, and vectorized sampling brought it under 1 second. Vectorization was non-negotiable.
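
As a small illustration of the kind of rewrite involved, here is a running-balance calculation written as nested loops and as a single cumulative sum. Both functions are simplified stand-ins with hypothetical names; they produce the same array.

```python
import numpy as np

def balances_loop(net, buffer0):
    """Running balance per path, computed with explicit Python loops."""
    n_paths, n_months = net.shape
    out = np.empty((n_paths, n_months), dtype=float)
    for i in range(n_paths):
        running = buffer0
        for t in range(n_months):
            running += net[i, t]
            out[i, t] = running
    return out

def balances_vectorized(net, buffer0):
    """Same result via one cumulative sum along the month axis."""
    return buffer0 + np.cumsum(net, axis=1)
```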


Accomplishments We're Proud Of

True Path Independence — Each of the 5,000 paths independently samples its own life events and macro shocks, producing realistic distributions: some paths default early, most never default, creating a believable survival curve.

Research-Backed Calibration — Our parameters come from real gig economy data (JPMorgan Chase Institute, Gridwise, FRED), not guesses. This grounds the simulation in empirical reality.

Sub-Second Performance — 5,000 paths × 24 months × probabilistic event sampling × default detection, completed in under 1 second via NumPy vectorization.

No Credit Scores Required — Lasso assesses risk entirely from gig work patterns, income volatility, emergency buffers, and platform diversification. Designed specifically for workers excluded from traditional credit systems.


What We Learned

Calibration beats complexity. Getting baseline income/expense parameters right mattered more than sophisticated event models. A 10% miscalibration in expenses cascaded into 100% default rates.

Distributions are tricky. Time-to-default percentiles, survival curves, conditional probabilities — subtle statistical concepts that require careful interpretation. Always ask: "Is this metric conditional? On what?"

Vectorization is an art. NumPy broadcasting can feel like magic, but getting shapes right — especially with per-path event matrices — required careful thinking about axis operations, cumulative sums, and boolean masking.

Modular architecture pays off. Separating the Data Pipeline, Monte Carlo engine, and Life Simulation made the codebase maintainable, but required discipline around interfaces. The upfront pain was worth it.


What's Next for Lasso

Real-Time Data Integration — Pull live earnings data via Plaid API and integrate with bank transaction streams for real-time income tracking, replacing archetypes with actual gig worker data.

Expanded Event Coverage — Childcare emergencies, multi-job scenarios (gig + part-time W-2), platform algorithm changes (surge pricing shifts), and weather-dependent income shocks.

Beyond Lending — Insurance pricing for gig workers (volatility as premium adjustment), income smoothing products (advances against future earnings), and financial planning tools driven by income variance modeling.


Built With

python · fastapi · numpy · scipy · matplotlib · next.js · react · typescript · tailwind-css · claude-api · fred-api
