Inspiration
Major market shocks, including COVID (2020), rate hikes (2022), SVB (2023), and tariffs (2025), all shared the same pattern: news attention surged before markets moved. Traditional risk models (VaR, historical volatility) are backward-looking. We wanted a forward-looking system that fuses what the world is talking about with what the options market is pricing in. That became SVF.
What it does
SVF is an ML-powered early-warning system that forecasts market disruption over 7-, 14-, and 21-day horizons. It predicts calibrated event risk, event type, and a social-volatility score that combines news attention and implied volatility, then turns those predictions into GREEN / YELLOW / RED alerts with event-specific hedging guidance.
How we built it
Data: SVF was built on data from 2017–2025 (2,261 trading days), combining S&P 500 prices, realized volatility (1-day and 3-day), 30-day implied volatility, GDELT news article proportions (event_intensity), sustained attention (trend_spike), and a catalog of 3,843 historical events across geopolitics, macro, and crisis categories.
Labels: Event severity is derived automatically, without manual labels. For each event we rank its peak implied volatility and its peak news intensity against all other events, then map the average of the two percentiles to a 1–5 scale. Events with severity ≥ 4 are treated as positive risk events.
Features: The model uses 66 leakage-free features capturing market and volatility dynamics, news surges, event history, and calendar effects, plus a 67th feature for forecast horizon (7/14/21) so that one unified model can serve all three horizons.
Model: SVF is implemented as a single LightGBM model with a monotonic constraint on horizon, so predicted risk is non-decreasing as the horizon lengthens. This is combined with a three-seed ensemble blended 60/40, isotonic calibration to turn raw scores into calibrated probabilities, and post-hoc clipping to preserve monotonicity after calibration.
Evaluation: Performance is evaluated via walk-forward, expanding-window validation over 2020–2025, with no shuffling or leakage. The production model uses internal early stopping and a held-out calibrator.
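Two of the steps described above, the label derivation and the post-calibration clipping, can be illustrated with a minimal sketch in plain Python. This is not the project's actual code. The function names (`percentile_ranks`, `severity_scores`, `enforce_monotone`), the equal-width mapping from average percentile to the 1–5 scale, and the running-maximum form of the clipping are all assumptions made for illustration, since the write-up does not specify them.

```python
import math


def percentile_ranks(values):
    """Return each value's percentile rank in (0, 1], averaging tied ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the whole group of tied values starting at i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def severity_scores(peak_iv, peak_news):
    """Map the average of two percentile ranks to an integer severity in 1..5.

    Assumption: equal-width bins (0-20% -> 1, ..., 80-100% -> 5).
    """
    iv_pct = percentile_ranks(peak_iv)
    news_pct = percentile_ranks(peak_news)
    scores = []
    for a, b in zip(iv_pct, news_pct):
        avg = (a + b) / 2
        # The small epsilon guards against float noise at bin edges.
        scores.append(min(5, max(1, math.ceil(avg * 5 - 1e-9))))
    return scores


def enforce_monotone(probs_by_horizon):
    """Clip calibrated probabilities so they never decrease with horizon.

    Assumption: inputs are ordered by horizon (7, 14, 21 days) and the
    clipping is a running maximum.
    """
    clipped = []
    running = 0.0
    for p in probs_by_horizon:
        running = max(running, p)
        clipped.append(running)
    return clipped


# Hypothetical peak values for five events (illustrative numbers only).
peak_iv = [15, 22, 35, 48, 80]
peak_news = [0.01, 0.05, 0.03, 0.09, 0.12]

severity = severity_scores(peak_iv, peak_news)
is_positive = [s >= 4 for s in severity]  # severity >= 4 marks a risk event

# A calibrated 14-day probability that dipped below the 7-day one is lifted.
risk_7_14_21 = enforce_monotone([0.30, 0.28, 0.41])
```

A running maximum is only one way to restore monotonicity. It leaves the shortest-horizon probability untouched and raises later ones where needed, which matches the idea that risk over a longer window cannot be lower than risk over a shorter window contained in it.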
What we’re proud of
- One unified, monotone model instead of three disconnected ones
- Zero manual labels — severity derived entirely from data
- Calibrated probabilities suitable for decision-making
- End-to-end system: data → model → alerts → live Streamlit dashboard
- Clean, reproducible, production-ready code