SPY Forward Volatility Forecaster
Inspiration
Wall Street runs on unstructured data, but raw sentiment alone isn't enough to predict the market. A positive headline on a quiet Tuesday doesn't mean the same thing as a positive headline during a market crash. We wanted to build a model that doesn't just read the news, but contextualizes it against the current market regime. The goal was to fuse natural language processing (NLP) with the rigorous stochastic time-series methods used by quantitative hedge funds, proving that sentiment is only valuable when interacted with market volatility.
What it does
The SPY Forward Volatility Forecaster is a quantitative machine learning pipeline that predicts the realized volatility (absolute return magnitude) of the SPY ETF for the following day. It ingests daily financial headlines and 100 days of lagged returns, extracting the underlying market structure (clustering, leverage asymmetry, Parkinson ranges) and mapping it against financial-domain sentiment analysis.
Rather than outputting a generic "Buy/Sell" classification, it outputs a continuous prediction of expected market turbulence, allowing traders to size positions, price options, or hedge systemic risk effectively.
How we built it
- Target Identification: We first conducted a deep statistical analysis of the target variable, determining it had a 0.709 correlation with the absolute magnitude of the next day's return (`Return_Lag_1` at $t+1$).
- Domain-Aware NLP: Instead of generic embeddings, we built financial keyword extractors (e.g., `CRISIS`, `FED`, `EARNINGS`) and interacted them directly with rolling volatility metrics. For example, the model calculates `crisis_keyword_count × 5d_realized_vol` to capture conditional panic.
- Stochastic Feature Engine: We implemented 90+ quantitative finance features, including Parkinson high/low estimators, EWMA decay, momentum, and asymmetric leverage effects (where negative returns cluster more volatility than positive returns).
- Conservative Ensemble Stacking: Because financial data is incredibly noisy, we built a highly diverse, regularized 6-model ensemble. We blended shallow LightGBM trees, conservative XGBoost structures, and robust linear models (Ridge, ElasticNet, Huber) using a non-negative Ridge Meta-Learner to prevent reckless extrapolation.
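The keyword-interaction and Parkinson features above can be sketched in a few lines. This is a minimal illustration, not the actual pipeline: the column names (`returns`, `headline`, `high`, `low`), the keyword set, and the helper names are assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative keyword set; the real extractor uses a larger financial vocabulary.
CRISIS_WORDS = {"crisis", "crash", "panic", "default"}

def crisis_keyword_count(headline: str) -> int:
    """Count financial-stress keywords appearing in a headline."""
    text = headline.lower()
    return sum(word in text for word in CRISIS_WORDS)

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of two feature families: sentiment x regime, and range-based vol."""
    out = pd.DataFrame(index=df.index)
    # 5-day realized volatility from daily returns.
    out["vol_5d"] = df["returns"].rolling(5).std()
    # Crisis keyword count interacted with the current volatility regime:
    # a scary headline in a calm market scores low; the same headline in a
    # turbulent market scores high (conditional panic).
    out["crisis_x_vol"] = df["headline"].map(crisis_keyword_count) * out["vol_5d"]
    # Parkinson high/low range estimator of daily variance.
    out["parkinson"] = np.log(df["high"] / df["low"]) ** 2 / (4 * np.log(2))
    return out
```

The Parkinson estimator uses the intraday high/low range instead of close-to-close returns, which makes it more efficient on days when prices swing but close flat.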
Challenges we ran into
The 0.15796 public leaderboard shift: our biggest challenge was a massive domain shift between the training data (2009–2012, post-GFC high volatility) and the public test evaluation period (2013–2015, extreme low volatility).
Our initial v3 architecture used a complex "Rank-Preserving Distribution Calibration" to mathematically force our test predictions to spread out, assuming our models were suffering from variance collapse. This scored poorly on the public leaderboard. We realized the models weren't broken; they were correctly identifying that the test period had virtually zero volatility. By forcing variance back into the predictions, we generated artificial errors.
We had to completely rebuild our architecture (v4) to explicitly trust the low-variance shrinkage of a highly-regularized ensemble, applying a log1p(x) target transform to suppress the extreme 2009 crash data from overpowering the recent quiet regime.
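The core of the v4 fix can be sketched as a heavily regularized regression on a `log1p`-compressed target. Everything here is illustrative: the data is synthetic and the closed-form ridge fit stands in for the actual ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
# Synthetic stand-in for realized volatility (non-negative, heavy right tail).
vol = np.abs(0.02 * X[:, 0] + rng.normal(scale=0.01, size=500))

# log1p compresses extreme crash-era values so they don't dominate the
# squared-error loss relative to the quiet low-vol regime.
y = np.log1p(vol)

# Closed-form ridge fit with strong L2 shrinkage: w = (X'X + aI)^-1 X'y.
alpha = 10.0
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Invert the transform at prediction time; expm1(log1p(x)) == x.
pred = np.expm1(X @ w)
```

The key point is trusting the shrinkage: the regularized model naturally predicts low variance in a low-variance regime, and no post-hoc calibration re-inflates it.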
Accomplishments that we're proud of
- Discovering the exact mathematical nature of the anonymized target variable through programmatic Pearson correlation analysis.
- Successfully diagnosing and surviving a severe temporal domain shift in which the training set's variance was more than four times that of the test set.
- Avoiding the classic "overfit-to-noise" trap by utilizing strict L2 regularization and shallow tree limitations instead of throwing massive deep learning models at small tabular data.
What we learned
- Sentiment needs context: A negative VADER score is meaningless unless interacted with the current market volatility regime.
- Ensemble diversity > complexity: Blending simple Ridge regression with shallow Gradient Boosting trees proved far more stable than complex hyperparameter optimization.
- Calibration is dangerous: Forcing predictions to match a historical distribution is catastrophic when the underlying market regime has fundamentally shifted.
What's next for SPY Forward Volatility Forecaster
- Live Ingestion: Connecting the pipeline to the Alpaca News API and Polygon.io for real-time daily inference at 3:55 PM ET.
- Option Pricing Integration: Using our forecasted volatility as the $\sigma$ input in a Black-Scholes pricing engine to find mispriced SPY straddles.
- FinBERT GPU Optimization: Fully optimizing our Hugging Face pipelines with Flash Attention to process intraday minute-by-minute tick sentiment without falling behind the stream.
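The option-pricing integration amounts to plugging the forecasted volatility into Black-Scholes as $\sigma$. A minimal sketch using only the standard library, assuming the daily forecast has already been annualized (e.g. scaled by $\sqrt{252}$) and ignoring dividends:

```python
import math

def _N(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes European call price (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * _N(d1) - K * math.exp(-r * T) * _N(d2)

def bs_put(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """European put via put-call parity: P = C - S + K*exp(-rT)."""
    return bs_call(S, K, T, r, sigma) - S + K * math.exp(-r * T)

def straddle(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Straddle = long call + long put at the same strike."""
    return bs_call(S, K, T, r, sigma) + bs_put(S, K, T, r, sigma)
```

Pricing a straddle with the forecasted $\sigma$ and comparing it to the market price gives the mispricing signal: if the forecast-implied straddle is cheaper than the quoted one, the market may be overpaying for volatility.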