VolatilityForge: Regime-Aware SPY Volatility Prediction

Inspiration

Financial markets are not purely random — they exhibit structure, memory, and regime shifts.
Volatility clusters. Trends persist. Crashes cascade.

Most beginner models treat returns as independent observations.
But real markets behave more like fractal systems with memory.

VolatilityForge was inspired by the Fractal Market Hypothesis and stochastic volatility theory.
Instead of predicting just direction, I focused on predicting the magnitude of next-day movement in SPY.

Formally, the objective was to estimate:

$$ \sigma_{t+1} = f(r_t, r_{t-1}, r_{t-2}, \dots, \text{News}_t) $$

where:

  • $r_t$ = return at time $t$
  • $\sigma_{t+1}$ = next-day volatility proxy
  • $\text{News}_t$ = financial headline signal

The goal was not to chase leaderboard noise but to build a model that captures regime structure.


What VolatilityForge Does

VolatilityForge predicts next-day SPY volatility using:

  1. Multi-horizon lag features
  2. Realized variance estimators
  3. Fractal persistence via Hurst exponent
  4. Headline-based uncertainty detection
  5. Gradient boosting ensemble modeling

Instead of assuming markets are memoryless, I estimate long-memory behavior using the rescaled range (R/S) statistic.

The Hurst exponent is computed as:

$$ H = \frac{d \log(R/S)}{d \log(n)} $$

where:

  • $R/S$ = rescaled range
  • $n$ = time window

From this, fractal dimension is derived:

$$ D = 2 - H $$

This allows detection of:

  • Persistent/trending regimes ($H > 0.5$)
  • Mean-reverting regimes ($H < 0.5$)
  • Random walk behavior ($H \approx 0.5$)
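The slope definition of $H$ above can be sketched in a few lines of dependency-free Python. This is an illustrative estimator, not necessarily the one used in the project: the function names, the window sizes, and the choice to average R/S over non-overlapping chunks are all assumptions. Note that R/S estimates are known to be biased upward on short windows, so values slightly above 0.5 on pure noise are expected.

```python
import math
import random


def rescaled_range(chunk):
    """R/S of one chunk: range of cumulative mean-adjusted sums over the std dev."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative, running = [], 0.0
    for x in chunk:
        running += x - mean
        cumulative.append(running)
    spread = max(cumulative) - min(cumulative)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return spread / std if std > 0 else None


def hurst_rs(returns, window_sizes=(8, 16, 32, 64)):
    """Estimate H as the least-squares slope of log(R/S) against log(n).

    window_sizes is an assumed default, not a value taken from the project.
    """
    log_n, log_rs = [], []
    for n in window_sizes:
        if n > len(returns):
            continue
        # Average R/S over non-overlapping chunks of length n.
        values = []
        for start in range(0, len(returns) - n + 1, n):
            rs = rescaled_range(returns[start:start + n])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_rs))
    slope_den = sum((x - mean_x) ** 2 for x in log_n)
    return slope_num / slope_den


# Usage on synthetic i.i.d. noise (seeded so the run is repeatable).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
h = hurst_rs(noise)
d = 2.0 - h  # fractal dimension, D = 2 - H
```

For a rolling 50-day feature, the same function could be called on each trailing 50-return slice; window sizes larger than the slice are skipped automatically.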

How I Built It

1. Feature Engineering

I constructed structured volatility signals:

Short-term realized volatility:

$$ \text{Vol}_{5d} = \sqrt{\frac{1}{5} \sum_{i=1}^{5} r_{t-i}^2} $$

Medium-term realized volatility:

$$ \text{Vol}_{20d} = \sqrt{\frac{1}{20} \sum_{i=1}^{20} r_{t-i}^2} $$

Absolute movement intensity:

$$ \sum_{i=1}^{w} |r_{t-i}| $$
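The realized volatility and absolute movement formulas share one detail that is easy to get wrong: the sums run over $r_{t-1}, \dots, r_{t-w}$, so the return at time $t$ itself is excluded. A minimal sketch, with hypothetical function names and a plain list of daily returns as the assumed input:

```python
import math


def lagged_window(returns, t, w):
    """Return the w returns strictly before index t, or None without enough history."""
    if t - w < 0:
        return None
    return returns[t - w:t]


def realized_vol(returns, t, w):
    """Root mean square of r_{t-1}, ..., r_{t-w} (no mean subtraction)."""
    window = lagged_window(returns, t, w)
    if window is None:
        return None
    return math.sqrt(sum(r * r for r in window) / w)


def abs_intensity(returns, t, w):
    """Sum of absolute returns over the same lagged window."""
    window = lagged_window(returns, t, w)
    if window is None:
        return None
    return sum(abs(r) for r in window)


# The large move at index 5 must not leak into the feature computed for t = 5.
daily_returns = [0.01, -0.02, 0.03, -0.01, 0.02, 0.5]
vol_5d = realized_vol(daily_returns, t=5, w=5)
intensity_5d = abs_intensity(daily_returns, t=5, w=5)
```

Slicing up to but not including `t` is what keeps the features free of lookahead when the target is the movement at `t`.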

Fractal persistence:

$$ H_{50d} $$

Headline uncertainty detection using financial stress keywords:

uncertain | volatility | risk | inflation | war | crash | fed

Interaction feature capturing stress amplification:

$$ \text{ExplosionRisk}_t = \text{UncertaintyIndex}_t \times \text{Vol}_{5d} $$

This captures periods where elevated short-term volatility coincides with headline uncertainty, as flagged by the stress keywords above.
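One possible reading of the keyword feature and its interaction term is sketched below. The writeup does not say how the uncertainty index is scored, so counting case-insensitive whole-word matches per day is an assumption, as are the function names. Whole-word matching keeps "war" from firing on "warning", at the cost of missing inflections such as "uncertainty" unless they are added to the list.

```python
import re

# Keyword list taken from the writeup; the matching rule is an assumption.
STRESS_KEYWORDS = ("uncertain", "volatility", "risk", "inflation", "war", "crash", "fed")
STRESS_PATTERN = re.compile(
    r"\b(?:" + "|".join(STRESS_KEYWORDS) + r")\b",
    flags=re.IGNORECASE,
)


def uncertainty_index(headlines):
    """Count stress-keyword hits across one day's headlines."""
    return sum(len(STRESS_PATTERN.findall(h)) for h in headlines)


def explosion_risk(headlines, vol_5d):
    """Interaction term: uncertainty index scaled by 5-day realized volatility."""
    return uncertainty_index(headlines) * vol_5d


todays_headlines = [
    "Fed signals risk of inflation",
    "Markets calm ahead of earnings",
    "Warning: crash fears fade",
]
index_today = uncertainty_index(todays_headlines)
risk_today = explosion_risk(todays_headlines, vol_5d=0.02)
```

Because the term is a product, it is zero on days with no keyword hits regardless of how high realized volatility is.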


2. Modeling Approach

VolatilityForge uses an ensemble of:

  • LightGBM (efficient gradient boosting for tabular data)
  • CatBoost (robust non-linear modeling)

Final prediction:

$$ \hat{y}_t = 0.6 \cdot \hat{y}_{\text{LGB}} + 0.4 \cdot \hat{y}_{\text{CAT}} $$

Objective function:

$$ \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} $$
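The blend and the RMSE formula can be checked by hand on a couple of numbers. The sketch below assumes the two models' predictions are already available as plain lists; the function names are illustrative and no LightGBM or CatBoost calls are shown.

```python
import math


def blend(pred_lgb, pred_cat, w_lgb=0.6, w_cat=0.4):
    """Weighted average of two prediction lists, using the 0.6 / 0.4 weights above."""
    if len(pred_lgb) != len(pred_cat):
        raise ValueError("prediction lists must have the same length")
    return [w_lgb * a + w_cat * b for a, b in zip(pred_lgb, pred_cat)]


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy numbers, not project results.
blended = blend([1.0, 2.0], [2.0, 4.0])
error = rmse([0.0, 0.0], [3.0, 4.0])
```

With weights that sum to one, the blended prediction always lies between the two model outputs, which is one reason blending tends to damp the variance of either model alone.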

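A chronological split (first 85% of observations for training, last 15% for validation) was used to preserve temporal structure and avoid training on data that postdates the validation period.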
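A split of this kind needs no shuffling: the rows are cut once at a fixed position. A minimal sketch, assuming the rows are already sorted by date (the function name is hypothetical):

```python
def chronological_split(rows, train_fraction=0.85):
    """Split date-ordered rows into an earlier training part and a later validation part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Integers stand in for date-ordered observations.
train_rows, valid_rows = chronological_split(list(range(100)))
```

Every validation row comes after every training row, so lagged features computed on the validation part never depend on future training targets.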


Challenges I Faced

1. Regime Shifts

Market structure changes drastically between:

  • Low-volatility bull markets
  • Inflationary shocks
  • Crisis environments

Models that overfit one regime collapsed in another.

2. Public vs Private Leaderboard

Optimizing aggressively for public leaderboard often degraded stability.

Small changes in prediction variance significantly altered RMSE due to distribution shift.

This required focusing on robustness rather than leaderboard chasing.

3. Overfitting at Low RMSE

When RMSE dropped below ~0.15, even minor calibration changes worsened performance.

This taught me that:

  • Distribution alignment matters more than aggressive tuning.
  • Ensemble stability beats architectural complexity.

What I Learned

  • Financial volatility exhibits long memory.
  • Feature engineering is more powerful than model complexity in tabular finance problems.
  • Gradient boosting remains extremely strong for structured financial data.
  • Ensemble blending reduces prediction variance.
  • Stability under regime shift is more important than public leaderboard ranking.

Final Thoughts

VolatilityForge is not just a predictive model — it is a structured volatility intelligence system.

By integrating:

  • Fractal persistence
  • Realized variance modeling
  • News-based stress detection
  • Gradient boosting ensembles

the system attempts to detect turbulence before it fully manifests in price.

Markets are noisy.

But noise has structure.

VolatilityForge is built to find it.
