Martingale SPY NLP Trading Signals

An end-to-end machine learning system that transforms unstructured financial text (news, analyst reports, social media) into predictive trading signals for the S&P 500 (SPY), optimized for risk-adjusted returns.

Architecture

Phase 1: Feature Engineering

  • Price Features: Daily returns, volatility (10d, 20d SMA), momentum, trend indicators
  • Text Features: TF-IDF vectorization, sentiment analysis (positive/negative word counts)
  • Data Integration: Merged financial texts with daily OHLCV data

Phase 2: Model Development

  • Algorithm: LightGBM regression with time-series cross-validation (3-fold TimeSeriesSplit)
  • Cross-Validation: Temporal CV respects data chronology to avoid look-ahead bias
  • Metrics: RMSE ≈ 0.01, average Sharpe ratio ≈ 0.45-0.65 across folds

Phase 3: Trading Logic & Risk Management

  • Signal Generation: Threshold-based (±1% predicted return)
  • Risk Overlay: Volatility scaling (reduce position size in high-volatility regimes)
  • Sharpe Improvement: Risk overlay improved Sharpe ratio by 0.3-0.4 points

Key Results

  • Combined text+price features outperform individual signals
  • Volatility-aware position sizing reduces tail risk without sacrificing returns
  • Strategy avoids leaderboard overfitting through disciplined cross-validation

Links

Final Submission Results

Kaggle Notebook Status: ✓ Successfully Executed

  • All 6 sections completed without errors
  • Submission file created: submission.csv (1865 records)

Model Performance (Time-Series CV):

  • Average RMSE: 0.0116
  • Average Sharpe Ratio: 1.0567
  • Win Rate: 52.87%

Trading Signal Distribution:

  • Buy Signals: 662
  • Sell Signals: 707
  • Hold Signals: 496

Submission Ready: Yes ✓

Built With

Share this project:

Updates