APEX Quant-Forge AI Agent

Typing a natural language trading strategy into the APEX Quant-Forge workspace.
Enabling the on-the-fly Machine Learning Alpha Filter and configuring the AI confidence threshold.
The AI Analyst dynamically parses the plain English prompt into structured parameters.
Reviewing the AI's extracted logic and conditions before approving the backtest execution.
Live terminal logs tracking the agents as they write, sanitize, and execute the Python code in a safe sandbox.
Interactive Plotly charts mapping the strategy's historical equity curve and core performance metrics.
Visualizing the maximum drawdown timeline to instantly understand portfolio volatility and risk exposure.
A granular audit log of every execution, highlighting high-risk trades safely blocked by the ML risk manager.
No black boxes: Viewing the Random Forest feature weights and expanding the production-ready Python script for export.

Inspiration

Quantitative finance is no longer just about linear statistics; it has evolved into a high-stakes, data-driven domain where speed, security, and complex machine learning decide profitability. However, building, backtesting, and validating a quantitative strategy requires writing hundreds of lines of boilerplate code, managing shifting yfinance data structures, and manually running risk filters.

We built APEX Quant-Forge to bridge the gap between human intuition and institutional-grade execution. By orchestrating a collaborative, multi-agent network, we wanted to empower traders to speak their trading ideas in plain English, and watch as an autonomous network of specialized AI agents writes, validates, self-heals, refines, and tests those strategies against real market data—all in a secure, high-performance sandbox.

What it does

APEX Quant-Forge is an advanced, multi-agent algorithmic backtesting platform. The system operates through a structured state-machine pipeline:

Intent Analysis & Parameter Extraction: Parses natural language prompts to extract tickers, strategy rules, indicators, and date ranges.
AST Code Sanitization: Runs generated Python strategy scripts through an Abstract Syntax Tree (AST) validator to block unsafe operations (such as those using sys, os, or socket) before execution.
Isolated Sandbox Execution: Runs the code in an isolated subprocess, fetching data, running vectorised backtests, and outputting trade metrics.
Self-Healing Critic: If a script fails inside the sandbox, a recursive Critic node parses the traceback, identifies syntax/indentation errors, and auto-patches the code.
Machine Learning Risk Filter: Dynamically trains a local Random Forest Classifier on technical features (RSI, MACD, Volatility) to classify trades, blocking high-risk entries as ML_FILTERED.
Parameter Optimization: Evaluates holding periods and indicator windows, automatically rejecting underperforming parameters to maximize Sharpe Ratio.
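
The AST sanitization step in the pipeline above can be illustrated with a minimal sketch. This is not the project's actual validator: the function name find_unsafe_nodes, the blocklist contents beyond sys, os, and socket, and the choice to flag eval/exec/open calls are all assumptions made for illustration.

```python
import ast

# Hypothetical blocklists. The write-up names sys, os, and socket;
# "subprocess" and the blocked calls below are illustrative additions.
BLOCKED_MODULES = {"os", "sys", "socket", "subprocess"}
BLOCKED_CALLS = {"eval", "exec", "__import__", "open"}


def find_unsafe_nodes(source: str) -> list[str]:
    """Return human-readable violations found in `source` (empty if none)."""
    violations = []
    tree = ast.parse(source)  # raises SyntaxError on malformed code
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "os.path" should be treated the same as "os"
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations
```

A script would only be passed to the sandbox when the returned list is empty. A static check like this catches direct imports and calls by name; it does not by itself stop obfuscated access, which is one reason to pair it with an isolated subprocess.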

How we built it

We engineered APEX Quant-Forge using a state-of-the-art developer stack optimized for performance and resilience:

Orchestration: Built on a state-driven agent network using LangGraph and LangChain.
Frontend: Responsive, dark-themed dashboard styled using custom CSS inside Streamlit.
Resilient Routing Proxy: Built a multi-key, 5-provider router supporting Groq, Gemini, Cerebras, OpenRouter, and Mistral with automatic fallback rotation to combat rate limits (429 errors).
Database: Persistent local trade tracking and run histories managed via SQLite.
ML Core: Implemented with Scikit-Learn for the Random Forest Classifier.
Math & Charts: Built with Pandas, NumPy, and interactive Plotly timelines.
Deployment: Containerized using a custom Dockerfile and deployed to Hugging Face Spaces.

Quantitative Formulations Used

We injected dynamic mathematical calculations into the sandbox output:

Sharpe Ratio measures risk-adjusted return: $$ Sharpe = \frac{\overline{R}_p - R_f}{\sigma_p} $$ Where $\overline{R}_p$ is the mean asset return, $R_f$ is the risk-free rate, and $\sigma_p$ is the standard deviation of returns.
Sortino Ratio evaluates downside risk specifically: $$ Sortino = \frac{\overline{R}_p - R_f}{\sigma_d} $$ Where $\sigma_d$ is the downside deviation of negative returns: $$ \sigma_d = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \min(0, R_i - R_f)^2} $$
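
As a worked illustration of the two ratios, here is a minimal standard-library sketch. It is not the code injected into the sandbox. It assumes per-period (non-annualized) returns, a per-period risk-free rate, and the population standard deviation for $\sigma_p$, chosen to match the $1/N$ factor in the downside deviation; the function names are hypothetical.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """(mean return - risk-free rate) / population std of returns."""
    sigma_p = statistics.pstdev(returns)
    if sigma_p == 0:
        return float("nan")  # undefined when returns never vary
    return (statistics.fmean(returns) - risk_free) / sigma_p


def downside_deviation(returns, risk_free=0.0):
    """sqrt of the mean of min(0, R_i - R_f)^2 over all N periods."""
    squared = [min(0.0, r - risk_free) ** 2 for r in returns]
    return math.sqrt(sum(squared) / len(squared))


def sortino_ratio(returns, risk_free=0.0):
    """(mean return - risk-free rate) / downside deviation."""
    sigma_d = downside_deviation(returns, risk_free)
    if sigma_d == 0:
        return float("nan")  # undefined when no return falls below R_f
    return (statistics.fmean(returns) - risk_free) / sigma_d
```

For the return series [0.02, -0.01, 0.03, -0.02] with a zero risk-free rate, the mean is 0.005, the variance is 0.000425, and the mean squared downside is 0.000125, so the Sharpe ratio is $1/\sqrt{17}$ and the Sortino ratio is $1/\sqrt{5}$. The Sortino value is higher because the upside swings are excluded from its denominator.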

Challenges we ran into

Subprocess Startup Latency: Spawning Python subprocesses for sandbox code execution originally took up to 25.5 seconds due to library import overhead (pandas, numpy, sklearn). We solved this by pre-warming a daemon process in memory, achieving a 25x acceleration in backtest turnaround.
API Rate Limiting (429s): Frequent rate-limit responses during testing would crash the workflow. We engineered a 12-key, 5-provider failover routing proxy that seamlessly rotates API keys when rate limits or timeouts are detected.
Structural Data Shift: Structural changes in the yfinance data format returned Multi-Indexed DataFrames that broke traditional slicing. We added a programmatic column-flattening step directly inside our Coder Agent instructions.
Indentation Errors in LLM Code: Code injected dynamically with mathematical stats frequently broke due to whitespace errors. We solved this by writing an AST parser and a rfind() code injector that safely finds the main block indentation level before appending metrics calculations.
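
The indentation-aware injection described in the last item can be sketched as follows. This is an illustrative reconstruction, not the project's injector: it assumes the generated script ends with an `if __name__ == "__main__":` block, and the names MAIN_GUARD and inject_into_main are hypothetical.

```python
import ast
import textwrap

# Assumption: generated scripts use exactly this guard spelling.
MAIN_GUARD = 'if __name__ == "__main__":'


def inject_into_main(script: str, snippet: str) -> str:
    """Append `snippet` to the end of the script's main block, matching its indent."""
    snippet = textwrap.dedent(snippet).strip("\n")
    guard_pos = script.rfind(MAIN_GUARD)  # last occurrence of the guard
    indent = ""  # no guard found: append at top level
    if guard_pos != -1:
        line_start = script.rfind("\n", 0, guard_pos) + 1
        # Fallback: one level (4 spaces) deeper than the guard itself.
        indent = script[line_start:guard_pos] + "    "
        guard_end = script.find("\n", guard_pos)
        body = script[guard_end + 1:] if guard_end != -1 else ""
        for line in body.splitlines():
            if line.strip():
                # Reuse the exact leading whitespace of the first body line.
                indent = line[: len(line) - len(line.lstrip())]
                break
    patched = script.rstrip("\n") + "\n" + textwrap.indent(snippet, indent) + "\n"
    ast.parse(patched)  # raises SyntaxError/IndentationError if the result is malformed
    return patched
```

Copying the whitespace of an existing body line, rather than assuming four spaces, keeps the appended code consistent with whatever style the LLM produced. The final ast.parse call acts as a guard: if the patched script does not parse, the caller can reject it before it reaches the sandbox.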

Accomplishments that we're proud of

Robust Self-Healing Pipeline: Seeing the Critic agent capture a sandbox traceback, identify the exact broken line, patch the code, and successfully rerun the backtest without any user intervention.
Machine Learning Integration: Successfully integrating a local Random Forest classifier that dynamically gates trades in real time, showing that ML filters can help limit portfolio drawdown.
Aesthetic Excellence: Crafting a dark, cohesive dashboard theme that makes terminal streaming logs, ML feature importances, and interactive Plotly curves look institutional-grade.
Zero-Configuration Demo Mode: Implementing a relaxed API guard that lets users try the app instantly via a cached mock fallback system if no API keys are provided.

What we learned

AST Parsing is Crucial: Relying purely on LLM syntax instructions is not enough for sandbox safety. Enforcing strict AST validation before anything runs was essential to securing code execution.
Orchestration Trumps Size: Smaller, specialized models (like Llama 3.1 8B on Cerebras or Gemini 2.5 Flash) organized in a cooperative network (LangGraph) outperform single large models trying to do everything at once.
Pre-Warming Boosts UX: Users expect instant feedback. Pre-loading heavy data libraries in background memory turns a sluggish application into a premium, responsive experience.

What's next for APEX Quant-Forge AI Agent

Live Broker Integrations: Connecting to paper-trading APIs (like Alpaca or Interactive Brokers) to execute generated strategies in real-time.
Multi-Asset Portfolio Rebalancing: Moving beyond single-ticker testing to allow the agent to manage and rebalance a diversified index portfolio.
Higher Frequency Options & Crypto Connectors: Integrating WebSocket streams to capture sub-second options pricing data and high-volatility crypto markets.
Mobile Webhook Alerts: Building automated SMS/Telegram alert webhooks to notify users when the ML filter blocks a trade or when a strategy triggers exit conditions.

Built With

cerebras
gemini
groq
langchain
langgraph
llama-3.3
mistral
numpy
openrouter
pandas
plotly
python
python-dotenv
scikit-learn
sqlite
streamlit
yfinance

Updates

Sai Ghodke started this project — May 19, 2026 09:14 AM EDT
