Inspiration
We asked a simple question: what's the one thing LLMs are actually good at that we can use in trading? Not making split-second decisions; that's what execution engines are for. But writing strategies? Explaining logic? Learning from what worked last time? That's where language models shine.
So we flipped the problem. Instead of an LLM yelling "BUY NOW," we made it a quant writing Python code. A simple engine runs the code. The model never touches money directly. It just keeps getting better at writing strategies that actually make money.
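To make the split between "model writes code" and "engine runs code" concrete, here is a minimal sketch. The `signal(prices)` signature, the moving-average example, and the helper names `load_strategy` and `run_engine` are all assumptions for illustration; the write-up does not specify the real interface.

```python
from typing import Callable, List

# Hypothetical example of model-written strategy source. The real prompt
# format and function signature are not given in the write-up.
STRATEGY_SOURCE = '''
def signal(prices):
    """Return +1 (long), -1 (short), or 0 (flat) from a price history."""
    if len(prices) < 5:
        return 0
    short_avg = sum(prices[-3:]) / 3
    long_avg = sum(prices[-5:]) / 5
    if short_avg > long_avg:
        return 1
    if short_avg < long_avg:
        return -1
    return 0
'''


def load_strategy(source: str) -> Callable[[List[float]], int]:
    """Compile strategy source and return its `signal` function.

    Only `len` and `sum` are exposed as builtins. This limits accidents,
    but it is NOT a real security sandbox.
    """
    namespace: dict = {}
    exec(source, {"__builtins__": {"len": len, "sum": sum}}, namespace)
    return namespace["signal"]


def run_engine(strategy: Callable[[List[float]], int],
               ticks: List[float]) -> List[int]:
    """Feed ticks one at a time and record the position the strategy asks for."""
    history: List[float] = []
    positions: List[int] = []
    for price in ticks:
        history.append(price)
        positions.append(strategy(history))
    return positions


if __name__ == "__main__":
    strategy = load_strategy(STRATEGY_SOURCE)
    print(run_engine(strategy, [1, 2, 3, 4, 5, 6]))
```

The model only ever produces the text in `STRATEGY_SOURCE`; everything that touches positions lives in the engine.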
What it does
A trading system where an AI research team writes strategies, we test them on real price history, and only deploy the ones that work. The team improves itself over time by evolving the strategies that already proved profitable, kind of like selective breeding for trading algorithms.
You can watch it run on historical data at 10× speed in two minutes, or run it live against actual market prices. Either way, every strategy it trades is plain Python you can read, not a black box.
How we built it
We squeezed inference latency down so the model can write strategies fast enough to keep up with market ticks. Every proposal gets backtested before it goes live: if it doesn't make money on historical data, it never gets deployed. We built an evolutionary system where profitable strategies breed and mutate into better versions. A React dashboard shows everything in real time.
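The backtest gate can be sketched roughly as follows. This is a simplified illustration, not our actual backtester: the strategy interface (a function from past prices to a position of +1, 0, or -1), the absence of fees and slippage, and the `min_return` threshold are all assumptions.

```python
from typing import Callable, Sequence

Strategy = Callable[[Sequence[float]], int]


def backtest(strategy: Strategy, prices: Sequence[float]) -> float:
    """Total return from holding the strategy's position over each next bar.

    The position for bar i is chosen from prices[:i] only, so the strategy
    never sees the bar it is about to trade (no lookahead).
    """
    equity = 1.0
    for i in range(1, len(prices)):
        position = strategy(prices[:i])
        bar_return = prices[i] / prices[i - 1] - 1.0
        equity *= 1.0 + position * bar_return
    return equity - 1.0


def passes_gate(strategy: Strategy, prices: Sequence[float],
                min_return: float = 0.0) -> bool:
    """Deploy only if the historical return beats the threshold."""
    return backtest(strategy, prices) > min_return


if __name__ == "__main__":
    history = [100.0, 110.0, 121.0]
    always_long = lambda past: 1
    always_short = lambda past: -1
    print(passes_gate(always_long, history))   # True
    print(passes_gate(always_short, history))  # False
```

On this toy rising series the always-long strategy passes and the always-short one is rejected before it could be deployed.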
Challenges we ran into
Our initial latency was 3.5 seconds per strategy, which was way too slow. We optimized it down to around 700 ms through prefix caching, speculative decoding, and quantization. The tricky part was making sure prefix caching still worked when we evolved strategies using GEPA; we had to reuse the exact same system prompt across mutations so every request shared a cacheable prefix. Pareto pruning turned out to be O(n²) in the pool size, so we capped the pool at 20 to keep it fast. We also had to handle the fact that vLLM strips the stop sequence from its output, so the regex that extracts code from a response has to accept responses that end without it.
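As an illustration of the truncated-response problem, here is a minimal sketch of a tolerant extractor. It assumes the model wraps its code in a Markdown fence and that the closing fence is the stop sequence; the actual prompt format and regex are not shown in this write-up, and `extract_code` is a hypothetical name.

```python
import re
from typing import Optional

# The fence is built programmatically so this example does not contain
# a literal closing fence of its own.
FENCE = "`" * 3

# Match an opening fence (optionally tagged "python"), then capture lazily
# up to either a closing fence or the end of the string. The end-of-string
# alternative is what accepts responses cut off at the stop sequence.
CODE_RE = re.compile(
    re.escape(FENCE) + r"(?:python)?[ \t]*\n(.*?)(?:" + re.escape(FENCE) + r"|\Z)",
    re.DOTALL,
)


def extract_code(response: str) -> Optional[str]:
    """Return the code inside the first fenced block, closed or not."""
    match = CODE_RE.search(response)
    if match is None:
        return None
    code = match.group(1).rstrip()
    return code or None


if __name__ == "__main__":
    body = "def signal(prices):\n    return 1\n"
    complete = "Here you go:\n" + FENCE + "python\n" + body + FENCE
    truncated = "Here you go:\n" + FENCE + "python\n" + body
    print(extract_code(complete) == extract_code(truncated))  # True
```

Both the complete and the truncated response yield the same code, so downstream backtesting does not care whether the stop sequence made it into the output.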
Accomplishments we're proud of
We achieved a 5× latency improvement (3.5 seconds down to around 700 ms) through real systems work like caching, quantization, and smart batching. The evolutionary system actually works: the gene pool improves each round and pushes the Pareto front outward on both return and drawdown simultaneously. The architecture is clean and built with production in mind: the LLM never touches execution, every deployed strategy is auditable Python, and the backtest filter ensures only strategies with a demonstrated historical edge ever go live.
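For readers unfamiliar with Pareto selection, here is a minimal sketch over the two objectives named above (higher return, lower drawdown). The `Candidate` fields, the tie-break used when the front exceeds the cap, and the function names are assumptions; only the pairwise comparison and the cap of 20 come from our description. Checking every candidate against every other is what makes the pruning quadratic in the pool size.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    total_return: float  # higher is better
    max_drawdown: float  # lower is better, as a positive fraction


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = (a.total_return >= b.total_return
                and a.max_drawdown <= b.max_drawdown)
    strictly_better = (a.total_return > b.total_return
                       or a.max_drawdown < b.max_drawdown)
    return no_worse and strictly_better


def pareto_front(pool: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates (O(n^2) comparisons)."""
    return [c for c in pool if not any(dominates(o, c) for o in pool)]


def prune(pool: List[Candidate], max_size: int = 20) -> List[Candidate]:
    """Non-dominated candidates, truncated to the cap.

    Sorting by return before truncating is an assumed tie-break.
    """
    front = sorted(pareto_front(pool), key=lambda c: c.total_return, reverse=True)
    return front[:max_size]


if __name__ == "__main__":
    pool = [
        Candidate("A", 0.10, 0.05),
        Candidate("B", 0.20, 0.10),
        Candidate("C", 0.05, 0.08),  # dominated by A
        Candidate("D", 0.20, 0.12),  # dominated by B
    ]
    print([c.name for c in prune(pool)])  # ['B', 'A']
```

With a pool capped at 20, the worst case is a few hundred comparisons per round, which is negligible next to inference time.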
What we learned
Prefix caching is transformative when you have repetitive prompts. Speculative decoding works better than expected on short outputs like trading strategies. Language model mutations beat random perturbation because the model understands semantic changes. Only historical price data tells the truth about whether a strategy works. GPU memory utilization, not latency, is the real bottleneck in production systems.
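The prefix-caching lesson comes down to prompt layout: keep the part that never changes at the front and put everything that varies per mutation after it. Here is a minimal sketch of that layout. The prompt wording and the `build_prompt` helper are hypothetical, and real caches work on token blocks rather than characters, so the character count below is only a rough proxy.

```python
import os

# Hypothetical system prompt. What matters is that it is byte-identical
# across every mutation request.
SYSTEM_PROMPT = (
    "You are a quant researcher. "
    "Reply with one Python function named signal."
)


def build_prompt(parent_source: str, feedback: str) -> str:
    """Fixed system prompt first, per-mutation content last."""
    return (
        SYSTEM_PROMPT
        + "\n\nParent strategy:\n" + parent_source
        + "\n\nBacktest feedback:\n" + feedback
        + "\n\nWrite an improved variant."
    )


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading substring of two prompts."""
    return len(os.path.commonprefix([a, b]))


if __name__ == "__main__":
    p1 = build_prompt("def signal(prices): return 1", "drawdown too high")
    p2 = build_prompt("def signal(prices): return -1", "return too low")
    print(shared_prefix_len(p1, p2) >= len(SYSTEM_PROMPT))  # True
```

Putting any per-request detail (a timestamp, a strategy ID) ahead of the system prompt would shrink the shared prefix to almost nothing and defeat the cache.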
What's next for $LLM
We want to persist the gene pool across sessions so it accumulates knowledge over time. We'd like to support multiple assets and timeframes simultaneously. We could add risk-adjusted Pareto selection using metrics like the Sharpe and Calmar ratios. We plan to distribute inference across multiple GPU nodes so throughput scales with node count. And eventually, we want to deploy real capital through actual brokers and see if this holds up under real market conditions.
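For reference, the two risk-adjusted metrics mentioned above can be computed from a series of per-period returns roughly like this. This is a sketch using the standard textbook definitions, not code from the project; the 252-period annualisation default and the zero risk-free rate are assumptions, and every return is assumed to be greater than -1.

```python
import math
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualised mean excess return divided by its sample standard deviation."""
    n = len(returns)
    if n < 2:
        return 0.0
    excess = [r - risk_free / periods_per_year for r in returns]
    mean = sum(excess) / n
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    if variance == 0:
        return 0.0
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualised compound return divided by maximum drawdown."""
    if not returns:
        return 0.0
    growth = math.prod(1.0 + r for r in returns)
    annualised = growth ** (periods_per_year / len(returns)) - 1.0
    drawdown = max_drawdown(returns)
    return annualised / drawdown if drawdown > 0 else float("inf")


if __name__ == "__main__":
    daily = [0.01, -0.02, 0.015, 0.003, -0.007]
    print(sharpe_ratio(daily), max_drawdown(daily), calmar_ratio(daily))
```

Either ratio could replace raw return as one axis of the Pareto comparison, which would favour strategies that earn their return with less volatility or shallower drawdowns.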