Inspiration

In modern financial markets, alpha decays in milliseconds. The moment a major news headline hits the wire, whether it's a Federal Reserve rate decision or an injury in a college basketball game, the market reacts before a human can even reload the page.

We realized that prediction markets (like Kalshi) are fundamentally different from stock markets: they trade not only on sentiment but also on definitive real-world events. This makes them the perfect playground for fast, medium-to-high-frequency natural language processing that exploits the gap in human reaction time.

What it does

Kalshi Algorithmic Trading is an autonomous, medium-to-high-frequency news trading bot designed specifically for the Kalshi prediction market.

It can:

  1. Ingest Real-Time Data: It polls RSS feeds and news APIs for breaking headlines.
  2. Filter the Noise: It uses a high-speed entity linker to map vague headlines (e.g., "Powell speaks") to specific Kalshi market tickers (e.g., FED-RATE).
  3. Quantify Sentiment: It runs FinBERT, a financial sentiment model, to generate a sentiment score.
  4. Execute Logic: It applies our own proprietary equation to decide whether a headline is worth pursuing.
  5. Add Reasoning: A reasoning layer lets it think past the "high sentiment good, low sentiment bad" trap that is common in NLP.
  6. Trade: It executes limit orders via the Kalshi API, managing a portfolio in real-time.
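Steps 3 and 4 can be illustrated with a minimal sketch. The proprietary equation is not given here, so the gate below is only a stand-in: it collapses FinBERT-style class probabilities into a signed score (positive minus negative, a common convention) and applies two hypothetical thresholds, `min_abs_sentiment` and `min_link_score`. The function names and threshold values are assumptions made for illustration.

```python
def signed_sentiment(probs):
    """Collapse class probabilities into one score in [-1, 1].

    `probs` maps the labels "positive", "negative", and "neutral"
    to probabilities that sum to 1.
    """
    return probs["positive"] - probs["negative"]


def worth_pursuing(probs, link_score, min_abs_sentiment=0.6, min_link_score=0.2):
    """Stand-in gate (NOT the project's proprietary equation).

    A headline passes only if the sentiment is strong in either
    direction and the headline-to-market match is good enough.
    Both thresholds are hypothetical values chosen for the example.
    """
    score = signed_sentiment(probs)
    return abs(score) >= min_abs_sentiment and link_score >= min_link_score


if __name__ == "__main__":
    strong = {"positive": 0.85, "negative": 0.05, "neutral": 0.10}
    print(signed_sentiment(strong), worth_pursuing(strong, link_score=0.47))
```

Gating on the absolute value means strongly negative headlines pass as well as strongly positive ones; the direction of the trade is then left to the later reasoning step.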

The "Hybrid" Architecture

To achieve sub-second latency while running heavy transformer models, we built a hybrid cloud/local architecture powered by Modal. We realized that running LLMs locally was too slow, and standard APIs had too much network overhead.

1. The "Orchestrator" (Local)

The core of the system is a lightweight Python event loop that acts as the traffic controller.

  • Latency Budget: <10ms per tick.
  • Responsibility: It polls RSS feeds and news APIs and preprocesses the headlines for our Modal instances.
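As a rough sketch of what one Orchestrator tick might look like, the snippet below normalizes incoming headlines, drops ones it has already seen, and reports whether the tick stayed inside the 10ms budget. The function names, the payload fields, and the hash-based deduplication are assumptions made for illustration, not the project's actual code.

```python
import hashlib
import re
import time

TICK_BUDGET_S = 0.010  # the <10ms per-tick budget stated above

_seen = set()  # hashes of headlines already dispatched


def normalize(headline):
    """Lowercase, drop most punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s%.-]", " ", headline.lower())
    return re.sub(r"\s+", " ", text).strip()


def preprocess(headlines):
    """Return one payload per headline that has not been seen before."""
    payloads = []
    for raw in headlines:
        clean = normalize(raw)
        key = hashlib.sha1(clean.encode("utf-8")).hexdigest()
        if not clean or key in _seen:
            continue  # empty or duplicate headline: skip it
        _seen.add(key)
        payloads.append({"id": key[:12], "text": clean, "received_at": time.time()})
    return payloads


def run_tick(headlines):
    """Preprocess one batch and report whether the tick met its budget."""
    start = time.perf_counter()
    payloads = preprocess(headlines)
    within_budget = (time.perf_counter() - start) <= TICK_BUDGET_S
    return payloads, within_budget
```

In a design like this, the payloads returned by `run_tick` would be what gets dispatched to the remote functions, so duplicate wire stories never cost a GPU call.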

2. The "Brain" (Serverless GPU on Modal)

When the Orchestrator detects a potential signal, it dispatches the payload to Modal.

  • Linking on Modal: We use a pretrained embedding model with vector similarity search, running on Modal's fast compute, to compare headlines against markets and pick the best match.
  • FinBERT on Modal: We deploy the ProsusAI/finbert model as a serverless Modal function. It spins up in milliseconds, scores the sentiment (Positive/Negative/Neutral), and returns the class scores.
  • LLM Reasoning: For high-confidence headlines, we trigger a second Modal function that hosts a quantized reasoning model. This agent:
    1. Ingests the specific Kalshi contract rules (e.g., "Fed Funds Rate > 5.5%").
    2. Ingests the news content.
    3. Outputs a binary YES/NO decision with a confidence estimate, which lets it see past basic sentiment analysis.
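The linking step can be sketched with plain cosine similarity. To keep the example self-contained, it swaps the pretrained embedding model for simple bag-of-words vectors, so it shows the matching logic only, not the real model. The market descriptions, the `NCAA-GAME` ticker, and the `min_score` cutoff are hypothetical.

```python
import math
from collections import Counter

# Hypothetical market descriptions; a real system would load these from Kalshi.
MARKETS = {
    "FED-RATE": "federal reserve fed powell fomc interest rate decision",
    "NCAA-GAME": "college basketball game injury ncaa team",
}


def embed(text):
    """Toy embedding: word counts (stand-in for a pretrained model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link(headline, min_score=0.2):
    """Return (ticker, score) for the best match, or (None, score) if too weak."""
    vec = embed(headline)
    best_ticker, best_score = None, 0.0
    for ticker, description in MARKETS.items():
        score = cosine(vec, embed(description))
        if score > best_score:
            best_ticker, best_score = ticker, score
    if best_score < min_score:
        return None, best_score  # treat as noise
    return best_ticker, best_score
```

Returning `None` below the cutoff is what lets irrelevant headlines be discarded before any sentiment or reasoning model is invoked.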

3. The Execution Layer

Once Modal returns the signal, the local Orchestrator takes over:

  • Risk Engine: Checks portfolio exposure.
  • Order Book Analysis: We verify liquidity on Kalshi so that our orders do not suffer slippage.
  • Execution: Orders are placed via the Kalshi API as limit orders, which cap the price we pay.
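The three checks above might combine as in the sketch below. It makes no Kalshi API calls: the order book is a plain list, prices are assumed to be in cents, and `Level`, `plan_order`, the order dictionary fields, and the exposure cap are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One resting ask in the order book (illustrative structure)."""
    price_cents: int
    quantity: int


def available_at_or_below(asks, limit_cents):
    """Contracts we could buy without paying more than the limit price."""
    return sum(level.quantity for level in asks if level.price_cents <= limit_cents)


def plan_order(ticker, asks, limit_cents, want_qty, exposure_cents, max_exposure_cents):
    """Size a limit order against the risk cap and the visible liquidity.

    Returns an order dict, or None if nothing should be sent.
    """
    # Risk engine: worst-case cost is quantity * limit price.
    budget = max(max_exposure_cents - exposure_cents, 0)
    affordable = budget // limit_cents
    # Order book analysis: only count size resting at or below our limit.
    liquid = available_at_or_below(asks, limit_cents)
    qty = min(want_qty, affordable, liquid)
    if qty <= 0:
        return None
    return {
        "ticker": ticker,
        "type": "limit",
        "price_cents": limit_cents,
        "count": qty,
    }
```

For example, with asks of 10 contracts at 41, 5 at 42, and 100 at 45, a limit of 42 sees only 15 contracts of usable liquidity, so a request for 20 is trimmed to 15 rather than reaching into the 45-cent level.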

Why Modal?

Building on Modal was the critical unlock for this project: it turned a pipeline that took multiple seconds into one fast enough for medium-frequency trading.

  • Zero Infrastructure: We didn't have to manage Docker containers or Kubernetes clusters.
  • Instant Scale: If 50 news stories break at once (e.g., during an FOMC meeting), Modal spins up 50 parallel GPU containers to process them simultaneously.
  • Environment Isolation: We could run conflicting Python dependencies (PyTorch for FinBERT vs. the vector similarity stack) in separate remote environments seamlessly.

Important Information

  • Performance Metrics: We achieved a 99% noise filtration rate, processing irrelevant articles in 8ms or less. We also achieved an average latency of 50ms.
  • End-to-End Automation: The system runs completely hands-free, including a background "Heartbeat" thread that monitors our open positions and liquidates them if risk parameters are breached.
  • Robustness: The system handles API rate limits, connection drops, and empty news feeds gracefully without crashing.
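A background heartbeat of the kind described above could look roughly like this. The risk rule (close any position whose loss reaches `max_loss_cents`), the class name, and the two callbacks are assumptions for illustration; the real risk parameters are not specified here.

```python
import threading


class Heartbeat:
    """Periodically checks open positions and liquidates risky ones."""

    def __init__(self, get_positions, liquidate, max_loss_cents, interval_s=1.0):
        self.get_positions = get_positions  # callable -> {ticker: pnl_cents}
        self.liquidate = liquidate          # callable(ticker)
        self.max_loss_cents = max_loss_cents
        self.interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def check_once(self):
        """Liquidate every position at or beyond the loss limit."""
        closed = []
        for ticker, pnl_cents in self.get_positions().items():
            if pnl_cents <= -self.max_loss_cents:
                self.liquidate(ticker)
                closed.append(ticker)
        return closed

    def _run(self):
        # Event.wait doubles as an interruptible sleep between checks.
        while not self._stop.wait(self.interval_s):
            try:
                self.check_once()
            except Exception:
                # A failed check (e.g. a dropped connection) must not
                # kill the monitor; try again on the next beat.
                pass

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Using `threading.Event.wait` instead of `time.sleep` means `stop()` takes effect immediately rather than after the current interval elapses.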
