Inspiration

We were inspired by Philip Tetlock's superforecasting methodology. We realized that humans are great at aggregating slow-moving information (which underlies prediction markets like Polymarket or Kalshi), but they can be slow to react to breaking news or obscure events. We wanted to build an AI agent that takes the best of both worlds: trusting the "wisdom of the crowd" as a baseline, but acting instantly when new, decisive evidence appears on the web.

How we built it

Our agent architecture revolves around three key innovations:

  1. Market-Anchoring Prompt Strategy: Instead of guessing 50/50 when uncertain, our agent is strictly instructed to first search for the live odds of the event on Polymarket, Kalshi, or other prediction markets. It uses this market probability as its baseline, ensuring it never loses to the crowd due to ignorance. It only deviates when it finds concrete evidence that the market hasn't priced in.
  2. Two-Pass Web Grounding: We utilized OpenRouter's native :online plugin to give our model (Gemini 2.5 Flash) real-time web access. The agent searches for the event, parses news articles and official results, and then reasons over them before committing to a probability.
  3. Adaptive "Smart Cache": To survive continuous 2-week evaluation without blowing through API credits, we built an intelligent caching layer in FastAPI. The TTL (Time-To-Live) adapts to the event's urgency: events closing weeks from now are cached for 12 hours, events closing today are cached for 1 hour, and closed events are cached for 30 minutes. This guarantees a rapid response time (<10ms for cache hits) while perfectly protecting our budget.

Challenges we ran into

Our biggest hurdle was LLM output parsing when connected to the web search. The web search plugin would often cause the model to inject markdown links (like [source.com](URL)) directly into its JSON output, breaking standard Python JSON parsers. It would sometimes wrap the JSON in code fences or forget to output JSON entirely after a long analytical reasoning chain.

We solved this by building a robust 5-strategy regex parser. If all parsing strategies fail, the agent triggers an automatic, low-cost retry mechanism: it sends its own text analysis back to the LLM (without web search) and explicitly commands it to just extract the probability into pure JSON.

What we learned

  • Prompt Engineering is not enough for forecasting. In our initial tests, tweaking the prompt to be "smarter" barely moved our Brier score. Real-time web search integration dropped our Brier score by 78% instantly.
  • The Market is hard to beat. We learned that the safest way to forecast an obscure event you know nothing about is to simply find out what the betting markets think and copy them.

Built With

  • fastapi
  • google/gemini-2.5-flash
  • openrouter
  • pydantic
  • python
  • regex
  • uvicorn
Share this project:

Updates