Folaber_bot

Inspiration

I'm a labor economist working on AI's impact on labor markets. I wanted to see how far an LLM agent could go on real-world forecasting, and where it would fail.

What it does

Takes a binary or multi-outcome event from Kalshi (e.g., "Will the Fed cut rates in June?") and returns a probability for each outcome. Handles three event types: binary, exclusive multi-outcome, and cumulative thresholds.

How I built it

predict(event) Python function, which serves as the agent
Claude Sonnet 4.6 via OpenRouter with :online web search
Category-aware system prompt covering Sports, Elections, Politics, Entertainment, Economics
Defensive JSON parsing, probability clipping to [0.02, 0.98], normalization to sum to 1
FastAPI wrapper, deployed on Modal as an always-warm container
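
The parsing, clipping, and normalization steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names extract_json and clean_probabilities, the regex fallback, and the 0.5 default for missing or unparseable values are all assumptions. Only the [0.02, 0.98] clip range and the sum-to-1 normalization come from the description above.

```python
import json
import re

# Clip range taken from the write-up; everything else here is illustrative.
CLIP_LO, CLIP_HI = 0.02, 0.98


def extract_json(text):
    """Return the first JSON object in a model reply, tolerating extra chatter."""
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    # Fallback (assumption): grab everything from the first "{" to the last "}".
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    return json.loads(match.group(0))


def clean_probabilities(raw, outcomes):
    """Clip each probability to [CLIP_LO, CLIP_HI], then normalize to sum to 1."""
    clipped = {}
    for name in outcomes:
        try:
            p = float(raw.get(name, 0.5))  # assumed default for missing outcomes
        except (TypeError, ValueError):
            p = 0.5
        if p != p:  # NaN check
            p = 0.5
        clipped[name] = min(max(p, CLIP_LO), CLIP_HI)
    total = sum(clipped.values())
    return {name: p / total for name, p in clipped.items()}


reply = 'Sure! {"Yes": 0.995, "No": 0.01} Hope that helps.'
forecast = clean_probabilities(extract_json(reply), ["Yes", "No"])
```

Here the overconfident 0.995 is clipped to 0.98 and the 0.01 is raised to 0.02, so the final forecast is roughly 0.98 for Yes and 0.02 for No. Note that with many outcomes, normalizing after clipping can push individual values back below 0.02.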

What I learned

Retrieval matters more than model size. Appending :online was a one-line change that grounded predictions in current information instead of stale training data.
The plumbing-to-intelligence ratio was about 20:1. Most of the time went to environment setup, schema mismatches, and deployment, not to the actual prediction logic.
Calibration is the open problem. LLMs default to confident answers; teaching them to output 0.5 on genuine unknowns is harder than it sounds.
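
One common way to see why overconfidence on genuine unknowns hurts is the Brier score, the mean squared error between forecast probabilities and 0/1 outcomes. The sketch below is only an illustration: the write-up does not say which scoring rule was used, and the four coin-flip outcomes are made-up numbers.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical events that are genuine coin flips: half resolve Yes, half No.
resolved = [1, 0, 1, 0]

humble = brier_score([0.5] * 4, resolved)      # always answers 0.5
confident = brier_score([0.95] * 4, resolved)  # always answers 0.95
```

On these made-up outcomes the always-0.5 forecaster scores 0.25, while the always-0.95 forecaster scores about 0.45, which is worse.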

Challenges

The installed CLI rejected my predictions due to a schema mismatch with the official docs.
Setting response_format={"type": "json_object"} broke Claude via OpenRouter: the responses came back as hallucinated templates.
Cumulative threshold events are nested (clearing a higher threshold implies clearing every lower one), so their probabilities don't naturally sum to 1, but the spec requires that they do.
ngrok asked for email verification right at the submission deadline.
:online mode costs ~$0.36/call, which comes to about $72 for the full 200 calls and exceeds the $50 budget.
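
The cumulative-threshold issue above can be made concrete with a small sketch. It assumes thresholds of the form "X above t" with made-up probabilities, and it shows two ways to reach a sum of 1: plain rescaling, and conversion to exclusive buckets. The function names are hypothetical, and the write-up does not say which approach the spec or the bot actually uses.

```python
def normalize(probs):
    """Rescale so the values sum to 1 (keeps the ordering, loses the literal meaning)."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return [p / total for p in probs]


def thresholds_to_buckets(survival):
    """Turn P(X > t_1) >= ... >= P(X > t_k) into k + 1 exclusive bucket probabilities."""
    if any(not 0.0 <= p <= 1.0 for p in survival):
        raise ValueError("probabilities must lie in [0, 1]")
    if any(a < b for a, b in zip(survival, survival[1:])):
        raise ValueError("threshold probabilities must be non-increasing")
    edges = [1.0] + list(survival) + [0.0]
    # Buckets: below t_1, between t_1 and t_2, ..., above t_k.
    return [hi - lo for hi, lo in zip(edges, edges[1:])]


# Made-up example: P(X > 3.0), P(X > 3.5), P(X > 4.0).
survival = [0.9, 0.6, 0.2]            # sums to 1.7, not 1
rescaled = normalize(survival)        # three values that sum to 1
buckets = thresholds_to_buckets(survival)  # four exclusive buckets that sum to 1
```

Rescaling keeps one number per threshold, which fits a spec that wants exactly k outcomes, but the results no longer read as "probability of exceeding t". The bucket conversion stays probabilistically coherent but produces k + 1 outcomes, so it only helps if the event can be restated that way.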

What's next

Per-category prompts and routing, ensembling across Claude / GPT / Gemini, calibration corrections from a held-out validation set, and a writeup for the ICML 2026 workshop on forecasting.

Built With

anthropic
claude
fastapi
github
modal
ngrok
openai-sdk
openrouter
pydantic
python

Updates

Yong Lee started this project — May 17, 2026 07:19 PM EDT
