Inspiration
The future is unpredictable, but with the rise of Large Language Models, we now have tools that can process vast amounts of information to find patterns and signals in noise. We were inspired by the concept of "superforecasting" — the ability to systematically break down complex geopolitical, technological, and economic questions into calibrated probabilities. Our goal was to build an autonomous agent that doesn't just guess, but actively researches and reasons before making a prediction.
What it does
Our AI agent is built for the Forecasting Track of Prophet Hacks. It exposes an OpenAI-compatible HTTP endpoint that takes a batch of real-world forecasting questions. For each question, the agent autonomously queries the live web for the most recent news and context, feeds this real-time data along with a specialized superforecasting framework into GPT-4o, and returns a finely calibrated probability score between 0.0 and 1.0.
How we built it
We developed the entire architecture online using a modern, lightweight, and scalable stack:
- Backend & API: Built with Python and FastAPI to handle fast, concurrent asynchronous HTTP requests from the evaluation harness.
- Intelligence & Reasoning: Powered by OpenAI's GPT-4o API, using low temperature settings ($temperature = 0.1$) for highly stable and deterministic reasoning.
- Real-time RAG (Search): Integrated with the Tavily AI Search API to fetch up-to-date news, timelines, and base rates from the live web.
- Deployment: Maintained version control via GitHub, coded in the cloud using GitHub Codespaces, and deployed 24/7 on Render serverless infrastructure.
Challenges we ran into
Time was our biggest constraint, pushing us to work entirely in cloud environments. During deployment, we faced sudden environment compatibility issues where the latest Python 3.14 alpha build on the hosting platform broke core dependencies like pydantic-core and caused TypeError conflicts within the OpenAI client connection. We successfully mitigated this under high pressure by dynamically overriding the runtime variables to pin a stable Python 3.11 environment and refactoring our dependencies block.
Accomplishments that we're proud of
- Successfully went from zero to a fully deployed live cloud API agent within the hackathon window.
- Built a working Retrieval-Augmented Generation (RAG) loop that allows an LLM to accurately ground its predictions in real 2026 events.
- Ensured full compatibility with the custom OpenAI-like specification required by the evaluation harness.
What we learned
We learned a massive amount about API design, the mechanics of building autonomous web-searching loops, and how critical exact environment tracking is when deploying serverless Python applications. We also gained deep insights into prompt engineering techniques required to force LLMs into producing mathematically calibrated forecast percentages instead of vague text summaries.
What's next for AI Agent
We plan to upgrade the agent from a single-prompt model into an advanced multi-agent consensus system. The next iteration will feature an adversarial architecture: one agent will search for arguments supporting the event, another will build a case against it, and a third mathematical "aggregator" agent will synthesize their findings using a Brier-score optimization algorithm to produce the ultimate prediction.

Log in or sign up for Devpost to join the conversation.