Inspiration
Prediction markets are becoming more popular by the day. Platforms like Kalshi are being framed as some of the most accurate forecasting tools we have, turning the opinions of thousands of traders into real-time probabilities on everything from election outcomes to the weather. But the deeper I looked into how they actually work, the more I realized there’s basically no market surveillance infrastructure. Traditional finance markets have surveillance systems to catch illegal activity; nothing like that exists yet on prediction markets, which creates clear attack surfaces. A trader can place a large fake order to move the market, cancel it, and profit. Someone with early access to information can build a position before the news breaks. And right now, there’s no system consistently watching for this.
What it does
KalshiWatch is an AI-powered surveillance agent for prediction markets. It monitors markets and flags three types of risk:
Spoofing: Detects “ghost orders”, which are large bids that appear to create fake demand, then disappear without executing. The system tracks order book changes, checks for fills, and scores behavior based on size, speed, and repetition.
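The size/speed/repetition scoring can be sketched roughly like this. The thresholds, weights, and field names here are illustrative assumptions, not KalshiWatch's actual values:

```python
from dataclasses import dataclass

@dataclass
class OrderEvent:
    size: int            # contracts in the order
    book_depth: int      # total contracts resting at nearby levels
    lifetime_s: float    # seconds between placement and cancellation
    filled: int          # contracts actually executed
    repeats: int         # similar place-and-cancel cycles seen recently

def spoof_score(e: OrderEvent) -> float:
    """Score 0-1: large, short-lived, unfilled, repeated orders look spoof-like."""
    score = 0.0
    if e.book_depth and e.size / e.book_depth > 0.25:   # dominates visible demand
        score += 0.4
    if e.lifetime_s < 5 and e.filled == 0:              # vanished without executing
        score += 0.4
    score += min(e.repeats, 5) * 0.04                   # repetition adds up to 0.2
    return min(score, 1.0)
```

A genuine order that rests in the book and fills accumulates nothing, while a large bid cancelled within seconds, repeatedly, climbs toward 1.0.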
Insider Signals: Flags price moves that don’t align with public news. It identifies sharp price changes, looks for related news in the same window, and measures delay. Large moves without immediate news are treated as suspicious.
Resolution Anomalies: Audits resolved markets for outcomes that contradict prior pricing. If a contract trades near 20% and resolves YES, the system flags the deviation and tracks patterns across similar cases.
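The core check is symmetric: a YES resolution at a low implied probability, or a NO at a high one, contradicts what the market priced in. A minimal sketch, with an assumed 30% threshold:

```python
def resolution_anomaly(last_price: float, outcome: str, threshold: float = 0.30) -> bool:
    """Flag resolutions that contradict prior pricing.

    `last_price` is the contract's final trading price, read as the
    market-implied probability of YES (0-1).
    """
    if outcome == "YES":
        return last_price < threshold          # market priced YES as unlikely
    return last_price > 1 - threshold          # market priced NO as unlikely
```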
How we built it
I built KalshiWatch using LangGraph. Each market flows through a fixed sequence of analysis steps: scanning, spoof detection, insider signal detection, resolution auditing, and report generation. I chose a structured pipeline instead of a fully agentic system because coverage and consistency matter more than flexibility in a surveillance context: every market needs to be evaluated against every rule, every time.

Each detector is rule-based and explicitly scored, modeled on real-world enforcement patterns. Every signal is tied to a clear condition and contributes a defined amount to a final risk score. Once all signals are collected, an LLM is used as a synthesis layer rather than a decision-maker: it takes structured evidence such as price movements, order behavior, and timing gaps, and turns it into a full investigation report. The goal isn’t to detect fraud with the model, but to translate these signals into something a human can quickly understand and act on.
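The shape of the pipeline can be sketched in plain Python (the real system wires these steps together with LangGraph; the step names and state keys here are illustrative): each step reads a shared state dict, adds its findings, and passes it on, so the report step always sees every detector's output.

```python
def scan_markets(state):
    state["trace"].append("scan")
    state["markets"] = state.get("markets", [])
    return state

def detect_spoofing(state):
    state["trace"].append("spoof")
    state["spoof_signals"] = []        # detector findings would go here
    return state

def detect_insider(state):
    state["trace"].append("insider")
    state["insider_signals"] = []
    return state

def audit_resolutions(state):
    state["trace"].append("resolution")
    state["resolution_flags"] = []
    return state

def generate_report(state):
    # In the real system an LLM turns the collected evidence into prose;
    # here we just record that every detector ran before reporting.
    state["trace"].append("report")
    state["report"] = f"detectors run: {state['trace'][:-1]}"
    return state

PIPELINE = [scan_markets, detect_spoofing, detect_insider,
            audit_resolutions, generate_report]

def run_pipeline():
    state = {"trace": []}
    for step in PIPELINE:   # fixed order: every market hits every rule
        state = step(state)
    return state
```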
For insider signal detection, I integrated GDELT to correlate price movements with global news coverage in near real time. The key metric is timing: how long the market moves before an explanation appears publicly.
The system runs on a data layer that mirrors Kalshi’s API, with the ability to switch between controlled mock data and live data without changing the interface. That made it possible to validate detection logic across specific scenarios while keeping the architecture production-ready.
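The swap between mock and live data comes down to both sources implementing the same interface. A minimal sketch, assuming a single `order_book` method for illustration:

```python
from abc import ABC, abstractmethod

class MarketData(ABC):
    """Interface the detectors depend on; they never know which backend is live."""
    @abstractmethod
    def order_book(self, ticker: str) -> dict: ...

class MockMarketData(MarketData):
    def __init__(self, books: dict):
        self._books = books                    # scripted scenarios for testing

    def order_book(self, ticker: str) -> dict:
        return self._books[ticker]

class LiveMarketData(MarketData):
    def order_book(self, ticker: str) -> dict:
        raise NotImplementedError("would call the Kalshi API here")

def make_source(live: bool, **kwargs) -> MarketData:
    return LiveMarketData() if live else MockMarketData(**kwargs)
```

Detection logic is written against `MarketData` only, so validating a spoofing scenario against scripted books and running against live data use the exact same code path.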
Challenges we ran into
Getting the scoring system right was tricky. Early on, the system was too binary, so everything either looked very suspicious or very clean. I had to figure out how to allocate weights and which signals carried more importance, so that all cases would be classified correctly, especially the ones that fell in the middle between clean and suspicious.
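The fix amounts to blending graded per-detector scores instead of hard yes/no verdicts. A sketch with made-up weights:

```python
# Illustrative weights, not the tuned values from the project.
WEIGHTS = {"spoofing": 0.4, "insider": 0.4, "resolution": 0.2}

def risk_score(signals: dict) -> float:
    """Weighted blend of per-detector scores (each 0-1) into one 0-1 risk score.

    Partial evidence from one detector lands the market in the middle of
    the range instead of snapping to fully clean or fully suspicious.
    """
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)
```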
Accomplishments that we're proud of
I'm proud of how the system captures actual manipulation problems that prediction markets are currently dealing with, especially the delayed news signals. I'm also happy with how the reporting feature turned out: instead of raw numbers, there's a written explanation for why a market was flagged and why it received the score it did.
What we learned
The biggest thing I learned is how important good test data is. It's a core part of the system, and without it, there are tons of edge cases that could be missed.
What's next for KalshiWatch
The next step is implementing real-time streaming, so order book changes can be analyzed as they happen instead of in intervals. From there, I want to focus on account-level analysis: tracking which participants are repeatedly involved in suspicious activity across markets.
Built With
- gdelt
- kalshi
- python
- streamlit