Kalshi Watchdog — Prediction Market Surveillance Platform

Inspiration

Prediction markets are one of the most exciting developments in modern finance. Kalshi, the first CFTC-regulated prediction exchange, lets you trade on real-world events — elections, weather, economic data, even whether the Fed will cut rates. But with real money on the line and markets tied to world events, a natural question emerges: what does insider trading look like in a prediction market?

Traditional stock exchanges have decades of surveillance infrastructure. Prediction markets have almost none. We saw an opportunity to build a system that could catch the kinds of patterns regulators look for — volume spikes before resolution, coordinated trading bursts, and suspiciously well-timed bets on longshot outcomes — and make the results accessible through a modern dashboard with AI-generated case narratives.

The inspiration also came from real incidents: a MrBeast editor betting on video outcomes before release, a California governor candidate's associates positioning before endorsement announcements, and large coordinated bets appearing hours before military operations were publicly announced. These aren't hypotheticals — they've already happened.

What it does

Kalshi Watchdog ingests market and trade data from the Kalshi API, runs three anomaly detection algorithms, enriches findings with AI analysis (Claude 3 Haiku via AWS Bedrock), and presents everything through an interactive React dashboard.

Three Detection Algorithms

Volume Spike Detection — Bins trades into hourly buckets and flags hours where volume exceeds mean + n * standard deviation (default n = 3) across the full market history. A z-score of 4+ triggers HIGH severity. This catches unexplained pre-resolution volume surges.
Coordinated Activity Detection — Identifies bursts of trades within 5-minute clusters with high directional consistency. Calculates z-scores relative to 15-minute baseline windows. This is the signature of algorithmic order splitting or synchronized trader action.
Golden Window Detection — The most sophisticated detector. Flags trades on extreme-probability markets (YES < 5 cents or NO < 5 cents) placed within 48 hours of resolution. A scoring function combines:

Score = P(correct outcome) x Volume x (1 / time-to-resolution)

CRITICAL severity triggers when all of the following hold: z > 4, odds below 10%, at least $10K notional, and less than 12 hours to resolution. This is the textbook insider trading profile.
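As an illustration of the first detector, here is a minimal sketch of the hourly volume spike rule. The trade shape (timestamp, contract count), the function name, and the "MEDIUM" label for flagged hours below z = 4 are assumptions made for the example, not details taken from the project code. Hours with no trades are simply absent from the baseline here, which is also an assumption.

```python
from collections import Counter
from statistics import mean, pstdev


def detect_volume_spikes(trades, n=3.0, high_z=4.0):
    """Flag hourly buckets whose volume exceeds mean + n * std.

    trades: iterable of (unix_seconds, contract_count) pairs (assumed shape).
    """
    hourly = Counter()
    for ts, count in trades:
        hourly[int(ts) // 3600] += count  # bin by hour since the epoch

    volumes = list(hourly.values())
    if len(volumes) < 2:
        return []  # not enough history to form a baseline
    mu, sigma = mean(volumes), pstdev(volumes)
    if sigma == 0:
        return []  # perfectly flat volume, nothing to flag

    flagged = []
    for hour, volume in sorted(hourly.items()):
        z = (volume - mu) / sigma
        if z > n:  # same condition as volume > mean + n * std
            flagged.append({
                "hour_start": hour * 3600,
                "volume": volume,
                "z_score": round(z, 2),
                # "MEDIUM" is a placeholder label for this sketch
                "severity": "HIGH" if z >= high_z else "MEDIUM",
            })
    return flagged
```

Because the baseline covers the full market history, a short-lived market with only a handful of active hours can never reach a z-score of 3, so very thin markets pass through this detector unflagged.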

AI-Powered Analysis

Each flagged anomaly is sent to Claude 3 Haiku via AWS Bedrock with full context: market title, anomaly type, trade count, volume, probability, hours-to-resolution, and z-score. Claude returns a structured JSON analysis with a summary, reasoning chain, severity assessment, and possible explanations. If Bedrock is unavailable, the system falls back to heuristic analysis — detection never depends on the LLM.
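The fallback behaviour can be sketched roughly as follows. The model call is passed in as a plain callable (invoke_llm) so the example stays independent of any AWS SDK. The key names in REQUIRED_KEYS, the "source" field, and both function names are hypothetical and chosen only to mirror the fields listed above.

```python
import json

# Assumed field names for the structured analysis; the real schema may differ.
REQUIRED_KEYS = {"summary", "reasoning", "severity", "explanations"}


def heuristic_analysis(anomaly):
    """Build an analysis from detector output alone, with no model call."""
    return {
        "summary": (
            f"{anomaly['anomaly_type']} on {anomaly['market_title']} "
            f"(z={anomaly['z_score']})"
        ),
        "reasoning": ["Generated from detector output without the LLM."],
        "severity": anomaly.get("severity", "UNKNOWN"),
        "explanations": [],
        "source": "heuristic",
    }


def analyze_anomaly(anomaly, invoke_llm=None):
    """Try the LLM first; fall back to heuristics on any failure.

    invoke_llm: callable taking a prompt string and returning the model's
    raw text response (hypothetical wrapper around the Bedrock call).
    """
    if invoke_llm is None:
        return heuristic_analysis(anomaly)
    try:
        parsed = json.loads(invoke_llm(json.dumps(anomaly)))
        if not isinstance(parsed, dict) or not REQUIRED_KEYS.issubset(parsed):
            raise ValueError("response is missing required fields")
    except Exception:
        # Network errors, throttling, or malformed JSON all end up here.
        return heuristic_analysis(anomaly)
    parsed["source"] = "llm"
    return parsed
```

Treating malformed JSON the same way as an outage keeps the caller's contract simple: it always receives a dictionary with the same keys.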

Dashboard Features

Force Graph — Network visualization connecting anomalies to markets and historical insider-trading case parallels
Candlestick Charts — OHLC price movement per market with anomaly overlay
Orderbook Depth — Real-time bid/ask spread visualization
Anomaly Timeline — Scatter plot of detections by time and severity
Market Inspector — Full trade-level drill-down with flagged context
Watchlist — Personal market tracking with category browsing and anomaly badges
Known Cases — 6 real/fictional insider-trading parallels for regulatory context
Admin Dashboard — User management, request analytics, and usage metrics

How we built it

Dual-Mode Architecture

The most interesting architectural decision was building a single codebase that runs both locally and on AWS with zero code changes. A single environment variable (STORAGE_BACKEND) controls the entire storage layer:

Layer	Local Mode	AWS Mode
Storage	SQLite (WAL mode)	DynamoDB (6 tables)
Real-time push	Server-Sent Events (SSE)	WebSocket via API Gateway v2
Pipeline	Manual (frontend buttons)	Step Functions + EventBridge (every 2h)
Auth	Bypass mode (user_id = "local")	AWS Cognito (JWT)

Every storage function dispatches at runtime — batch_write_trades, get_anomalies, add_to_watchlist — all route to SQLite or DynamoDB through a clean abstraction in utils/dynamo.py. The local dev server (local_api.py) wraps the same Lambda handler in a ThreadingHTTPServer, so the exact same code path executes locally and in production.
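A minimal sketch of this kind of runtime dispatch is shown below for a single function pair. The backend values "sqlite" and "dynamodb", the SQLITE_PATH and WATCHLIST_TABLE variables, and the table layout are all assumptions for illustration; only STORAGE_BACKEND and add_to_watchlist come from the description above.

```python
import os
import sqlite3

_conn = None


def _sqlite():
    """Lazily open the local database (in-memory unless SQLITE_PATH is set)."""
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(os.environ.get("SQLITE_PATH", ":memory:"))
        _conn.execute(
            "CREATE TABLE IF NOT EXISTS watchlist ("
            "user_id TEXT, ticker TEXT, PRIMARY KEY (user_id, ticker))"
        )
    return _conn


def _sqlite_add(user_id, ticker):
    with _sqlite() as conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO watchlist VALUES (?, ?)", (user_id, ticker)
        )


def _dynamo_add(user_id, ticker):
    import boto3  # imported lazily so local mode needs no AWS SDK

    table = boto3.resource("dynamodb").Table(os.environ["WATCHLIST_TABLE"])
    table.put_item(Item={"user_id": user_id, "ticker": ticker})


def add_to_watchlist(user_id, ticker):
    """Route to the backend named by STORAGE_BACKEND at call time."""
    backend = os.environ.get("STORAGE_BACKEND", "sqlite")
    handler = {"sqlite": _sqlite_add, "dynamodb": _dynamo_add}[backend]
    return handler(user_id, ticker)


def get_watchlist_local(user_id):
    """SQLite-only read helper used to check the sketch."""
    rows = _sqlite().execute(
        "SELECT ticker FROM watchlist WHERE user_id = ? ORDER BY ticker",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Reading the environment variable on every call, rather than once at import time, lets the same process be pointed at either backend in tests.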

AWS Services (13 total)

Service	Purpose
Lambda	8 functions: API, market/trade ingestion, detection, analysis, WebSocket connect/disconnect/broadcast
DynamoDB	6 tables: Trades, Markets, Anomalies, Connections, Watchlist, Usage
DynamoDB Streams	Triggers WebSocket broadcast on new anomaly inserts
API Gateway	REST (HTTP routes) + WebSocket v2 (real-time push)
Step Functions	Orchestrates: Ingest Markets → Ingest Trades → Run Detection
EventBridge	Scheduled rule (every 2 hours) triggers the pipeline
Bedrock	Claude 3 Haiku for AI anomaly analysis and market explanations
S3	Raw data archival with 30-day STANDARD_IA lifecycle
Cognito	User Pool with email auth, admin groups, JWT authorization
Amplify	Frontend hosting with CI/CD from GitHub
CloudWatch	Custom dashboard with Lambda metrics and DynamoDB capacity
X-Ray	Distributed tracing across all Lambda functions
SNS	CRITICAL anomaly alert notifications

Frontend

React 18 + TypeScript + Vite, styled with Tailwind CSS and animated with Framer Motion. Recharts handles all charting (candlesticks, timelines, breakdowns) and react-force-graph-2d powers the network visualization. Authentication flows through AWS Amplify's Cognito integration with protected routes and admin-only views.

Kalshi API Integration

The Kalshi client uses RSA-signed authentication — each request includes a timestamp, method, and path signed with a private key:

signature = PKCS1v15(SHA256(timestamp || METHOD || path))

The key loads from a file path locally or from a base64-encoded environment variable in Lambda, so private keys never touch version control.
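The two key-loading paths and the message layout can be sketched as follows. The environment variable names KALSHI_PRIVATE_KEY_B64 and KALSHI_PRIVATE_KEY_PATH are hypothetical, and the sketch deliberately stops before the signing step, which would be handled by an RSA library.

```python
import base64
import os
from pathlib import Path


def load_private_key_pem(env=os.environ):
    """Return the PEM bytes of the signing key.

    Prefers a base64-encoded variable (the Lambda case) and falls back to a
    file path (the local case). Both variable names are assumptions.
    """
    encoded = env.get("KALSHI_PRIVATE_KEY_B64")
    if encoded:
        return base64.b64decode(encoded)
    path = env.get("KALSHI_PRIVATE_KEY_PATH")
    if path:
        return Path(path).read_bytes()
    raise RuntimeError("no private key configured")


def build_signing_message(timestamp_ms, method, path):
    """Concatenate timestamp, upper-cased method, and path into the bytes to sign."""
    return f"{timestamp_ms}{method.upper()}{path}".encode("utf-8")
```

Keeping key loading separate from message construction makes both easy to test without a real key.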

Challenges we faced

1. Trade Ingestion Sequencing

The trade ingestion endpoint queries for settled markets first, then fetches trades per market. But locally, if you hit "Ingest Trades" before "Ingest Markets," the query returns zero markets and the whole pipeline silently produces nothing. We solved this with auto-market ingestion — when the trade handler detects no settled markets, it internally calls the market ingestion handler first, then retries.
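The retry logic amounts to a guard at the top of the trade handler. In the sketch below, the in-memory store and all function names are hypothetical stand-ins for the real handlers and storage layer.

```python
class MemoryStore:
    """Tiny in-memory stand-in for the storage layer (illustrative only)."""

    def __init__(self):
        self.markets = []
        self.trades = {}

    def settled_markets(self):
        return list(self.markets)

    def save_trades(self, ticker, trades):
        self.trades.setdefault(ticker, []).extend(trades)
        return len(trades)


def ingest_trades(store, ingest_markets, fetch_trades):
    """Ingest trades per settled market, bootstrapping markets if needed."""
    markets = store.settled_markets()
    if not markets:
        # Nothing ingested yet: run market ingestion once, then retry.
        ingest_markets()
        markets = store.settled_markets()
    if not markets:
        # Report the empty result explicitly instead of failing silently.
        return {"markets": 0, "trades": 0, "note": "no settled markets found"}

    total = 0
    for ticker in markets:
        total += store.save_trades(ticker, fetch_trades(ticker))
    return {"markets": len(markets), "trades": total}
```

Retrying only once avoids an endless loop when the exchange genuinely has no settled markets to return.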

2. Local/Lambda Import Compatibility

Lambda expects flat imports (from utils.kalshi_client import ...) but running locally with python -m backend.local_api puts the wrong directory on sys.path. The local server now injects backend/ onto sys.path at startup so Lambda-style imports resolve correctly in both environments.
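A minimal sketch of the path injection, assuming a helper named ensure_on_sys_path (a hypothetical name) that the local server calls before any Lambda-style import:

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory):
    """Put directory at the front of sys.path exactly once."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# In a local entry point this would typically run before the flat imports:
#     ensure_on_sys_path(Path(__file__).parent)
#     from utils.kalshi_client import ...
```

Inserting at position 0 gives the backend directory priority over any installed package that happens to share a module name such as utils.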

3. DynamoDB Reset Operations

The initial reset endpoints only worked with SQLite (just delete rows). DynamoDB requires scanning the entire table to get all keys, then batch-deleting them — a fundamentally different operation. We had to implement scan-and-batch-delete for all 6 tables.
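The scan-and-batch-delete pattern looks roughly like this. The function takes a boto3 DynamoDB Table resource and the names of its key attributes; the function name and the key names used by callers are assumptions, since the real table schemas are not shown here.

```python
def reset_table(table, key_names):
    """Delete every item in a DynamoDB table by scanning for its keys.

    table: a boto3 Table resource (or any object with the same scan and
    batch_writer methods). key_names: the table's key attribute names.
    """
    # Alias attribute names so reserved words cannot break the projection.
    aliases = {f"#k{i}": name for i, name in enumerate(key_names)}
    scan_kwargs = {
        "ProjectionExpression": ", ".join(aliases),
        "ExpressionAttributeNames": aliases,
    }

    deleted = 0
    with table.batch_writer() as batch:  # buffers deletes into batch requests
        while True:
            page = table.scan(**scan_kwargs)
            for item in page.get("Items", []):
                batch.delete_item(Key={k: item[k] for k in key_names})
                deleted += 1
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break  # no more pages
            scan_kwargs["ExclusiveStartKey"] = last_key
    return deleted
```

Projecting only the key attributes keeps each scan page small. For large tables, deleting and recreating the table is often cheaper than scanning it, which may be worth considering if reset times become a problem.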

4. Real-Time Push Across Two Transports

Supporting both SSE (local) and WebSocket (AWS) from a single frontend component required careful abstraction. The LiveDetectionStream component checks for a WebSocket URL and falls back to SSE, with reconnection logic for both.

5. Golden Window False Positives

Early versions of the golden window detector flagged every cheap market near resolution. We refined the scoring function to weight probability, volume, and time-to-resolution together, and added minimum notional thresholds ($5K for HIGH, $10K for CRITICAL) to filter noise.
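The scoring function and the notional floors can be sketched as below. The score follows the stated product of probability, volume, and inverse time-to-resolution. The one-minute floor on time, the "LOW" label, and the rule that HIGH depends only on the $5K floor are assumptions made for this example, since the full HIGH criteria are not spelled out here.

```python
def golden_window_score(p_correct, volume, hours_to_resolution, min_hours=1 / 60):
    """Score = P(correct outcome) * volume * (1 / time-to-resolution).

    min_hours (an assumed one-minute floor) avoids division by zero for
    trades placed at the moment of resolution.
    """
    hours = max(hours_to_resolution, min_hours)
    return p_correct * volume * (1.0 / hours)


def golden_window_severity(z_score, price, notional_usd, hours_to_resolution):
    """Classify a trade that already passed the golden window filter.

    price is the contract price in dollars (0.04 means 4 cents, about 4% odds).
    """
    if (
        z_score > 4
        and price < 0.10
        and notional_usd >= 10_000
        and hours_to_resolution < 12
    ):
        return "CRITICAL"
    if notional_usd >= 5_000:
        return "HIGH"  # assumed: only the notional floor is checked here
    return "LOW"  # placeholder label for everything below the HIGH floor
```

For example, a $12,000 position bought at 4 cents six hours before resolution with z = 4.6 would classify as CRITICAL, while the same trade at $800 notional would fall below both floors.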

What we learned

Dual-mode architecture pays off — being able to iterate locally with SQLite and deploy to DynamoDB with zero changes made development dramatically faster
DynamoDB Streams are powerful — automatic event-driven WebSocket broadcasting with no polling or queuing infrastructure
AI enrichment vs. AI dependency — Claude 3 Haiku adds narrative context but the detection algorithms stand alone; this separation keeps the system reliable
Prediction market surveillance is an open problem — there's very little existing tooling for this; the regulatory frameworks are still being written
SAM + Step Functions make serverless orchestration manageable — the pipeline runs every 2 hours with no servers to maintain

What's next

Live market monitoring — currently processes settled markets; extending to open markets with streaming trade data
Multi-user pipelines — per-user detection configurations and custom alert thresholds
Expanded detection algorithms — wash trading detection, account clustering, cross-market correlation
Regulatory reporting — exportable case files with full evidence chains for CFTC-style submissions
Mobile alerts — push notifications via SNS when CRITICAL anomalies are detected

AI Tools Used

This project was built with assistance from AI coding tools:

Claude (Anthropic) — Primary development partner via Claude Code CLI for architecture design, full-stack implementation, debugging, and this writeup
OpenAI Codex — Code generation and iteration assistance
Kiro (AWS) — AI-powered IDE for AWS infrastructure development
Claude 3 Haiku (AWS Bedrock) — Powers the in-app AI anomaly analysis and market explanation features

Built with

Python, TypeScript, React, AWS Lambda, DynamoDB, API Gateway, Step Functions, EventBridge, AWS Bedrock (Claude 3 Haiku), Cognito, Amplify, S3, CloudWatch, X-Ray, SNS, Tailwind CSS, Recharts, Framer Motion, Vite, SAM

Updates

Stuart Cohen started this project — Mar 20, 2026 04:42 PM EDT
