Kalshi Watchdog — Prediction Market Surveillance Platform

Inspiration

Prediction markets are one of the most exciting developments in modern finance. Kalshi, the first CFTC-regulated prediction exchange, lets you trade on real-world events — elections, weather, economic data, even whether the Fed will cut rates. But with real money on the line and markets tied to world events, a natural question emerges: what does insider trading look like in a prediction market?

Traditional stock exchanges have decades of surveillance infrastructure. Prediction markets have almost none. We saw an opportunity to build a system that could catch the kinds of patterns regulators look for — volume spikes before resolution, coordinated trading bursts, and suspiciously well-timed bets on longshot outcomes — and make the results accessible through a modern dashboard with AI-generated case narratives.

The inspiration also came from real incidents: a MrBeast editor betting on video outcomes before release, a California governor candidate's associates positioning before endorsement announcements, and large coordinated bets appearing hours before military operations were publicly announced. These aren't hypotheticals — they've already happened.

What it does

Kalshi Watchdog ingests market and trade data from the Kalshi API, runs three anomaly detection algorithms, enriches findings with AI analysis (Claude 3 Haiku via AWS Bedrock), and presents everything through an interactive React dashboard.

Three Detection Algorithms

  1. Volume Spike Detection — Bins trades into hourly buckets and flags hours where volume exceeds mean + n * standard deviation (default n = 3) across the full market history. A z-score of 4+ triggers HIGH severity. This catches unexplained pre-resolution volume surges.

  2. Coordinated Activity Detection — Identifies bursts of trades within 5-minute clusters with high directional consistency. Calculates z-scores relative to 15-minute baseline windows. This is the signature of algorithmic order splitting or synchronized trader action.

  3. Golden Window Detection — The most sophisticated detector. Flags trades on extreme-probability markets (YES < 5 cents or NO < 5 cents) placed within 48 hours of resolution. A scoring function combines:

Score = P(correct outcome) × Volume × (1 / time-to-resolution)

CRITICAL severity triggers when all four conditions hold: z > 4, odds below 10%, at least $10K notional, and less than 12 hours to resolution. This is the textbook insider-trading profile.
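
As a rough illustration of the first detector, the sketch below bins trades by hour and scores each hour against the mean and standard deviation of all hourly volumes. It is not the project's code: the function name, the (timestamp, count) input shape, the MEDIUM label for spikes below the HIGH cutoff, and the choice to ignore hours with no trades are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

HOUR_SECONDS = 3600


def detect_volume_spikes(trades, n=3.0, high_z=4.0):
    """Flag hourly buckets whose volume exceeds mean + n * std.

    trades: iterable of (unix_timestamp_seconds, contract_count) pairs.
    Only hours that contain at least one trade are counted (an assumption).
    """
    hourly = Counter()
    for ts, count in trades:
        hourly[int(ts) // HOUR_SECONDS] += count

    if len(hourly) < 2:
        return []  # not enough history to define a baseline

    volumes = list(hourly.values())
    mu = mean(volumes)
    sigma = pstdev(volumes)  # population std over the full market history
    if sigma == 0:
        return []  # perfectly flat volume, nothing to flag

    spikes = []
    for bucket, volume in sorted(hourly.items()):
        z = (volume - mu) / sigma
        if z > n:
            spikes.append({
                "hour_start": bucket * HOUR_SECONDS,
                "volume": volume,
                "z_score": round(z, 2),
                # "MEDIUM" is a placeholder label, not taken from the project
                "severity": "HIGH" if z >= high_z else "MEDIUM",
            })
    return spikes
```

With 30 quiet hours of 10 contracts each and one hour of 1,000, the busy hour scores a z of about 5.5 and is labeled HIGH. One property worth knowing: with a population standard deviation over N buckets, no single z-score can exceed sqrt(N - 1), so a market needs more than 10 active hourly buckets before any hour can pass z = 3.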

AI-Powered Analysis

Each flagged anomaly is sent to Claude 3 Haiku via AWS Bedrock with full context: market title, anomaly type, trade count, volume, probability, hours-to-resolution, and z-score. Claude returns a structured JSON analysis with a summary, reasoning chain, severity assessment, and possible explanations. If Bedrock is unavailable, the system falls back to heuristic analysis — detection never depends on the LLM.
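
The fallback behavior can be sketched as a wrapper that tries the model first and drops to a rule-based result on any failure or malformed reply. This is an assumption-laden sketch rather than the project's code: `invoke_llm` stands in for the Bedrock call, and the field names and the z >= 4 severity rule in the heuristic are hypothetical.

```python
import json

# Fields the structured analysis is expected to carry (names are assumptions).
REQUIRED_KEYS = {"summary", "reasoning", "severity", "explanations"}


def heuristic_analysis(anomaly):
    """Rule-based fallback used when no usable LLM answer is available."""
    z = anomaly.get("z_score", 0.0)
    kind = anomaly.get("anomaly_type", "unknown")
    title = anomaly.get("market_title", "unknown market")
    return {
        "summary": f"{kind} anomaly on {title}",
        "reasoning": [f"z-score of {z:.1f} relative to baseline"],
        "severity": "HIGH" if z >= 4 else "MEDIUM",  # placeholder rule
        "explanations": ["Not assessed (heuristic fallback)"],
        "source": "heuristic",
    }


def analyze_anomaly(anomaly, invoke_llm=None):
    """Return an LLM analysis when possible, otherwise the heuristic one.

    invoke_llm: optional callable taking a prompt string and returning the
    model's raw text reply. It is a stand-in for the real Bedrock client.
    """
    if invoke_llm is not None:
        try:
            parsed = json.loads(invoke_llm(json.dumps(anomaly)))
            if isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed):
                parsed["source"] = "llm"
                return parsed
        except Exception:
            pass  # network error, throttling, invalid JSON: fall through
    return heuristic_analysis(anomaly)
```

Because the detector output is passed in unchanged and the LLM call is optional, the detection step never waits on or fails with the model.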

Dashboard Features

  • Force Graph — Network visualization connecting anomalies to markets and historical insider-trading case parallels
  • Candlestick Charts — OHLC price movement per market with anomaly overlay
  • Orderbook Depth — Real-time bid/ask spread visualization
  • Anomaly Timeline — Scatter plot of detections by time and severity
  • Market Inspector — Full trade-level drill-down with flagged context
  • Watchlist — Personal market tracking with category browsing and anomaly badges
  • Known Cases — 6 real/fictional insider-trading parallels for regulatory context
  • Admin Dashboard — User management, request analytics, and usage metrics

How we built it

Dual-Mode Architecture

The most interesting architectural decision was building a single codebase that runs both locally and on AWS with zero code changes. A single environment variable (STORAGE_BACKEND) controls the entire storage layer:

| Layer | Local Mode | AWS Mode |
| --- | --- | --- |
| Storage | SQLite (WAL mode) | DynamoDB (6 tables) |
| Real-time push | Server-Sent Events (SSE) | WebSocket via API Gateway v2 |
| Pipeline | Manual (frontend buttons) | Step Functions + EventBridge (every 2h) |
| Auth | Bypass mode (user_id = "local") | AWS Cognito (JWT) |

Every storage function, including batch_write_trades, get_anomalies, and add_to_watchlist, dispatches at runtime to SQLite or DynamoDB through a single abstraction in utils/dynamo.py. The local dev server (local_api.py) wraps the same Lambda handler in a ThreadingHTTPServer, so the exact same code path executes locally and in production.
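
A minimal sketch of this kind of runtime dispatch is shown below for the watchlist functions. It is illustrative only: the backend values "sqlite" and "dynamodb", the SQLITE_PATH and WATCHLIST_TABLE variables, the table schema, and get_watchlist are assumptions, and the real utils/dynamo.py may be organized quite differently.

```python
import os
import sqlite3

_conn = None  # lazily created SQLite connection for local mode


def _sqlite():
    global _conn
    if _conn is None:
        # SQLITE_PATH is hypothetical; ":memory:" keeps the sketch self-contained.
        _conn = sqlite3.connect(os.environ.get("SQLITE_PATH", ":memory:"))
        _conn.execute("PRAGMA journal_mode=WAL")
        _conn.execute(
            "CREATE TABLE IF NOT EXISTS watchlist ("
            "user_id TEXT, ticker TEXT, PRIMARY KEY (user_id, ticker))"
        )
    return _conn


def _sqlite_add(user_id, ticker):
    with _sqlite() as conn:  # the context manager commits on success
        conn.execute("INSERT OR IGNORE INTO watchlist VALUES (?, ?)", (user_id, ticker))


def _sqlite_get(user_id):
    rows = _sqlite().execute(
        "SELECT ticker FROM watchlist WHERE user_id = ? ORDER BY ticker", (user_id,)
    )
    return [ticker for (ticker,) in rows]


def _dynamo_table():
    import boto3  # imported lazily so local mode does not need the AWS SDK
    return boto3.resource("dynamodb").Table(os.environ["WATCHLIST_TABLE"])  # hypothetical


def _dynamo_add(user_id, ticker):
    _dynamo_table().put_item(Item={"user_id": user_id, "ticker": ticker})


def _dynamo_get(user_id):
    from boto3.dynamodb.conditions import Key
    resp = _dynamo_table().query(KeyConditionExpression=Key("user_id").eq(user_id))
    return sorted(item["ticker"] for item in resp["Items"])


_BACKENDS = {
    "sqlite": {"add": _sqlite_add, "get": _sqlite_get},
    "dynamodb": {"add": _dynamo_add, "get": _dynamo_get},
}


def _impl(op):
    # The backend is read on every call, so one env var switches all storage.
    return _BACKENDS[os.environ.get("STORAGE_BACKEND", "sqlite")][op]


def add_to_watchlist(user_id, ticker):
    return _impl("add")(user_id, ticker)


def get_watchlist(user_id):
    return _impl("get")(user_id)
```

Callers only ever see add_to_watchlist and get_watchlist, which is what lets the handler code stay identical in both modes.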

AWS Services (13 total)

| Service | Purpose |
| --- | --- |
| Lambda | 8 functions: API, market/trade ingestion, detection, analysis, WebSocket connect/disconnect/broadcast |
| DynamoDB | 6 tables: Trades, Markets, Anomalies, Connections, Watchlist, Usage |
| DynamoDB Streams | Triggers WebSocket broadcast on new anomaly inserts |
| API Gateway | REST (HTTP routes) + WebSocket v2 (real-time push) |
| Step Functions | Orchestrates: Ingest Markets → Ingest Trades → Run Detection |
| EventBridge | Scheduled rule (every 2 hours) triggers the pipeline |
| Bedrock | Claude 3 Haiku for AI anomaly analysis and market explanations |
| S3 | Raw data archival with 30-day STANDARD_IA lifecycle |
| Cognito | User Pool with email auth, admin groups, JWT authorization |
| Amplify | Frontend hosting with CI/CD from GitHub |
| CloudWatch | Custom dashboard with Lambda metrics and DynamoDB capacity |
| X-Ray | Distributed tracing across all Lambda functions |
| SNS | CRITICAL anomaly alert notifications |
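
To make the DynamoDB Streams and SNS rows more concrete, here is a small sketch of how a stream-triggered Lambda might pick out newly inserted anomalies and the CRITICAL ones among them. The function names and the `severity` attribute name are assumptions; the record layout (`eventName`, `dynamodb.NewImage`, typed values such as `{"S": "..."}`) is the standard DynamoDB Streams event shape, and NewImage is only present when the stream view type includes new images.

```python
def new_images(event):
    """Return the NewImage of every INSERT record in a DynamoDB Streams event."""
    images = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE records
        image = record.get("dynamodb", {}).get("NewImage")
        if image:
            images.append(image)
    return images


def string_attr(image, name):
    """Read a string attribute from a DynamoDB-JSON image, or None if absent."""
    return image.get(name, {}).get("S")


def critical_images(event):
    """Subset of inserted items whose (assumed) severity attribute is CRITICAL."""
    return [img for img in new_images(event) if string_attr(img, "severity") == "CRITICAL"]
```

In a setup like the one described, every image from new_images would be broadcast to WebSocket clients, while only critical_images would be forwarded to the alert topic.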

Frontend

React 18 + TypeScript + Vite, styled with Tailwind CSS and animated with Framer Motion. Recharts handles all charting (candlesticks, timelines, breakdowns) and react-force-graph-2d powers the network visualization. Authentication flows through AWS Amplify's Cognito integration with protected routes and admin-only views.

Kalshi API Integration

The Kalshi client uses RSA-signed authentication — each request includes a timestamp, method, and path signed with a private key:

signature = RSA-PSS(SHA256(timestamp || METHOD || path))

The key loads from a file path locally or from a base64-encoded environment variable in Lambda, so private keys never touch version control.
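
The key-loading rule and the string that gets signed can be sketched with the standard library alone. The environment variable names below are hypothetical, the timestamp is assumed to be in milliseconds, and the RSA signing step itself is left out because it needs a cryptography library.

```python
import base64
import os
from pathlib import Path


def load_private_key_pem(environ=os.environ):
    """Return the PEM-encoded private key as bytes.

    Prefers a base64-encoded environment variable (the Lambda case) and falls
    back to a file path (the local case). Both variable names are assumptions.
    """
    encoded = environ.get("KALSHI_PRIVATE_KEY_B64")
    if encoded:
        return base64.b64decode(encoded)
    path = environ.get("KALSHI_PRIVATE_KEY_PATH")
    if path:
        return Path(path).read_bytes()
    raise RuntimeError("no Kalshi private key configured")


def signing_message(timestamp_ms, method, path):
    """Concatenate timestamp, upper-cased HTTP method, and request path."""
    return f"{timestamp_ms}{method.upper()}{path}".encode("utf-8")
```

For example, signing_message(1700000000000, "get", "/trade-api/v2/markets") yields b"1700000000000GET/trade-api/v2/markets", which is the byte string the private key would then sign.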

Challenges we faced

1. Trade Ingestion Sequencing

The trade ingestion endpoint queries for settled markets first, then fetches trades per market. But locally, if you hit "Ingest Trades" before "Ingest Markets," the query returns zero markets and the whole pipeline silently produces nothing. We solved this with auto-market ingestion — when the trade handler detects no settled markets, it internally calls the market ingestion handler first, then retries.
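
The retry logic amounts to a few lines. In this sketch, `store`, `ingest_markets`, and `fetch_trades` are injected stand-ins, and `get_settled_markets` and the `ticker` field are hypothetical names; only batch_write_trades comes from the project.

```python
def ingest_trades(store, ingest_markets, fetch_trades):
    """Ingest trades for all settled markets, bootstrapping markets if needed.

    store: object with get_settled_markets() and batch_write_trades(trades).
    ingest_markets: callable that populates the store with markets.
    fetch_trades: callable mapping a market ticker to a list of trades.
    """
    markets = store.get_settled_markets()
    if not markets:
        # Nothing to work on yet: run market ingestion once, then retry.
        ingest_markets()
        markets = store.get_settled_markets()

    ingested = 0
    for market in markets:
        trades = fetch_trades(market["ticker"])
        store.batch_write_trades(trades)
        ingested += len(trades)
    return {"markets": len(markets), "trades": ingested}
```

Returning the counts also makes the "silently produced nothing" case visible to the caller, since a result of zero markets after the retry signals that market ingestion itself came back empty.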

2. Local/Lambda Import Compatibility

Lambda expects flat imports (from utils.kalshi_client import ...) but running locally with python -m backend.local_api puts the wrong directory on sys.path. The local server now injects backend/ onto sys.path at startup so Lambda-style imports resolve correctly in both environments.
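
The path fix is small enough to show in full. This is a sketch of the general technique, with a hypothetical helper name, rather than the exact code in local_api.py.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory):
    """Put `directory` at the front of sys.path once and return its resolved form."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# In a module such as backend/local_api.py, this would run before any
# Lambda-style import like `from utils.kalshi_client import ...`:
#
#     ensure_on_sys_path(Path(__file__).parent)
```

Resolving the path first keeps the check idempotent, so repeated imports or reloads do not stack duplicate entries on sys.path.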

3. DynamoDB Reset Operations

The initial reset endpoints only worked with SQLite (just delete rows). DynamoDB requires scanning the entire table to get all keys, then batch-deleting them — a fundamentally different operation. We had to implement scan-and-batch-delete for all 6 tables.
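
A scan-and-batch-delete helper along these lines could look as follows. It is a sketch under assumptions: `table` is expected to be a boto3 DynamoDB Table resource (or any object with the same scan and batch_writer methods), and the caller supplies the table's key attribute names.

```python
def delete_all_items(table, key_names):
    """Delete every item in a DynamoDB table and return how many were removed.

    table: boto3 Table resource, or a duck-typed equivalent.
    key_names: the table's key attributes, e.g. ["pk"] or ["pk", "sk"].
    """
    # Fetch only the key attributes; "#k0"-style aliases avoid clashes with
    # DynamoDB reserved words in the projection expression.
    aliases = {f"#k{i}": name for i, name in enumerate(key_names)}
    scan_kwargs = {
        "ProjectionExpression": ", ".join(aliases),
        "ExpressionAttributeNames": aliases,
    }

    deleted = 0
    # batch_writer groups deletes into batches and resends unprocessed items.
    with table.batch_writer() as batch:
        while True:
            page = table.scan(**scan_kwargs)
            for item in page.get("Items", []):
                batch.delete_item(Key={k: item[k] for k in key_names})
                deleted += 1
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break  # no more pages
            scan_kwargs["ExclusiveStartKey"] = last_key
    return deleted
```

The scan is paginated, so the loop has to follow LastEvaluatedKey until it disappears; stopping after the first page would leave large tables only partly cleared.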

4. Real-Time Push Across Two Transports

Supporting both SSE (local) and WebSocket (AWS) from a single frontend component required careful abstraction. The LiveDetectionStream component checks for a WebSocket URL and falls back to SSE, with reconnection logic for both.

5. Golden Window False Positives

Early versions of the golden window detector flagged every cheap market near resolution. We refined the scoring function to weight probability, volume, and time-to-resolution together, and added minimum notional thresholds ($5K for HIGH, $10K for CRITICAL) to filter noise.
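
Putting the scoring function and the thresholds together gives something like the sketch below. Several details are assumptions and are marked as such: prices are in cents and are treated as implied odds, the one-minute floor on time-to-resolution is invented to avoid dividing by zero, HIGH is granted on the notional floor alone, and LOW is a placeholder label for everything else.

```python
MAX_WINDOW_HOURS = 48      # trades must fall within 48 hours of resolution
EXTREME_PRICE_CENTS = 5    # YES or NO priced under 5 cents
MIN_HOURS = 1 / 60         # assumed floor (one minute) to avoid division by zero


def is_candidate(yes_price_cents, no_price_cents, hours_to_resolution):
    """True for trades on extreme-probability markets close to resolution."""
    extreme = (yes_price_cents < EXTREME_PRICE_CENTS
               or no_price_cents < EXTREME_PRICE_CENTS)
    return extreme and 0 <= hours_to_resolution <= MAX_WINDOW_HOURS


def golden_window_score(p_correct, volume, hours_to_resolution):
    """Score = P(correct outcome) x Volume x (1 / time-to-resolution)."""
    return p_correct * volume / max(hours_to_resolution, MIN_HOURS)


def classify(z_score, price_cents, notional_usd, hours_to_resolution):
    """Map a flagged trade to a severity label."""
    if (z_score > 4 and price_cents < 10
            and notional_usd >= 10_000 and hours_to_resolution < 12):
        return "CRITICAL"
    if notional_usd >= 5_000:
        return "HIGH"  # assumption: only the notional floor is checked here
    return "LOW"       # placeholder label, not taken from the project
```

Under these rules a 4-cent position worth $12,000 placed 6 hours before resolution with z = 4.5 is CRITICAL, the same trade at $6,000 is HIGH, and at $100 it drops out as noise.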

What we learned

  • Dual-mode architecture pays off — being able to iterate locally with SQLite and deploy to DynamoDB with zero changes made development dramatically faster
  • DynamoDB Streams are powerful — automatic event-driven WebSocket broadcasting with no polling or queuing infrastructure
  • AI enrichment vs. AI dependency — Claude 3 Haiku adds narrative context but the detection algorithms stand alone; this separation keeps the system reliable
  • Prediction market surveillance is an open problem — there's very little existing tooling for this; the regulatory frameworks are still being written
  • SAM + Step Functions make serverless orchestration manageable — the pipeline runs every 2 hours with no servers to maintain

What's next

  • Live market monitoring — currently processes settled markets; extending to open markets with streaming trade data
  • Multi-user pipelines — per-user detection configurations and custom alert thresholds
  • Expanded detection algorithms — wash trading detection, account clustering, cross-market correlation
  • Regulatory reporting — exportable case files with full evidence chains for CFTC-style submissions
  • Mobile alerts — push notifications via SNS when CRITICAL anomalies are detected

AI Tools Used

This project was built with assistance from AI coding tools:

  • Claude (Anthropic) — Primary development partner via Claude Code CLI for architecture design, full-stack implementation, debugging, and this writeup
  • OpenAI Codex — Code generation and iteration assistance
  • Kiro (AWS) — AI-powered IDE for AWS infrastructure development
  • Claude 3 Haiku (AWS Bedrock) — Powers the in-app AI anomaly analysis and market explanation features

Built with

Python, TypeScript, React, AWS Lambda, DynamoDB, API Gateway, Step Functions, EventBridge, AWS Bedrock (Claude 3 Haiku), Cognito, Amplify, S3, CloudWatch, X-Ray, SNS, Tailwind CSS, Recharts, Framer Motion, Vite, SAM
