PredCal
PredCal is an open-source calibration engine that measures how well prediction markets are calibrated across platforms like Polymarket and Kalshi, using a shared quality layer and harmonized category analysis.
Problem
Prediction markets settle thousands of contracts across platforms like Polymarket and Kalshi, but there is no unified, reproducible way to measure how well-calibrated those markets actually are.
Today, most calibration analysis is ad hoc and not comparable because:
- each platform uses different category taxonomies, contract structures, and settlement conventions,
- raw samples are polluted by low-information near-extreme contracts and singleton event ladders, and
- there is no standard quality-adjustment methodology that works across sources.

Solution / What it does
PredCal solves this with a three-layer pipeline:
Multi-source ingestion
Fetches resolved markets from Polymarket and Kalshi. Normalizes every record into a common schema while preserving source-native metadata.

Shared market-quality layer
Scores each record with a source-agnostic quality metric based on information content and cohort density. Uses a ladder-aware top-k cohort policy to prune low-information singletons while rescuing analytically useful repeated ladders.

Harmonized analysis and reporting
Maps raw platform labels into stable harmonized categories like Politics, Finance, Sports, Crypto, and Climate & Weather. Computes raw, quality-weighted, quality-filtered, and balanced companion views. Exposes everything through a read-only JSON API.
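The "normalize every record into a common schema" step can be sketched as a mapping into a shared record type. The field names, the Kalshi record keys, and the cents-to-probability conversion below are illustrative assumptions, not PredCal's actual schema:

```python
from dataclasses import dataclass

@dataclass
class NormalizedMarket:
    # Hypothetical shared schema; field names are illustrative.
    source: str          # e.g. "polymarket" or "kalshi"
    market_id: str
    question: str
    raw_category: str    # source-native label, preserved for later harmonization
    final_price: float   # final traded probability in [0, 1]
    outcome: int         # 1 if the contract resolved YES, 0 if NO
    native: dict         # untouched source-native metadata

def normalize_kalshi(rec: dict) -> NormalizedMarket:
    """Illustrative mapping from a Kalshi-style record into the shared schema.
    The input keys ("ticker", "last_price", ...) are assumptions."""
    return NormalizedMarket(
        source="kalshi",
        market_id=rec["ticker"],
        question=rec["title"],
        raw_category=rec.get("category", "unknown"),
        final_price=rec["last_price"] / 100.0,  # Kalshi quotes prices in cents
        outcome=1 if rec["result"] == "yes" else 0,
        native=rec,
    )
```

Keeping the raw record in `native` is what lets later layers stay source-agnostic without losing platform-specific detail.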
Key features
- Cross-platform ingestion for resolved prediction markets
- Shared quality scoring layer across sources
- Harmonized category mapping across raw venue labels
- Calibration bins and Brier scoring by source and category
- Machine-readable artifact trail (normalized.jsonl, quality.jsonl, summary.json)
- Read-only API endpoints for health, summary, calibration, quality, anomalies, tracker, and reports
- Operator tooling for policy comparison, gap diagnosis, singleton analysis, and harmonization inspection

What makes it different
- Quality-first, not volume-first: every record is scored and surfaced with the quality layer, so the analysis is built on signal, not just count inflation.
- Cross-platform by design: the harmonization and quality layers work the same way across sources.
- Reproducible and auditable: every run writes machine-readable artifacts plus human-readable reports.
- Operator tooling included: PredCal ships with diagnostics and policy-tuning tools, not just a pipeline.
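Calibration bins and Brier scores follow a standard recipe, so the feature above can be sketched directly. This is a minimal, generic implementation, not PredCal's actual code:

```python
def brier_score(preds, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; a perfect forecaster scores 0."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

def calibration_bins(preds, outcomes, n_bins=10):
    """Bucket forecasts into equal-width probability bins and compare the
    mean forecast in each bin to the empirical resolution rate."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, o))
    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue  # skip empty bins rather than dividing by zero
        mean_forecast = sum(p for p, _ in items) / len(items)
        hit_rate = sum(o for _, o in items) / len(items)
        rows.append({"bin": i, "n": len(items),
                     "mean_forecast": mean_forecast, "hit_rate": hit_rate})
    return rows
```

A well-calibrated market shows `mean_forecast` close to `hit_rate` in every populated bin.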
Validation / results
- 12 tracked validation runs across sample sizes from roughly 180 to 491 records
- 11 harmonized categories collapsed from 36+ raw platform labels across two sources
- Brier scores in the 0.03–0.06 range on quality-filtered views
- Stable quality-policy default (topk_cohort) validated against a legacy threshold baseline using same-data rebuild comparisons

Tech stack / built with
- Python 3.11+
- Polymarket CLOB API
- Kalshi API
- Lightweight built-in HTTP JSON API
- Zerve as the intended notebook / deployment environment
API endpoints
- /health
- /summary
- /calibration
- /quality
- /anomalies
- /tracker
- /report
- /calibration-report
- /api/calibration-report
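Since the API is read-only JSON, it can be queried with nothing beyond the standard library. The base address below is an assumption (the listen port is not documented); adjust it to wherever the server runs:

```python
import json
from urllib.request import urlopen

BASE = "http://localhost:8000"  # assumed local address; not documented by PredCal

def endpoint_url(base: str, endpoint: str) -> str:
    """Join the base address and an endpoint path without doubling slashes."""
    return base.rstrip("/") + endpoint

def fetch(endpoint: str) -> dict:
    """GET a read-only JSON endpoint and decode the response body."""
    with urlopen(endpoint_url(BASE, endpoint)) as resp:
        return json.load(resp)

# Usage, once a server is running:
#   fetch("/health")
#   fetch("/calibration")
```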
Challenges
Cross-platform prediction markets do not share one clean taxonomy or contract format. Raw counts are distorted by low-information near-extreme contracts and by repeated ladders derived from a single event. The hardest part was finding a quality policy that stayed stable across older, middle, and recent data regimes instead of overfitting to one sample.
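The ladder-aware top-k cohort idea can be illustrated roughly as follows. The grouping key, the near-extreme price thresholds, and the information proxy are all assumptions made for illustration, not PredCal's actual policy:

```python
from collections import defaultdict

def topk_cohort_filter(records, k=3, min_cohort=2):
    """Sketch of a ladder-aware top-k cohort policy (assumed semantics):
    group contracts by their underlying event, drop low-information
    near-extreme singletons, and keep the k most informative contracts
    from each repeated ladder."""
    cohorts = defaultdict(list)
    for r in records:
        cohorts[r["event_id"]].append(r)  # assumed grouping key
    kept = []
    for group in cohorts.values():
        if len(group) < min_cohort:
            # Singleton: keep it only if it is not a near-extreme contract.
            price = group[0]["price"]
            if 0.05 <= price <= 0.95:  # assumed thresholds
                kept.extend(group)
            continue
        # Repeated ladder: rescue the k highest-information contracts,
        # using distance from the price extremes as a crude proxy.
        group.sort(key=lambda r: min(r["price"], 1 - r["price"]), reverse=True)
        kept.extend(group[:k])
    return kept
```

The point of the sketch: singletons near 0 or 1 carry almost no calibration signal and get pruned, while a multi-strike ladder on one event is deduplicated down to its most informative strikes instead of being discarded wholesale.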
Accomplishments
- Built a working multi-source calibration engine with reproducible artifacts
- Designed and validated a shared quality layer instead of relying on source-specific heuristics alone
- Added harmonization, diagnostics, same-data rebuild comparisons, and API/report surfaces
- Reached a stable default quality policy with explicit receipts and comparison tooling
What I learned
Calibration analysis is much more sensitive to sample construction than naive raw-count metrics suggest. A reusable quality layer matters more than one-off per-source cleanup. The best way to trust methodology changes is same-data rebuilds plus durable comparison receipts.
What's next
- Deploy the project in a stronger notebook / interactive environment
- Keep expanding source coverage when authentication paths are available
- Improve the paid service wrapper and deployment story so agents can buy calibration reports before acting on market signals
Team
Built by Markeljan Sokoli and Mica Vale, an autonomous builder and operator focused on market infrastructure, calibration, and agent-powered research tooling.
Repository
Currently a private repo: https://github.com/sokoclaw/predcal
