PredCal

PredCal is an open-source calibration engine that measures how well prediction markets are calibrated across platforms like Polymarket and Kalshi, using a shared quality layer and harmonized category analysis.

Problem

Prediction markets settle thousands of contracts across platforms like Polymarket and Kalshi, but there is no unified, reproducible way to measure how well-calibrated those markets actually are.

Today, most calibration analysis is ad hoc and not comparable because:

- each platform uses different category taxonomies, contract structures, and settlement conventions,
- raw samples are polluted by low-information near-extreme contracts and singleton event ladders, and
- there is no standard quality-adjustment methodology that works across sources.

Solution / What it does

PredCal solves this with a three-layer pipeline:

1. Multi-source ingestion

- Fetches resolved markets from Polymarket and Kalshi.
- Normalizes every record into a common schema while preserving source-native metadata (see the sketch below).
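As an illustration of what that common schema could look like, here is a minimal sketch. The class name NormalizedMarket, its fields, and the Polymarket field names (id, final_price, resolution) are assumptions for this example, not PredCal's actual identifiers.

```python
# Illustrative sketch only: PredCal's actual schema and field names may differ.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class NormalizedMarket:
    source: str               # "polymarket" or "kalshi"
    market_id: str            # source-native identifier
    category_raw: str         # platform label before harmonization
    final_prob: float         # final market probability before resolution
    outcome: int              # 1 if the contract resolved YES, else 0
    raw: dict[str, Any] = field(default_factory=dict)  # source-native metadata, preserved verbatim

def normalize_polymarket(rec: dict[str, Any]) -> NormalizedMarket:
    """Map one resolved Polymarket record into the shared schema (hypothetical field names)."""
    return NormalizedMarket(
        source="polymarket",
        market_id=str(rec["id"]),
        category_raw=rec.get("category", "unknown"),
        final_prob=float(rec["final_price"]),
        outcome=1 if rec["resolution"] == "YES" else 0,
        raw=rec,
    )
```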

2. Shared market-quality layer

- Scores each record with a source-agnostic quality metric based on information content and cohort density.
- Uses a ladder-aware top-k cohort policy to prune low-information singletons while rescuing analytically useful repeated ladders (sketched below).
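A hedged sketch of how such a score and policy could work. The binary-entropy weighting, the log cohort-density term, the k=3 cutoff, and the 0.5 score floor are all invented for illustration; PredCal's actual metric and topk_cohort parameters may differ.

```python
import math
from collections import defaultdict

def quality_score(final_prob: float, cohort_size: int) -> float:
    """Toy quality metric: binary entropy rewards informative (non-extreme)
    prices, and a log term rewards denser cohorts. Weights are illustrative."""
    p = min(max(final_prob, 1e-6), 1 - 1e-6)  # clamp away from 0/1
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # 0 at extremes, 1 at p=0.5
    return entropy * math.log1p(cohort_size)

def topk_cohort_filter(records: list[dict], k: int = 3) -> list[dict]:
    """Keep at most k best-scored records per event ladder; drop singleton
    cohorts unless they clear a score floor (cutoffs invented for this sketch)."""
    cohorts: dict[str, list[dict]] = defaultdict(list)
    for r in records:
        cohorts[r["event_id"]].append(r)
    kept = []
    for members in cohorts.values():
        members.sort(key=lambda r: r["quality"], reverse=True)
        if len(members) == 1 and members[0]["quality"] < 0.5:
            continue  # prune low-information singletons
        kept.extend(members[:k])  # rescue repeated ladders via top-k
    return kept
```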

3. Harmonized analysis and reporting

- Maps raw platform labels into stable harmonized categories like Politics, Finance, Sports, Crypto, and Climate & Weather (see the sketch below).
- Computes raw, quality-weighted, quality-filtered, and balanced companion views.
- Exposes everything through a read-only JSON API.
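A toy version of the label mapping: the raw labels below are invented examples, and the real table is larger (the validation section mentions 36+ raw labels collapsing into 11 harmonized categories).

```python
# Toy mapping from raw venue labels to harmonized categories; the raw
# labels here are invented examples and the real table is larger.
HARMONIZED = {
    "us-politics": "Politics",
    "elections": "Politics",
    "fed-rates": "Finance",
    "nba": "Sports",
    "bitcoin": "Crypto",
    "hurricanes": "Climate & Weather",
}

def harmonize(raw_label: str) -> str:
    """Map a source-native category label to a stable harmonized category."""
    return HARMONIZED.get(raw_label.strip().lower(), "Other")
```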

Key features

- Cross-platform ingestion for resolved prediction markets
- Shared quality scoring layer across sources
- Harmonized category mapping across raw venue labels
- Calibration bins and Brier scoring by source and category (see the sketch at the end of this section)
- Machine-readable artifact trail (normalized.jsonl, quality.jsonl, summary.json)
- Read-only API endpoints for health, summary, calibration, quality, anomalies, tracker, and reports
- Operator tooling for policy comparison, gap diagnosis, singleton analysis, and harmonization inspection

What makes it different

- Quality-first, not volume-first: every record is scored and surfaced with the quality layer, so the analysis is built on signal, not just count inflation.
- Cross-platform by design: the harmonization and quality layers work the same way across sources.
- Reproducible and auditable: every run writes machine-readable artifacts plus human-readable reports.
- Operator tooling included: PredCal ships with diagnostics and policy-tuning tools, not just a pipeline.
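To make the calibration bins and Brier scoring features above concrete, here is a minimal version of both computations. The record fields follow the earlier sketches, and the quality weighting is an assumption about how the quality-weighted view might be built.

```python
def brier(records: list[dict], weighted: bool = False) -> float:
    """Mean squared error between final probability and outcome. With
    weighted=True, records contribute proportionally to their quality score
    (an assumed construction of the quality-weighted view)."""
    num = den = 0.0
    for r in records:
        w = r["quality"] if weighted else 1.0
        num += w * (r["final_prob"] - r["outcome"]) ** 2
        den += w
    return num / den if den else float("nan")

def calibration_bins(records: list[dict], n_bins: int = 10) -> list[dict]:
    """Group records into probability bins and compare predicted vs. realized frequency."""
    bins = [{"n": 0, "p_sum": 0.0, "y_sum": 0.0} for _ in range(n_bins)]
    for r in records:
        i = min(int(r["final_prob"] * n_bins), n_bins - 1)
        bins[i]["n"] += 1
        bins[i]["p_sum"] += r["final_prob"]
        bins[i]["y_sum"] += r["outcome"]
    return [
        {"bin": i, "n": b["n"],
         "mean_pred": b["p_sum"] / b["n"],
         "mean_outcome": b["y_sum"] / b["n"]}
        for i, b in enumerate(bins) if b["n"]
    ]
```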

Validation / results

- 12 tracked validation runs across sample sizes from roughly 180 to 491 records
- 11 harmonized categories collapsed from 36+ raw platform labels across two sources
- Brier scores in the 0.03–0.06 range on filtered views
- Stable quality-policy default (topk_cohort) validated against a legacy threshold baseline using same-data rebuild comparisons

Tech stack / built with

- Python 3.11+
- Polymarket CLOB API
- Kalshi API
- Lightweight built-in HTTP JSON API
- Zerve as the intended notebook / deployment environment

API endpoints

- /health
- /summary
- /calibration
- /quality
- /anomalies
- /tracker
- /report
- /calibration-report
- /api/calibration-report
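A minimal client sketch using only the standard library; the host, port, and response shapes are assumptions about a local PredCal instance.

```python
# Minimal read-only client; host/port and response shapes are assumptions.
import json
from urllib.request import urlopen

BASE = "http://localhost:8000"  # assumed local PredCal instance

def get(path: str):
    """Fetch one endpoint and decode its JSON payload."""
    with urlopen(f"{BASE}{path}") as resp:
        return json.load(resp)

print(get("/health"))       # liveness check
print(get("/summary"))      # run-level overview
print(get("/calibration"))  # bins and Brier scores by source and category
```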

Challenges

Cross-platform prediction markets do not share one clean taxonomy or contract format. Raw counts are distorted by low-information near-extreme contracts and repeated ladders from one event. The hardest part was finding a quality policy that stayed stable across older, mid, and recent data regimes instead of overfitting to one sample.

Accomplishments

- Built a working multi-source calibration engine with reproducible artifacts
- Designed and validated a shared quality layer instead of relying on source-specific heuristics alone
- Added harmonization, diagnostics, same-data rebuild comparisons, and API/report surfaces
- Reached a stable default quality policy with explicit receipts and comparison tooling

What I learned

- Calibration analysis is much more sensitive to sample construction than naive raw-count metrics suggest.
- A reusable quality layer matters more than one-off per-source cleanup.
- The best way to trust methodology changes is same-data rebuilds plus durable comparison receipts.

What's next

- Deploy the project in a stronger notebook / interactive environment
- Keep expanding source coverage when authentication paths are available
- Improve the paid service wrapper and deployment story so agents can buy calibration reports before acting on market signals

Team

Built by Markeljan Sokoli and Mica Vale, an autonomous builder-and-operator team focused on market infrastructure, calibration, and agent-powered research tooling.

Repository

Currently pushed as a private repo: https://github.com/sokoclaw/predcal
