Inspiration

Legislation moves markets, but the signal is fragmented: the “what” lives in dense 10‑K/10‑Q/8‑K risk factors while the “will it happen?” changes daily. RegAlpha bridges that gap by using Polymarket’s market‑implied odds as a live uncertainty signal and grounding the impact in traceable SEC filing evidence, so you can see both the probability of a law passing and exactly why it matters for a specific company.

What it does

Pick a ticker and RegAlpha instantly shows:

  • Real-time legislative risk for that company, adjusted by Polymarket pass probabilities (a live signal of “how likely this becomes real”)
  • The exact SEC filing excerpts that triggered the score (the “why”)
  • An interactive knowledge graph linking companies, sectors, suppliers, and laws, surfacing indirect exposure (e.g., sector-level spillover and supplier-chain risk)
  • Custom legislation “what-if” mode: paste a proposed bill/regulation (or draft language) and instantly see its company-specific impact, with traceable SEC filing excerpts. (Because custom laws have no Polymarket market, we score them as a worst case with pass probability p = 1.)

How we built it (deeper)

  • Polymarket (live uncertainty signal):
    • We pull active prediction markets from Polymarket’s Gamma API and compute the implied pass probability from the “Yes” outcome price.
    • That probability becomes the uncertainty layer in the final risk: if the market reprices the likelihood of passage, the expected risk updates automatically (instead of treating every bill as equally likely).
  • Gemini (market → legislation enrichment):
    • Polymarket markets often start as short questions (“Will X pass?”). We use Gemini 2.5 to transform that sparse prompt into structured legislative metadata: a likely bill identifier, title, a multi-paragraph “bill text / key provisions,” a concise summary, and affected sectors.
    • This enrichment makes downstream retrieval possible: instead of embedding only a one-line question, we embed substantive legislative language that can actually match filing risk factors.
  • Snowflake Cortex (RAG + vector engine):
    • We chunk SEC filings into semantic passages and store them in Snowflake with embeddings generated by Cortex EMBED_TEXT_768, enabling fast similarity search at scale.
    • We also chunk enriched legislation and store it in an ACTIVE_LAWS table with the same embedding dimension, so we can retrieve laws similar to a company’s filings and filing passages similar to a given law.
    • At query time, we run vector similarity search in Snowflake to retrieve top matches (and their similarity scores) without recomputing embeddings in the app.
  • Risk scoring:
    • We compute exposure from matched filing chunks using similarity plus weighting (section importance, recency decay, and chunk size).
    • Then we incorporate Polymarket probability to produce an “expected” risk (market-weighted) vs a “worst-case” scenario (assuming passage).
  • Neo4j knowledge graph (context + indirect risk):
    • We populate Neo4j with nodes for Company/Sector/Supplier/Law and relationships like AFFECTS (Law→Sector) and company context edges.
    • This adds interpretability and lets us reason about second-order effects (e.g., a law affecting a sector that a supplier depends on).
  • Frontend:
    • React + TypeScript with a risk dashboard, law cards showing pass probability, and expandable matched-filing evidence.
    • The knowledge graph is rendered interactively using Cytoscape.js so users can “see” exposure paths instead of reading a wall of text.
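
The Polymarket and risk-scoring steps above can be sketched roughly as follows. This is a minimal illustration, not our production code. It assumes a Gamma API market record exposes `outcomes` and `outcomePrices` as JSON-encoded lists of strings. The `MatchedChunk` fields, the half-life and size constants, the weighted-mean aggregation, and the "expected = probability × worst case" rule are all hypothetical choices made for the example.

```python
import json
from dataclasses import dataclass
from datetime import date


def implied_pass_probability(market: dict) -> float:
    """Read the "Yes" outcome price as a probability clamped to [0, 1].

    Assumption: `outcomes` and `outcomePrices` are JSON-encoded lists of
    strings in the same order; adjust if the payload is shaped differently.
    """
    outcomes = json.loads(market["outcomes"])
    prices = json.loads(market["outcomePrices"])
    yes_index = [o.lower() for o in outcomes].index("yes")
    return min(1.0, max(0.0, float(prices[yes_index])))


@dataclass
class MatchedChunk:
    """One filing passage returned by the vector search (hypothetical shape)."""
    similarity: float      # cosine similarity between the chunk and the law
    section_weight: float  # e.g. 1.0 for risk factors, lower for other sections
    filed_on: date         # filing date, used for recency decay
    n_tokens: int          # chunk length, used to down-weight tiny fragments


def chunk_weight(chunk: MatchedChunk, today: date,
                 half_life_days: float = 365.0,
                 full_size_tokens: int = 200) -> float:
    """Combine section importance, recency decay, and chunk size."""
    age_days = max(0, (today - chunk.filed_on).days)
    recency = 0.5 ** (age_days / half_life_days)      # halves every half-life
    size = min(1.0, chunk.n_tokens / full_size_tokens)
    return chunk.section_weight * recency * size


def exposure(chunks: list[MatchedChunk], today: date) -> float:
    """Weighted mean of (non-negative) similarities; 0.0 if nothing matched."""
    weights = [chunk_weight(c, today) for c in chunks]
    total = sum(weights)
    if total == 0:
        return 0.0
    weighted = sum(w * max(0.0, c.similarity) for w, c in zip(weights, chunks))
    return weighted / total


def risk_scores(chunks: list[MatchedChunk], pass_probability: float,
                today: date) -> dict:
    """Worst case assumes passage; expected scales it by the market odds."""
    worst_case = exposure(chunks, today)
    return {"worst_case": worst_case, "expected": pass_probability * worst_case}
```

For a custom “what-if” law, calling `risk_scores(chunks, 1.0, today)` makes the expected score equal the worst case, which matches how custom legislation is treated when no market exists.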

Challenges we ran into

  • Markets aren’t bills: Polymarket questions don’t always map cleanly to an official bill ID. We solved this by using Gemini structured outputs with fallbacks and focusing on generating realistic legislative language for retrieval.
  • Scaling to the S&P 500 under hackathon time constraints: The volume of 10‑K/10‑Q/8‑K filings to ingest and embed is huge, so for the demo we limited coverage to a small subset (roughly the top 10 S&P 500 companies) to keep ingestion, vectorization, and iteration feasible.
  • Indirect/related-entity scoring: We wanted to score second-order exposure (e.g., suppliers/peers) more comprehensively, but accurately modeling and validating those relationships end-to-end within the time window was challenging, so we focused on getting direct company↔legislation risk and explainability solid first.

Accomplishments we’re proud of

  • A real-time market signal in the scoring loop via Polymarket probabilities (expected vs worst-case framing).
  • End-to-end explainability: every risk score is backed by the specific SEC filing excerpts that drove it.
  • Knowledge-graph enrichment: the graph turns “risk score” into “risk story,” revealing indirect exposure paths.

What we learned

  • Prediction markets are an effective way to represent legislative uncertainty as a numeric signal you can compute with.
  • Cortex-style vector search becomes far more persuasive when paired with provenance (top matches) and a visual knowledge graph.

What’s next

  • Actionable hedging strategies: generate portfolio recommendations (e.g., sector ETF hedges, options overlays, position sizing) tied directly to each law’s probability and the filing-backed exposure drivers.
  • Stronger entity linking + propagation for indirect risk (supplier→sector→law), with better validation.
  • Backtesting: compare historical policy outcomes vs. Polymarket odds and post-event stock moves.

Built With

  • cytoscape.js
  • fastapi
  • google-gemini
  • neo4j-aura
  • react
  • snowflake-cortex
  • tailwind-css
  • tanstack-query
  • typescript