Disease Tracker — Hackathon Submission

Inspiration

The hantavirus outbreak made headlines in 2024 — but people on Reddit were talking about it days before any official advisory reached the public. That gap stuck with us. Official disease surveillance systems are slow by design: they wait for lab confirmation, clinical reporting, and bureaucratic sign-off before telling communities anything. By then, people have already been exposed, supply shelves are already cleared out, and the window for early action has closed.

We wanted to close that gap. Social media — especially Reddit — is a real-time sensor for what is actually happening in communities. When someone in r/Arizona posts about a neighbor with unusual symptoms, or r/California fills up with food poisoning reports from the same restaurant, that signal exists hours or days before any health department knows about it. The question is whether anyone is listening at scale.

Disease Tracker is our answer: a fully automated pipeline that listens to 105 subreddits, scores every keyword-matched post with Gemini, and tells you not just that a threat exists but which supplies to get and where to find them right now.


What It Does

Disease Tracker monitors Reddit in near real time across all 50 US states, DC, and 50+ countries. Every 60 seconds it fetches up to 10,500 Reddit posts (105 feeds at up to 100 posts each), filters them for 29 disease keywords, and stores the matches in a ClickHouse database. Every 10 seconds, Gemini 2.5 Flash scores the unscored posts from 0 to 10 on the likelihood that they reflect a genuine public health threat.
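
As a rough illustration of the keyword filter step, the sketch below matches a post's title and body against a keyword list. The four keywords are only the ones mentioned in this write-up, not the project's actual 29-term list, and `DISEASE_KEYWORDS` and `match_keywords` are hypothetical names.

```python
import re

# Illustrative subset only; the real pipeline uses 29 keywords not listed here.
DISEASE_KEYWORDS = ["hantavirus", "flu", "salmonella", "food poisoning"]

# One compiled pattern with word boundaries so "flu" does not match "fluent".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in DISEASE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def match_keywords(title: str, body: str = "") -> list[str]:
    """Return the distinct keywords found in a post, lowercased, in first-seen order."""
    found: list[str] = []
    for m in _PATTERN.finditer(f"{title}\n{body}"):
        keyword = m.group(1).lower()
        if keyword not in found:
            found.append(keyword)
    return found
```

A post would be stored only when `match_keywords` returns a non-empty list. Word boundaries are one possible way to cut false positives from short keywords.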

When a post scores 7 or above, the system treats it as a real signal. It automatically searches CVS, Walgreens, Walmart, and Rite Aid — in parallel — for the relevant medical supplies based on the disease type detected. A hantavirus signal triggers N95 P100 respirator and rodent repellent searches. A flu outbreak triggers Tamiflu and Theraflu. Salmonella triggers Imodium and Pedialyte. Gemini then reads all four retailers' results and generates a recommendation comparing availability.
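
The threshold and the disease-to-supply mapping could look roughly like the sketch below. The search terms come from the examples above, while the dictionary layout and the names `SUPPLY_QUERIES` and `plan_searches` are assumptions made for illustration.

```python
ALERT_THRESHOLD = 7  # posts scoring 7 or above are treated as real signals

# Search terms taken from the examples in this write-up; the full mapping is not shown.
SUPPLY_QUERIES = {
    "hantavirus": ["N95 P100 respirator", "rodent repellent"],
    "flu": ["Tamiflu", "Theraflu"],
    "salmonella": ["Imodium", "Pedialyte"],
}

RETAILERS = ["CVS", "Walgreens", "Walmart", "Rite Aid"]


def plan_searches(score: float, disease: str) -> list[tuple[str, str]]:
    """Return (retailer, query) pairs to run for an alert, or [] below the threshold."""
    if score < ALERT_THRESHOLD:
        return []
    queries = SUPPLY_QUERIES.get(disease.lower(), [])
    return [(retailer, query) for retailer in RETAILERS for query in queries]
```

For example, `plan_searches(8, "hantavirus")` yields eight pairs: two queries for each of the four retailers.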

The live dashboard shows a choropleth world map with regions colored by risk score. Clicking any country or US state opens a sidebar with ranked alerts. Each alert card expands to show the Reddit post body, a link to the original source, the Gemini supply recommendation, and a side-by-side comparison of what each retailer has in stock. High-score alerts are also published as structured, citable advisories to cited.md via Senso — making the intelligence agent-discoverable and permanently referenceable.


How We Built It

Polling layer: feedparser hits each subreddit's RSS feed (/r/{name}/new/.rss?limit=100) every 60 seconds. Matched posts are inserted into a ClickHouse ReplacingMergeTree table for deduplication. Because that engine only collapses duplicate rows in the background at merge time, we also check existing links before each insert; otherwise a re-polled post could replace an already scored record with an unscored (null) one.
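
A minimal sketch of the pre-insert link check, assuming feed entries behave like dicts with a `link` key (as feedparser entries do) and that the set of stored links has already been read from the database. The `www.reddit.com` host and the function names are assumptions; only the feed path comes from the description above.

```python
# Host is an assumption; the path and limit come from the write-up.
FEED_TEMPLATE = "https://www.reddit.com/r/{name}/new/.rss?limit=100"


def feed_url(subreddit: str) -> str:
    """Build the RSS URL for a subreddit's newest posts."""
    return FEED_TEMPLATE.format(name=subreddit)


def select_new_posts(entries: list[dict], existing_links: set[str]) -> list[dict]:
    """Keep only entries whose link is not stored yet, dropping in-batch repeats too."""
    seen = set(existing_links)  # copy so the caller's set is not mutated
    fresh = []
    for entry in entries:
        link = entry.get("link")
        if not link or link in seen:
            continue
        seen.add(link)
        fresh.append(entry)
    return fresh
```

Only the rows returned by `select_new_posts` would be inserted, so a post that already carries a score is never written again as an unscored row.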

Scoring layer: A background asyncio loop calls asyncio.to_thread every 10 seconds to run Gemini batch scoring in a worker thread, so the blocking API call never stalls the event loop. We batch up to 20 posts per Gemini API call and parse the JSON array response. A second Gemini call geolocates posts from global subreddits using a constrained list of 63 country names that match the GeoJSON ADMIN field exactly, which prevents hallucinated country names from breaking the map layer.
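
The loop described above might be structured as in the following sketch. The model call is injected as a plain function (`call_model`) so that no Gemini client API is assumed; `fetch_unscored`, `save`, `passes`, and `validate_country` are likewise hypothetical names, and clamping scores to the 0 to 10 range is an added safeguard rather than something the write-up states.

```python
import asyncio
import json
from typing import Callable, Optional

BATCH_SIZE = 20        # posts per model call, per the write-up
INTERVAL_SECONDS = 10  # delay between scoring passes


def parse_scores(raw: str, expected: int) -> list[int]:
    """Parse a JSON array of scores and clamp each one to the 0-10 range."""
    scores = json.loads(raw)
    if not isinstance(scores, list) or len(scores) != expected:
        raise ValueError("model response does not match the batch")
    return [max(0, min(10, int(s))) for s in scores]


def score_batch(posts: list[str], call_model: Callable[[list[str]], str]) -> list[int]:
    """Blocking call: send one batch to the model and parse its reply."""
    return parse_scores(call_model(posts), len(posts))


def validate_country(name: str, allowed: set[str]) -> Optional[str]:
    """Accept a geolocation answer only if it is on the allowed country list."""
    return name if name in allowed else None


async def scoring_loop(fetch_unscored, call_model, save,
                       passes: Optional[int] = None,
                       interval: float = INTERVAL_SECONDS) -> None:
    """Periodically score unscored posts without blocking the event loop."""
    done = 0
    while passes is None or done < passes:
        posts = fetch_unscored(BATCH_SIZE)
        if posts:
            # The blocking model call runs in a worker thread.
            scores = await asyncio.to_thread(score_batch, posts, call_model)
            save(list(zip(posts, scores)))
        done += 1
        await asyncio.sleep(interval)
```

Checking the response length against the batch size is one way to catch a reply that drops or adds items before any score is attached to the wrong post.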

Retail scraping layer: Nimble CLI runs via subprocess inside a ThreadPoolExecutor — all four retailer searches for a given alert fire in parallel. Results are stored raw in ClickHouse as JSON, then a final Gemini call reads all four retailers' results and writes a 2-3 sentence recommendation to a separate retail_recommendations table.
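
The parallel fan-out could be sketched as below. The actual Nimble CLI invocation is not shown in this write-up, so `build_command` is a placeholder that launches a Python subprocess echoing an empty JSON result; in a real pipeline it would return the scraper's command line instead. All function names here are hypothetical.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

RETAILERS = ["CVS", "Walgreens", "Walmart", "Rite Aid"]


def build_command(retailer: str, query: str) -> list[str]:
    """Placeholder command: echoes a JSON result instead of calling a real scraper CLI."""
    payload = json.dumps({"retailer": retailer, "query": query, "items": []})
    return [sys.executable, "-c", f"print({payload!r})"]


def search_retailer(retailer: str, query: str, timeout: float = 60.0) -> dict:
    """Run one search in a subprocess and return its parsed JSON output."""
    try:
        proc = subprocess.run(build_command(retailer, query), capture_output=True,
                              text=True, timeout=timeout, check=True)
        return json.loads(proc.stdout)
    except (subprocess.SubprocessError, json.JSONDecodeError) as exc:
        # One failing retailer should not sink the other three.
        return {"retailer": retailer, "query": query, "error": str(exc)}


def search_all(query: str) -> dict[str, dict]:
    """Fire all retailer searches at once and collect results keyed by retailer."""
    with ThreadPoolExecutor(max_workers=len(RETAILERS)) as pool:
        futures = {r: pool.submit(search_retailer, r, query) for r in RETAILERS}
        return {r: f.result() for r, f in futures.items()}
```

Threads suit this step because each worker spends its time waiting on a subprocess, so the four searches overlap rather than run back to back.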

Frontend: A single-page app with Leaflet.js and CartoDB dark tiles. Two GeoJSON layers (world countries + US states) are styled dynamically from the API. Clicking a region fires three parallel fetches (Promise.allSettled) for alerts, retail supplies, and Gemini recommendations. Everything renders inside the alert card — no separate panels, no duplication.

Knowledge layer: Senso indexes the project's knowledge base — system architecture, competitive analysis, case studies, FAQ — and publishes structured outbreak advisories to cited.md when high-score events are detected. The advisories include the Gemini risk score, location, disease keywords, supply availability comparison across retailers, and a source citation back to the original Reddit post. This closes the loop: social signal → scored alert → retail intelligence → published, citable public health advisory.

Stack: Python + FastAPI, ClickHouse Cloud, Google Gemini 2.5 Flash, Nimble web intelligence, Senso, Leaflet.js, feedparser, python-dotenv, uv.
