Inspiration

We've both seen family members come home from a hospital visit and stare at a bill that made no sense. Line items with codes like 99213 or 93000, totals that didn't add up, and no one to call who would actually explain it. You either pay it or spend hours on hold with insurance, and most people just pay it.

When we found out that roughly 80% of medical bills contain errors, and that billing mistakes cost Americans over $210 billion annually, it stopped feeling like a personal frustration and started feeling like a systems problem worth solving. OverSight came from a simple question: what if a patient had the same information their hospital's billing department has?

What It Does

OverSight is an AI-powered medical bill analyzer that turns an unreadable hospital bill into a clear, actionable breakdown in under 60 seconds.

Upload a medical bill → Stella, our AI voice advocate, walks you through it → OverSight benchmarks every charge against national CMS data → you get a savings estimate, a risk breakdown, and a dispute letter ready to send.

Specifically, OverSight detects:

  • Upcoding - procedures billed at a higher complexity than performed
  • Duplicate charges - the same service billed more than once
  • Out-of-network misclassification - in-network providers billed as out-of-network
  • Above-benchmark pricing - charges that exceed regional norms for the same CPT code
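
As a rough illustration of the duplicate-charge check, here is a minimal Python sketch. It assumes each extracted line item carries a CPT code, a service date, and an amount; the `LineItem` type and `find_duplicate_charges` function are hypothetical names for this example, not OverSight's actual code. Identical lines can be legitimate (for example, a service performed twice in one day), so a match is best treated as a flag for review rather than proof of an error.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LineItem:
    """One charge extracted from a bill (hypothetical shape for this sketch)."""
    cpt_code: str
    service_date: date
    amount_cents: int


def find_duplicate_charges(items):
    """Return {(code, date, amount): [line indexes]} for charges that appear more than once."""
    groups = defaultdict(list)
    for index, item in enumerate(items):
        key = (item.cpt_code, item.service_date, item.amount_cents)
        groups[key].append(index)
    # Keep only the keys that were billed on two or more lines.
    return {key: indexes for key, indexes in groups.items() if len(indexes) > 1}


# Invented example bill: the office visit (99213) is listed twice.
bill = [
    LineItem("99213", date(2024, 3, 1), 15000),
    LineItem("93000", date(2024, 3, 1), 8500),
    LineItem("99213", date(2024, 3, 1), 15000),
]
duplicates = find_duplicate_charges(bill)
```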

How We Built It

We split the system into three layers that needed to talk to each other in real time.

Compute layer - Modal handles all the heavy lifting. When a bill is uploaded, a serverless container spins up, runs OCR, extracts every charge and CPT code, and benchmarks each line item against CMS data. Anomaly detection runs here. Modal was central enough to our architecture that we're submitting for the AI Inspection sponsor track.

Memory layer - Every analysis is committed to SuperMemory. This gives Stella full context about the current bill during the session, and that context persists across sessions: if a user comes back with a new bill, she already knows their history.

Interface layer - Next.js 15 frontend with ElevenLabs powering Stella's voice. OpenAI generates the final dispute letter once analysis is complete. Cloudflare handles edge security and encryption for HIPAA alignment. Clerk manages authentication.

Challenges We Ran Into

PDF parsing is genuinely hard. There's no standard format for medical bills: some are clean digital exports, others are scanned faxes from decades ago. We went through multiple approaches before landing on a pipeline robust enough to handle structured, unstructured, and image-based documents reliably.

Synchronizing voice and analysis. Stella needs to talk to the user while analysis runs in the background, but she also needs the results before she can explain them. We built a state handoff between ElevenLabs and Modal that buffers context and releases it to the voice layer the moment analysis resolves, without leaving the user in silence.
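
The handoff pattern can be sketched with plain `asyncio`. This is a simplified stand-in under stated assumptions, not our ElevenLabs or Modal integration: `analyze_bill` and `speak` are hypothetical placeholders that simulate analysis and audio playback with short sleeps, and the filler lines are invented. The idea it shows is to start analysis as a background task, keep the voice layer talking until that task is done, then release the result.

```python
import asyncio


async def analyze_bill(bill_id):
    """Placeholder for the background analysis (OCR, extraction, benchmarking)."""
    await asyncio.sleep(0.05)
    return {"bill_id": bill_id, "flagged": 2}


async def speak(line, transcript):
    """Placeholder for the voice layer: record the line and simulate playback time."""
    transcript.append(line)
    await asyncio.sleep(0.02)


async def run_session(bill_id):
    transcript = []
    # Start analysis in the background so speech is never blocked on it.
    analysis = asyncio.create_task(analyze_bill(bill_id))
    fillers = [
        "Thanks, I have your bill.",
        "I'm checking each charge now.",
        "Almost there.",
    ]
    spoken = 0
    # Keep talking until the analysis task resolves; the last filler repeats if needed.
    while not analysis.done():
        await speak(fillers[min(spoken, len(fillers) - 1)], transcript)
        spoken += 1
    result = await analysis
    await speak(f"I found {result['flagged']} charges worth a closer look.", transcript)
    return transcript


transcript = asyncio.run(run_session("demo-bill"))
```

Polling `done()` between utterances keeps the sketch short; an event or queue would serve the same purpose if the voice layer needs to be interrupted mid-sentence.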

Defining "overcharge" rigorously. Pricing varies by region, provider type, and insurance network. Flat thresholds didn't hold up. We ended up building per-code statistical models from CMS distribution data, which made flagging significantly more accurate and defensible.
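
To make the contrast with a flat threshold concrete, here is a minimal sketch of a per-code rule. It uses a Tukey upper fence (third quartile plus 1.5 times the interquartile range), which is one common outlier rule chosen for illustration and not necessarily the model we shipped. The sample prices are invented, not CMS data, and `build_benchmarks` and `is_above_benchmark` are hypothetical names.

```python
import statistics


def build_benchmarks(samples_by_code):
    """Compute an upper fence (Q3 + 1.5 * IQR) for each CPT code from sample prices."""
    benchmarks = {}
    for code, prices in samples_by_code.items():
        q1, _median, q3 = statistics.quantiles(prices, n=4)
        benchmarks[code] = q3 + 1.5 * (q3 - q1)
    return benchmarks


def is_above_benchmark(code, charge, benchmarks):
    """Flag a charge only when it exceeds the fence for its own code."""
    fence = benchmarks.get(code)
    if fence is None:
        return False  # No benchmark data for this code, so nothing to compare against.
    return charge > fence


# Invented sample prices in dollars, for illustration only.
samples = {
    "99213": [80, 90, 95, 100, 105, 110, 120],
    "93000": [15, 18, 20, 22, 25, 28, 30],
}
benchmarks = build_benchmarks(samples)

flag_high = is_above_benchmark("99213", 250.0, benchmarks)
flag_normal = is_above_benchmark("99213", 120.0, benchmarks)
```

Because each code gets its own fence, a $120 charge is unremarkable for 99213 in this toy data but would be flagged for 93000, which a single flat threshold cannot express.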

HIPAA-conscious design under time pressure. Handling real medical data at a hackathon meant we couldn't cut corners on security. Cloudflare encryption, stateless Modal compute, and zero persistent voice storage weren't afterthoughts; they shaped architecture decisions from the start.

Accomplishments That We're Proud Of

Honestly, the thing we're most proud of is that it actually works on real bills. Not synthetic test data, but actual messy, inconsistent, real-world medical PDFs.

We're also proud of how the voice layer came together. Stella doesn't feel like a chatbot bolted onto an analysis tool. She has context, she remembers, and she explains things the way a person would. Getting that to feel natural within a hackathon timeline was harder than we expected.

And building a statistically grounded anomaly detection model rather than just flagging anything above a round number felt like the right call for a product that's making financial claims about someone's healthcare.

What We Learned

Healthcare billing has rules, and the rules are deliberately opaque. Learning how CPT codes, ICD codes, and CMS reimbursement schedules actually interact gave us a lot more respect for why this problem is still unsolved at scale.

On the technical side: Modal's serverless model is genuinely well-suited for workloads like this, where compute is bursty and unpredictable. SuperMemory's approach to persistent context storage was something we'd never used before and ended up being one of our cleanest integrations.

We also learned that the UX of something this sensitive matters enormously. People are stressed when they look at a medical bill. Every design decision (how Stella speaks, how findings are framed, how the dispute letter reads) affects whether someone actually uses the output or closes the tab.

What's Next for OverSight

Insurance API integrations - pulling EOB data directly instead of relying on PDF uploads would significantly improve accuracy and reduce friction.

Automated dispute submission - right now we generate the letter. The next step is sending it.

Longitudinal cost tracking - letting users track their medical spending over time, across providers, and see how their costs compare to regional benchmarks.

Provider transparency scores - rating hospitals and billing departments based on error rates in the bills we've seen, surfaced anonymously and in aggregate.

Employer and hospital partnerships - OverSight working at the organizational level, helping self-insured employers audit claims before they're paid out.

The core insight that got us here, that patients are at a structural information disadvantage, doesn't go away after a hackathon. We think there's a real product here, and we want to keep building it.
