Devpost Project Story

Inspiration

I'm a doctor. Every day I hand patients documents they cannot interpret - cluttered discharge summaries dense with abbreviations, overwhelming pathology reports full of values they have no reference for, referral letters written clinician-to-clinician. The information asymmetry between healthcare providers and patients is enormous. Patients own their health data in theory. In practice, it's siloed across providers, buried in envelopes, filing cabinets, and portals they never check.

I wanted to build something that closes that gap. Not a chatbot that summarises a PDF. A system that actually reads medical documents the way a clinician would - finding every encounter, every condition, every medication, every vital sign - and structures it into a real, unified health record. One place where everything converges, and the patient finally has the full picture.

What it does

Exora is a universal health data API. Upload any medical document - referral letters, pathology reports, discharge summaries, prescriptions - in any format. The system outputs structured, unified, queryable health data. No templates, no pre-configuration, no prior knowledge of the document format required. Stress-tested on 250+ page documents.

A four-stage Gemini pipeline processes every document:

  1. Encounter Discovery - identifies healthcare encounters (visits, admissions, consultations) within documents
  2. Entity Detection - detects clinical entities across complex medical language: conditions, medications, vitals, allergies, immunizations, procedures, lab results
  3. Clinical Extraction - routes entities through specialized prompt networks - each scoped to its clinical domain - producing 24 normalized data categories
  4. Data Formalization - assigns standardized medical codes (SNOMED CT, RxNorm, LOINC) and applies proprietary grouping algorithms to unify equivalent entities across documents
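The four stages above can be sketched as typed transforms feeding into one another. This is a hypothetical illustration - all names, interfaces, and the stubbed keyword logic are placeholders; in the real pipeline each stage is a Gemini call:

```typescript
// Illustrative sketch of the four-stage pipeline as composed, typed transforms.
// Interfaces and stub logic are assumptions, not Exora's actual API.

interface Encounter { id: string; type: "visit" | "admission" | "consultation"; text: string }
interface ClinicalEntity { encounterId: string; category: string; raw: string }
interface CodedEntity extends ClinicalEntity { system: "SNOMED CT" | "RxNorm" | "LOINC"; code: string }

// Stage 1: Encounter Discovery - split the document into encounters (stubbed on a delimiter).
function discoverEncounters(docText: string): Encounter[] {
  return docText.split("\n---\n").map((text, i): Encounter => ({ id: `enc-${i}`, type: "visit", text }));
}

// Stage 2: Entity Detection - flag candidate clinical entities per encounter (stubbed keyword pass).
function detectEntities(encounters: Encounter[]): ClinicalEntity[] {
  return encounters.flatMap((e) =>
    e.text.includes("metformin")
      ? [{ encounterId: e.id, category: "medication", raw: "metformin" }]
      : []
  );
}

// Stages 3-4: Clinical Extraction + Data Formalization - route each entity to its
// domain-scoped extractor, then attach a standard code (RxNorm 6809 is metformin).
function formalize(entities: ClinicalEntity[]): CodedEntity[] {
  return entities.map((e): CodedEntity => ({ ...e, system: "RxNorm", code: "6809" }));
}

const record = formalize(detectEntities(discoverEncounters("Seen in clinic.\n---\nStarted metformin 500mg.")));
console.log(record);
```

Each stage consumes the previous stage's structured output, which is what lets later stages stay narrowly scoped to one clinical domain at a time.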

Every stage is powered by Gemini. Beyond the pipeline, Gemini powers an in-app AI chat that lets users query their health records in plain language - grounded entirely in extracted data, not general medical knowledge.

The result is not a vector store or a summary. It is a real clinical data model. Every extracted data point carries a quality badge based on source authority. Every entity maps back to its source document, page, and line via Google Cloud Vision OCR. Multiple documents from different providers and dates converge into one unified health record.

How I built it

The processing pipeline runs on Google Cloud Run (Sydney region). Each document passes through all four Gemini stages sequentially, with each stage building on the output of the previous one. Google Cloud Vision API handles OCR at the input, providing the text and spatial coordinates that enable source-level traceability.

The extracted data lands in a Supabase PostgreSQL database with 24 clinical spoke tables - conditions, medications, vitals, lab results, procedures, allergies, immunizations, and more. A four-tier data quality system scores every data point based on source authority, patient identification, and provider credentials - from clinician-confirmed down through high, medium, and low.
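The four-tier scoring can be sketched as a simple classifier over source signals. This is a minimal illustration - the signal names and thresholds are assumptions, not the shipped schema:

```typescript
// Hypothetical sketch of the four-tier data quality system.
// Signal names are illustrative; the production scoring uses richer inputs.

type QualityTier = "clinician-confirmed" | "high" | "medium" | "low";

interface SourceSignals {
  clinicianConfirmed: boolean;   // a clinician has verified this data point
  patientIdentified: boolean;    // the source document positively identifies the patient
  providerCredentialed: boolean; // the issuing provider's credentials are known
}

function scoreQuality(s: SourceSignals): QualityTier {
  if (s.clinicianConfirmed) return "clinician-confirmed";
  if (s.patientIdentified && s.providerCredentialed) return "high";
  if (s.patientIdentified || s.providerCredentialed) return "medium";
  return "low";
}

console.log(scoreQuality({ clinicianConfirmed: false, patientIdentified: true, providerCredentialed: true }));
```

The tier travels with the data point into its spoke table, so the app can render a quality badge next to every value.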

The frontend is built with Expo (React Native), delivering iOS, Android, and web from a single codebase. The app includes a healthcare timeline, entity browsers with quality badges, trend charts for vitals, and the Gemini-powered AI chat.

Solo developer. One codebase. Built in Australia.

Challenges I faced

Document chaos is the real problem. Medical documents have no standard format. A GP referral looks nothing like a hospital discharge summary, which looks nothing like a pathology report. The pipeline had to handle all of them without templates or pre-configuration - the AI has to figure out the structure of each document from scratch.

Not RAG. The temptation with health data + AI is to throw everything into a vector store and do retrieval-augmented generation. That produces summaries, not structured data. I needed a real clinical data model - normalized tables, standardized codes, quality tracking - because that is what makes the data genuinely useful beyond a single conversation.

Traceability. Every extracted entity needs to trace back to its source. Patients and clinicians need to verify what the AI found. Google Cloud Vision's spatial OCR coordinates made this possible - the app highlights exactly where on the original document each data point was extracted from.
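A minimal sketch of how that highlight can work, assuming Cloud Vision-style word boxes stored alongside each entity (the field names here are illustrative, not the production schema):

```typescript
// Illustrative traceability lookup: given an extracted value, find the OCR word
// boxes so the app can highlight them on the original page image.
// WordBox is an assumed shape loosely modeled on Cloud Vision word annotations.

interface WordBox { text: string; page: number; x: number; y: number; w: number; h: number }

function locateValue(value: string, boxes: WordBox[]): WordBox[] {
  const words = value.toLowerCase().split(/\s+/);
  return boxes.filter((b) => words.includes(b.text.toLowerCase()));
}

const ocr: WordBox[] = [
  { text: "HbA1c", page: 3, x: 40, y: 210, w: 52, h: 12 },
  { text: "7.2%",  page: 3, x: 98, y: 210, w: 34, h: 12 },
];
console.log(locateValue("HbA1c 7.2%", ocr)); // both boxes, page 3
```

Keeping the coordinates rather than just the page number is the design choice that lets a patient or clinician see the exact line the AI read, not merely the document it came from.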

What I learned

Building for healthcare is fundamentally about trust. The data quality system, the source traceability, the "this is not medical advice" framing - none of that is optional. Patients will only use a tool like this if they can verify what it tells them. The AI has to show its work.

What's next

Exora is pre-launch. The pipeline is operational, the app is functional across three platforms, and the clinical data model is proven. Next steps are provider-side contributions (clinicians confirming or correcting extracted data), My Health Record integration (Australia's national health records system), and opening the API for third-party health applications to build on top of the structured data.


Built with: Gemini API, Google Cloud Vision API, Google Cloud Run, Supabase, Expo, React Native, TypeScript, PostgreSQL, Vercel
