FHIR NLP System

Inspiration

Healthcare data is complex and often siloed across different systems. We wanted to make it accessible to everyone, not just data engineers. By using a large language model to translate plain-English questions into database queries, we envisioned a system where anyone could ask about patient data and get answers within seconds, without needing to write SQL or understand complex FHIR schemas.

What it does

Healthcare Interoperability Connector is an AI-powered natural language query engine for healthcare data:

Ask in English: Users type questions like "Show me all patients with diabetes" or "What are the most common conditions?"
AI Generates SQL: Gemini automatically converts each natural-language question into a BigQuery SQL query
Instant Results: Queries execute against the public FHIR Synthea dataset with 2-5 second response times
Smart Visualization: Results display in interactive tables and charts with intelligent formatting
FHIR Compliant: Works with standard FHIR R4 healthcare data structures
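A minimal sketch of the natural-language-to-SQL step, in Python to match the Flask backend. It assumes the model is asked to reply with a single query inside a fenced block. The schema hints, the prompt wording, and the `build_prompt` and `extract_sql` helpers are hypothetical illustrations rather than the project's actual prompt, and the Gemini call itself is left out. The read-only check is an added assumption: a coarse guard, not a SQL parser.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built at runtime so this listing contains no literal fence

# Hypothetical schema hints; the project's real prompt is not reproduced here.
SCHEMA_HINTS = {
    "condition": "condition name is code.coding[SAFE_OFFSET(0)].display",
    "observation": "numeric result is value.quantity.value, unit is value.quantity.unit",
}


def build_prompt(question: str, hints: dict[str, str]) -> str:
    """Put schema context in front of the user's question."""
    lines = [
        "Translate the question into one BigQuery Standard SQL SELECT over FHIR R4 tables.",
        f"Reply with the query inside a {FENCE}sql fenced block.",
        "Schema notes:",
    ]
    lines += [f"- {table}: {note}" for table, note in sorted(hints.items())]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Matches a fenced block, optionally tagged "sql", and captures its body.
_SQL_BLOCK = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_sql(reply: str) -> str:
    """Take the SQL out of a model reply; reject anything but one read-only query."""
    match = _SQL_BLOCK.search(reply)
    sql = (match.group(1) if match else reply).strip().rstrip(";").strip()
    keyword = sql.split(None, 1)[0].upper() if sql else ""
    # Coarse guard: must start like a query and contain no second statement.
    if keyword not in {"SELECT", "WITH"} or ";" in sql:
        raise ValueError(f"refusing to run generated statement: {sql[:60]!r}")
    return sql
```

Keeping prompt assembly and reply parsing as pure functions makes both easy to unit-test without calling the model. BigQuery indexes arrays with `OFFSET` or `SAFE_OFFSET` rather than a bare `[0]`, which is exactly the kind of detail schema notes have to spell out.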

How we built it

Architecture:

Backend: Python Flask API with Vertex AI (Gemini) for SQL generation
Frontend: React with Material-UI for modern, responsive interface
Data: Google BigQuery with public FHIR Synthea dataset (17 FHIR resource types)
Integration: Fivetran for data pipeline orchestration
Deployment: Google Cloud Run for serverless scalability

Key Components:

NLQueryService - Converts natural language to SQL using Gemini
QueryResultsVisualization - Smart formatting of complex FHIR nested objects
React Dashboard - Clean, intuitive UI for healthcare professionals
BigQuery Integration - Direct access to FHIR data at scale
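The components above can be wired together roughly as follows. This is a minimal sketch, assuming an `NLQueryService` that receives the Gemini and BigQuery calls as plain functions. The `answer` method, the `max_rows` cap, and the response fields are hypothetical, chosen only to show where each stage can fail.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class NLQueryService:
    """Hypothetical service shape: question -> SQL -> rows."""

    generate_sql: Callable[[str], str]     # assumed wrapper around a Gemini call
    run_query: Callable[[str], list[Row]]  # assumed wrapper around a BigQuery client
    max_rows: int = 1000                   # cap on rows sent back to the UI

    def answer(self, question: str) -> dict[str, Any]:
        question = question.strip()
        if not question:
            return {"ok": False, "stage": "input", "error": "empty question"}
        try:
            sql = self.generate_sql(question)
        except Exception as exc:  # model call or reply parsing failed
            return {"ok": False, "stage": "generate", "error": str(exc)}
        try:
            rows = self.run_query(sql)
        except Exception as exc:  # e.g. a wrong field or table name in the generated SQL
            return {"ok": False, "stage": "execute", "sql": sql, "error": str(exc)}
        return {
            "ok": True,
            "sql": sql,
            "rows": rows[: self.max_rows],
            "truncated": len(rows) > self.max_rows,
        }
```

Passing the two calls in as functions lets tests substitute fakes, and the `stage` field tells the frontend whether the model or the query failed, which matters when generated SQL references a field that does not exist.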

Challenges we ran into

FHIR Complexity: FHIR R4 has deeply nested structures. We had to teach Gemini the exact schema paths (e.g., code.coding[0].display, value.quantity.value)
Object Formatting: React was displaying [object Object] for nested FHIR records. We built a smart formatter that extracts readable values from complex structures
Schema Context: Gemini needed comprehensive schema information to generate correct queries. We created detailed prompts with FHIR-specific examples
Data Source Mismatch: Initial queries pointed to non-existent local tables. We pivoted to use the public BigQuery FHIR Synthea dataset
Query Accuracy: Some generated queries had incorrect field names. We improved the prompt with explicit FHIR field documentation
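The formatter described above lives in the React frontend; the sketch below shows the same idea in Python to match the other examples. Its precedence rules (free text, then the first coding, then display, quantity, name, or a single populated field) are assumptions about what counts as readable, not the project's actual logic, and `format_fhir_value` is a hypothetical name.

```python
from typing import Any


def format_fhir_value(value: Any) -> str:
    """Render a nested FHIR-style value as short readable text (heuristic sketch)."""
    if value is None:
        return ""
    if isinstance(value, (str, int, float, bool)):
        return str(value)
    if isinstance(value, list):
        parts = (format_fhir_value(item) for item in value)
        return ", ".join(p for p in parts if p)
    if not isinstance(value, dict):
        return str(value)
    # CodeableConcept: prefer its free text, then the first coding
    if isinstance(value.get("text"), str) and value["text"]:
        return value["text"]
    if value.get("coding"):
        return format_fhir_value(value["coding"][0])
    # Coding or Reference that carries a human-readable display
    if value.get("display"):
        return str(value["display"])
    # Quantity, e.g. {"value": 5.4, "unit": "mmol/L"} -> "5.4 mmol/L"
    if value.get("value") is not None and ("unit" in value or "code" in value):
        unit = value.get("unit") or value.get("code") or ""
        return f"{value['value']} {unit}".strip()
    # HumanName: given names followed by the family name
    if "family" in value or "given" in value:
        given = value.get("given") or []
        return " ".join([*given, value.get("family") or ""]).strip()
    # Choice-type wrapper (e.g. value.quantity): unwrap the single populated field
    populated = [v for v in value.values() if v not in (None, "", [], {})]
    if len(populated) == 1:
        return format_fhir_value(populated[0])
    return str(value.get("code") or value.get("reference") or value)
```

Values the heuristics do not recognize fall back to their raw string form, and the full object can still back a hover tooltip.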

Accomplishments that we're proud of

✅ End-to-End MVP: Fully functional system from natural language input to visualized results

✅ Smart Data Formatting: Intelligent extraction of readable values from complex FHIR nested objects with hover tooltips

✅ Production-Ready Code: Clean architecture, proper error handling, structured logging

✅ Cloud Deployment: Successfully deployed to Google Cloud Run with public endpoint

✅ Comprehensive Documentation: Single README covering architecture, setup, usage, and troubleshooting

✅ Realistic Data: Works with the public FHIR Synthea dataset of synthetic patients (1M+ patient records)

✅ Fast Performance: Queries typically complete in 2-5 seconds

✅ User-Friendly UI: Clean React interface with example queries and result visualization

What we learned

FHIR is Powerful but Complex: Understanding nested FHIR structures was crucial for generating correct queries
AI Needs Context: Gemini performs much better with detailed schema documentation and examples
User Experience Matters: Smart formatting of results is as important as generating correct queries
Prompt Engineering is Key: Small changes to AI prompts significantly impact query quality
BigQuery is Scalable: Public datasets enable rapid prototyping without data setup overhead
React + MUI = Productivity: Material-UI components accelerated frontend development
Cloud-Native Architecture: Serverless deployment (Cloud Run) simplified DevOps