Inspiration

A friend traveling from Nairobi to Mumbai for work fell ill and needed emergency care. The Mumbai hospital had no access to his recent lab work showing a liver condition, so they ordered the same expensive tests again. This made us realize that people carry smartphones everywhere, yet their medical records are, at best, static files. What if we could use gpt-oss-20b to turn that phone into an intelligent health assistant that works offline?

What it does

HealthID transforms your device into a private medical AI assistant using gpt-oss-20b running locally. It:

  • Stores medical records as verifiable credentials with cryptographic signatures
  • Analyzes lab results using gpt-oss-20b to identify patterns across multiple tests
  • Generates instant summaries for emergency situations
  • Works completely offline - all AI processing happens on your device
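Signing a record only works if the same record always produces the same bytes. Below is a minimal Python sketch of that first step, assuming a record is a JSON-style dict: serialize it with sorted keys, hash the bytes with SHA-256, and later compare digests to detect tampering. The function names are hypothetical. Sorted-key JSON is a simplified stand-in for a formal canonicalization scheme such as RFC 8785. The actual signing step (signing the digest with the issuer's private key and verifying it with the public key) is not shown.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record deterministically: sorted keys, no extra whitespace."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def record_digest(record: dict) -> str:
    """SHA-256 hex digest of the canonical form; this is what a signer would sign."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def is_unchanged(record: dict, expected_digest: str) -> bool:
    """True if the record still hashes to the digest captured at issuance."""
    # compare_digest avoids leaking timing information during the comparison
    return hmac.compare_digest(record_digest(record), expected_digest)
```

Because keys are sorted before hashing, two dicts with the same contents in a different insertion order produce the same digest. Changing any value, such as a lab result, produces a different one.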

How we built it

We combined Ruby on Rails for the credential system with gpt-oss-20b for local intelligence:

  1. Backend: Rails 7 API with PostgreSQL for verifiable credential management
  2. Local AI: Python wrapper calling gpt-oss-20b through Ollama's API
  3. Integration: HTTP calls from Rails to local Ollama instance (localhost:11434)
  4. Model serving: Ollama running gpt-oss-20b with 4-bit quantization

The architecture keeps medical data local while still providing intelligent analysis through the on-device model.
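The Rails app talks to Ollama over plain HTTP. The request shape is easiest to show in Python, the language of our wrapper. Here is a sketch of a non-streaming call to Ollama's `/api/generate` endpoint, with the context window capped through `options.num_ctx`. The model tag `gpt-oss:20b`, the function names, and the 120-second timeout are assumptions for illustration, not our exact code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gpt-oss:20b"  # assumption: the tag the model was pulled under locally


def build_generate_request(prompt: str, num_ctx: int = 2048) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of a stream of chunks
        "options": {"num_ctx": num_ctx},  # cap the context window
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    req = build_generate_request(prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

Calling `generate(...)` requires a local Ollama server (`ollama serve`) to be running. The Rails side can send the same JSON body with Ruby's `Net::HTTP`. The generous timeout reflects the 15 to 45 second inference times we saw.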

Challenges we ran into

  1. MacBook Air M1 with 8GB RAM couldn't handle gpt-oss-20b: The model needed 13GB+ of RAM. Even with aggressive quantization (Q4_0), we faced constant memory pressure, so we eventually switched to testing on a borrowed 16GB machine.
  2. Ollama kept crashing: When processing longer medical documents, Ollama would terminate unexpectedly. We fixed this by limiting the context window to 2048 tokens and chunking documents.
  3. Response times were terrible: Initial inference took 30-45 seconds on the MacBook Air. Even after optimization, we couldn't get below 15 seconds for complex medical queries.
  4. Model hallucinated medical values: gpt-oss would sometimes invent lab values that weren't in the input. We had to implement strict output parsing to catch and filter these.
  5. Rails-to-Python integration issues: Subprocess calls from Rails to the Python script were unreliable. We switched to running Ollama as a service and calling its HTTP API instead.
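The Ollama crashes came down to keeping each request inside the 2048-token window. Below is a minimal sketch of overlapping chunking. It assumes whitespace-separated words are a rough proxy for tokens. A real tokenizer counts differently, so the default `max_words` leaves headroom for the instructions and the answer. `chunk_words` and its defaults are hypothetical.

```python
from typing import List


def chunk_words(text: str, max_words: int = 1200, overlap: int = 100) -> List[str]:
    """Split text into overlapping chunks of at most max_words words.

    Word count is only a rough proxy for tokens; max_words is kept well under a
    2048-token window to leave room for the prompt and the model's reply.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the document
    return chunks
```

The overlap gives values near a chunk boundary some surrounding context in both neighbouring chunks.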
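For hallucinated lab values, one simple form of strict output parsing is to reject any output line containing a number that never appears in the source report. The sketch below assumes plain-text reports and model output. The regex and function names are illustrative, not our exact filter. It catches invented numbers but not a real number attached to the wrong test, so it complements structured prompts rather than replacing them.

```python
import re
from typing import List, Set

# Unsigned integers and decimals such as "88" or "5.6"; the "-" in a range
# like "7-56" is treated as a separator, not a minus sign.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in(text: str) -> Set[float]:
    """All numeric values in the text, normalized so '88' and '88.0' compare equal."""
    return {float(m) for m in NUMBER.findall(text)}


def grounded_lines(source: str, model_output: str) -> List[str]:
    """Output lines whose numbers (if any) all occur in the source document."""
    allowed = numbers_in(source)
    return [line for line in model_output.splitlines() if numbers_in(line) <= allowed]


def unsupported_lines(source: str, model_output: str) -> List[str]:
    """Output lines containing at least one number absent from the source."""
    allowed = numbers_in(source)
    return [line for line in model_output.splitlines() if not numbers_in(line) <= allowed]
```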

Accomplishments that we're proud of

  • Got gpt-oss-20b running locally: Successfully integrated Ollama with Rails, despite the memory constraints
  • Implemented verifiable credentials: Built a working cryptographic signature system for medical records (paused for the purposes of this hackathon)
  • Created structured medical prompts: Developed a prompt format that reduces hallucinations in medical contexts
  • Built a working prototype: The system can actually analyze a lab report and generate insights (even if slowly)
  • Achieved offline operation: Entire stack runs without internet connection
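Below is a sketch of what a structured medical prompt can look like. The report is fenced between explicit delimiters, and the model is told to use only values found there. The reply must be a JSON object with a fixed set of keys, validated before anything reaches the user. The template wording, the key names, and the helper functions are hypothetical, not our actual prompt format.

```python
import json
from typing import Any, Dict

# Hypothetical output schema; field names are illustrative only.
REQUIRED_KEYS = ("findings", "abnormal_values", "summary")

PROMPT_TEMPLATE = """You are summarizing a lab report for a clinician.
Rules:
- Use ONLY values that appear between <report> and </report>.
- If a value is not in the report, write "not reported". Never estimate.
- Reply with a single JSON object with exactly these keys: {keys}.

<report>
{report}
</report>"""


def build_prompt(report: str) -> str:
    """Fill the template with the key list and the raw report text."""
    return PROMPT_TEMPLATE.format(keys=", ".join(REQUIRED_KEYS), report=report)


def parse_reply(reply: str) -> Dict[str, Any]:
    """Parse the reply; raise ValueError unless it is a JSON object with exactly the expected keys."""
    data = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != set(REQUIRED_KEYS):
        raise ValueError(f"unexpected reply shape: {reply[:80]!r}")
    return data
```

Rejecting malformed replies outright, instead of trying to repair them, keeps a half-formed answer from ever being shown as a medical summary.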

What we learned

  • gpt-oss-20b is HEAVY: It needs at least 16GB of RAM for decent performance, and 32GB is ideal
  • Quantization has limits: Q4_0 quantization degrades medical reasoning quality significantly
  • Ollama simplifies deployment: Much easier than trying to run raw model files
  • Medical prompts need strict formatting: Free-form prompts lead to dangerous hallucinations
  • MacBook Air isn't for LLM development: 8GB RAM is simply not enough for 20B parameter models
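A back-of-envelope estimate shows why 8GB was never going to work. It counts weights only and assumes a flat 4 bits per parameter for a 20B-parameter model:

```latex
\text{weight memory} \approx 20 \times 10^{9}\ \text{params} \times \frac{4\ \text{bits}}{8\ \text{bits/byte}} = 10 \times 10^{9}\ \text{bytes} \approx 10\ \text{GB}
```

Q4_0 also stores a scale factor for each block of 32 weights, which works out to about 4.5 bits per weight. That pushes the estimate to roughly 11 GB before the KV cache, runtime overhead, and the operating system are added. This is consistent with the 13GB+ we observed.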

What's next for HealthID

  • Try a smaller model: Something in the ~7B range that could actually run on normal laptops (gpt-oss itself is only released in 20B and 120B sizes)
  • Implement proper chunking: Handle large medical documents without crashes
  • Add response caching: Store common medical queries to avoid re-inference
  • Fine-tune on medical data: If we can get access to medical datasets
  • Build a proper API layer: Replace subprocess calls with a message queue
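For response caching, here is a minimal in-memory sketch. Each reply is keyed on a SHA-256 hash of the model name, prompt, and generation options, and the least recently used entry is evicted once the cache is full. `ResponseCache` and its parameters are hypothetical. The `generate` argument stands in for whatever function calls the local model.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Callable, Dict, Optional


class ResponseCache:
    """Small in-memory LRU cache mapping (model, prompt, options) to a generated reply."""

    def __init__(self, max_entries: int = 256):
        self.max_entries = max_entries
        self._store: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def key(model: str, prompt: str, options: Optional[Dict] = None) -> str:
        """Stable cache key: hash of the sorted JSON of all inputs that affect the reply."""
        blob = json.dumps(
            {"model": model, "prompt": prompt, "options": options or {}}, sort_keys=True
        )
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get_or_generate(self, model: str, prompt: str,
                        generate: Callable[[str], str],
                        options: Optional[Dict] = None) -> str:
        k = self.key(model, prompt, options)
        if k in self._store:
            self._store.move_to_end(k)  # mark as most recently used
            return self._store[k]
        reply = generate(prompt)        # only runs on a cache miss
        self._store[k] = reply
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return reply
```

A cache hit returns the first reply verbatim. That only makes sense for repeated identical queries, such as re-opening the same emergency summary. Any change to the prompt or options is a miss.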