Inspiration

Colorectal cancer outcomes improve dramatically with early detection, but expert review of histopathology imagery is time‑consuming and scarce. We built Healer to make preliminary, accessible medical guidance available anywhere: a simple chat interface that can reason over text and medical images to support clinicians and patients with fast, explainable answers.

What it does

  • Chat-based medical assistant that handles both text questions and image‑based queries (e.g., histopathology slides).
  • Optional speech‑to‑text input for hands‑free use in clinical settings.
  • Returns concise, clinically framed responses with reasoning and next‑step recommendations.
  • Persists chat history locally for quick context recall (no data leaves the browser beyond a single inference request).
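Because history lives in the browser, each turn reaches the backend as one self‑contained inference request. A minimal sketch of how such a payload could be assembled client‑side (the field names here are illustrative assumptions, not Healer's actual API):

```python
import base64
import json

def build_inference_payload(history, question, image_bytes=None):
    """Package stored chat history, the new question, and an optional image
    into a single JSON body for one inference request.

    `history` is a list of {"role": ..., "content": ...} dicts, mirroring
    what the frontend keeps in localStorage.
    """
    payload = {
        "messages": history + [{"role": "user", "content": question}],
    }
    if image_bytes is not None:
        # Inline the image as base64 so a single POST carries everything.
        payload["image"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload)
```

This keeps the server stateless: nothing about a conversation persists outside the browser except for the duration of one request.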

How we built it

  • Frontend: React + TypeScript + Vite, shadcn/ui components, Web Speech API, and localStorage for chat history.
  • Backend: FastAPI + Uvicorn hosting a MedGemma 4B (instruction‑tuned) pipeline using Hugging Face Transformers and PyTorch.
  • Model handling: We download MedGemma during the Docker build (download.py) and ship weights inside the image for zero external dependencies at runtime.
  • Inference: AutoProcessor + AutoModelForImageTextToText with device‑aware execution (CUDA if available, otherwise CPU) and bfloat16 or float32 chosen automatically to match the device.
  • Cloud: Built with Google Cloud Build and deployed to Cloud Run as a single container (API + model in the same service).
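The device/dtype choice in the inference bullet reduces to a small rule. A sketch of that selection logic (in the real service the flag would come from `torch.cuda.is_available()`, and the result feeds the Transformers `from_pretrained` call):

```python
def pick_device_dtype(cuda_available: bool):
    """Device-aware execution: prefer GPU with bfloat16 (roughly halves
    memory vs. float32 and is well supported on recent CUDA hardware),
    otherwise fall back to CPU with float32."""
    if cuda_available:
        return ("cuda", "bfloat16")
    return ("cpu", "float32")

# In the service, the pair is used roughly like:
#   device, dtype = pick_device_dtype(torch.cuda.is_available())
#   model = AutoModelForImageTextToText.from_pretrained(
#       model_dir, torch_dtype=dtype, device_map=device)
```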

Challenges we ran into

  • Memory and latency: Large VLMs are heavy. We optimized dtype and device usage and tuned token generation to keep response times reasonable. For production, we keep one warm instance to mitigate cold starts.
  • Build time with big weights: Model download initially made builds painfully slow. Moving it to a dedicated Docker layer with caching fixed iteration speed.
  • Secret handling: Passing Hugging Face tokens securely through Cloud Build while keeping them out of image layers required careful environment/arg plumbing.
  • Prompt reliability: Getting definitive, clinically useful answers consistently required several prompt iterations and clear structure for text‑only vs. image+text.
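The text‑only vs. image+text split above maps naturally onto the typed chat‑message format that multimodal Transformers processors accept, where user content is a list of parts and an image slot is prepended only when a slide is attached. A hedged sketch (the system prompt wording is an assumption, not our production prompt):

```python
def build_messages(question, has_image):
    """Build a chat-template message list: user content is a list of typed
    parts, with an {"type": "image"} slot added only for image+text queries
    (the processor pairs that slot with the actual image at encode time)."""
    parts = []
    if has_image:
        parts.append({"type": "image"})
    parts.append({"type": "text", "text": question})
    return [
        {"role": "system",
         "content": [{"type": "text",
                      "text": "You are a careful clinical assistant. "
                              "Answer with reasoning and clear next steps."}]},
        {"role": "user", "content": parts},
    ]
```

Keeping the two paths in one builder was what finally made answers consistent: the only structural difference between modes is the presence of the image slot.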

Accomplishments that we're proud of

  • A fully containerized, production‑deployable medical VLM service with a clean UI.
  • Multimodal pipeline (text + image) running behind a simple REST endpoint.
  • Hands‑free speech capture and a smooth chat experience with local persistence.
  • Straightforward cloud deployment (Cloud Build → Cloud Run) that anyone can reproduce.

What we learned

  • Practical MLOps for multimodal models: layer caching, image size vs. boot time trade‑offs, and device/dtype tuning.
  • Prompt engineering matters—clinically framed instructions dramatically improve answer quality and consistency.
  • Cloud Run is viable for VLMs when you bake weights into the image and manage cold starts thoughtfully; GPUs can be added when needed.

What's next for Healer

  • Streaming responses and partial rendering for faster perceived latency.
  • Formal evaluation on public medical benchmarks; add guardrails and error detection.
  • Optional GPU deployment path and autoscaling policies; explore Vertex AI Model Garden variants.
  • Privacy & compliance hardening (audit logging, PHI handling guidance, and enterprise controls).
  • Richer multimodal inputs (DICOM, dermatoscopic images) and structured outputs (clinical note templates, billing‑code suggestions).
  • User features: session sharing, export to PDF, and curated “second‑opinion” mode that lists differentials and test plans.
