Onkos
Inspiration
Oncologists spend hours doing work a computer should do: reading pathology reports, cross-referencing NCCN guidelines, hunting for trials their patient might qualify for. The bottleneck isn't knowledge; it's time. We wanted to build the Perplexity for oncologists: a tool that takes a document, runs the full research pipeline on it, and hands back a sourced, auditable answer so the doctor can focus on the patient.
What it does
Drop in a pathology PDF. Onkos reads it, extracts every clinically relevant field (staging, Breslow depth, mutation status, biomarkers), walks the NCCN treatment railway phase by phase, surfaces matching clinical trials with a geocoded map of sites near the patient, and produces a downloadable consult report, all in under a minute. A HeyGen video avatar powered by Kimi K2 sits in the cockpit and answers follow-up questions in real time with full case context.
The key thing that makes this different: every treatment decision streams its <think> reasoning live. Oncologists don't just see what the model recommends; they see why, step by step. That's what makes it auditable enough to actually use in a clinical setting.
How we built it
Backend: FastAPI + SSE pipeline. When a PDF lands, a deterministic Python orchestrator fires four stages in sequence: field extraction, NCCN railway walk, parallel trial matching + geocoding, lazy PDF report generation. Every stage publishes events to an in-memory bus and the frontend subscribes via SSE and updates live.
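The bus itself is simple. A minimal sketch of the in-memory pub/sub pattern we use, with illustrative names (`EventBus`, `sse_format`, the `stage`/`data` event shape are simplified stand-ins for our real event schema):

```python
import asyncio
import json
from collections import defaultdict

class EventBus:
    """Minimal in-memory pub/sub: each pipeline run gets its own subscriber queues."""

    def __init__(self):
        self._queues: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def publish(self, run_id: str, event: dict) -> None:
        # Fan out to every subscriber of this run without blocking the pipeline.
        for q in self._queues[run_id]:
            q.put_nowait(event)

    def subscribe(self, run_id: str) -> asyncio.Queue:
        q = asyncio.Queue()
        self._queues[run_id].append(q)
        return q

def sse_format(event: dict) -> str:
    """Serialize one bus event as a Server-Sent Events frame."""
    return f"event: {event['stage']}\ndata: {json.dumps(event['data'])}\n\n"
```

In the real app a FastAPI streaming endpoint drains the subscriber queue and yields `sse_format(...)` frames to the browser as each stage publishes.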
LLM layer (dual MBZUAI models):
- Kimi K2 (MBZUAI-IFM/K2-Think-v2) drives all text reasoning: PDF field extraction, NCCN decision-making at each railway node, post-run chat with LangGraph tool calling, and narrative report writing
- MediX-R1-30B (MBZUAI's medical VLM) handles scanned PDFs. Pages get rasterized and MediX reads them visually, the same way a radiologist would
Chat agent: LangGraph state machine where Kimi can call tools in a loop. Pull up a trial, cite PubMed literature, navigate the UI. Real agentic capability post-run.
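LangGraph manages this loop for us; stripped to its essence, the agentic step is just "ask the model, run the tool it names, feed the result back." A pure-Python sketch under illustrative assumptions (the tool names, the message shapes, and the placeholder tool bodies are not our real implementation):

```python
# Illustrative tool registry; the real tools hit ClinicalTrials.gov, PubMed,
# and the UI-focus bridge.
TOOLS = {
    "lookup_trial": lambda nct_id: {"nct_id": nct_id, "status": "recruiting"},
    "cite_pubmed": lambda query: [f"placeholder citation for {query!r}"],
}

def run_agent(llm, user_msg: str, max_steps: int = 5):
    """Loop: ask the model; if it requests a tool, run it and append the
    result to the conversation; stop when the model produces an answer."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = llm(messages)  # returns {"tool": ..., "args": ...} or {"answer": ...}
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "name": reply["tool"], "content": result})
    return "Max tool steps reached."
```

The `max_steps` cap is the important part: the agent can chain tools, but never unboundedly.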
Frontend: Next.js 15 + Tailwind cockpit. HeyGen avatar on the left as a TTS puppet for Kimi (Kimi generates the answer, HeyGen just speaks it). NCCN railway, trial map, extracted fields, and documents behind URL-synced tabs on the right.
Challenges we ran into
Getting the <think> stream to work cleanly across chunk boundaries was much harder than expected: the <think> and </think> tags split across SSE chunks in unpredictable ways, so we built a stateful streaming parser that buffers across chunks and emits typed (thinking | answer, delta) tuples. JSON extraction from VLM outputs was also brittle: models paraphrase enum values, hit token limits mid-object, and echo the schema envelope back at you. We ended up building a full repair pipeline with lenient coercion, truncation recovery, and a one-retry loop with a corrective prompt.
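The core trick in the streaming parser is holding back any buffer suffix that could be the start of a split tag. A minimal sketch of the idea (class and method names are illustrative, not our production code):

```python
class ThinkStreamParser:
    """Split a token stream into ('thinking', delta) / ('answer', delta) tuples,
    even when <think> / </think> tags arrive split across SSE chunks."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self):
        self.buf = ""
        self.thinking = False

    def feed(self, chunk: str) -> list[tuple[str, str]]:
        self.buf += chunk
        out = []
        while True:
            tag = self.CLOSE if self.thinking else self.OPEN
            i = self.buf.find(tag)
            if i == -1:
                # Hold back the longest buffer suffix that is a prefix of the
                # tag we're waiting for; it may complete in the next chunk.
                keep = 0
                for k in range(1, min(len(tag), len(self.buf) + 1)):
                    if self.buf.endswith(tag[:k]):
                        keep = k
                safe = self.buf[: len(self.buf) - keep]
                if safe:
                    out.append(("thinking" if self.thinking else "answer", safe))
                self.buf = self.buf[len(self.buf) - keep :]
                return out
            if i > 0:
                out.append(("thinking" if self.thinking else "answer", self.buf[:i]))
            self.buf = self.buf[i + len(tag) :]
            self.thinking = not self.thinking
```

Feeding `"Hello <thi"`, `"nk>reason"`, `"ing</think> Answer."` yields answer/thinking deltas in order, with the split `<think>` tag never leaking into the output.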
Accomplishments that we're proud of
- Visible reasoning for high-stakes medicine. The <think> stream isn't a gimmick; it's the entire value proposition.
- Two MBZUAI models (K2 for text, MediX for vision) running through a single unified pipeline with no code changes between them.
- A chat agent that has the full patient case baked into every turn and degrades gracefully when APIs are unavailable rather than crashing.
What we learned
Deterministic Python beats LLM tool-calling for medical pipelines. Giving a model free rein over a treatment plan introduces failure modes you can't audit. Threading the model through a structured graph and calling it only at specific decision points keeps the output reliable while still surfacing genuine reasoning. Medical VLMs like MediX are genuinely better than general models at reading pathology reports, even from image alone.
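The pattern is worth making concrete. A sketch of a deterministic railway walker where Python owns the control flow and the model is consulted only at flagged decision nodes (the node names, the SLNB question, and the 0.8 mm Breslow threshold in the usage example are simplified illustrations, not the actual NCCN graph):

```python
# Illustrative NCCN-style railway: a dict of nodes, some of which require a
# model decision constrained to an explicit set of branches.
RAILWAY = {
    "staging": {"decide": False, "next": "node_selection"},
    "node_selection": {
        "decide": True,
        "question": "Is sentinel lymph node biopsy indicated?",
        "branches": {"yes": "slnb", "no": "wide_excision"},
    },
    "slnb": {"decide": False, "next": None},
    "wide_excision": {"decide": False, "next": None},
}

def walk_railway(start: str, case: dict, ask_llm) -> list[str]:
    """Deterministic walk: Python decides which node runs next; the model is
    called only at decision nodes and must answer with a listed branch."""
    path, node = [], start
    while node is not None:
        path.append(node)
        spec = RAILWAY[node]
        if spec["decide"]:
            choice = ask_llm(spec["question"], case)
            if choice not in spec["branches"]:
                raise ValueError(f"Unauditable branch from model: {choice!r}")
            node = spec["branches"][choice]
        else:
            node = spec["next"]
    return path
```

Because every branch the model can take is enumerated up front, the resulting path is auditable by construction: an out-of-vocabulary answer fails loudly instead of silently steering the plan.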
What's next for Onkos
Wiring the HeyGen avatar so it can actually drive the UI: Kimi already emits chat_ui_focus tool calls that are supposed to switch tabs and pull up trials; we just need to finish connecting them. Expanding beyond melanoma to a fully cancer-agnostic dynamic railway walker. Capturing ECOG score, prior therapies, and RECIST response in the intake flow to sharpen trial-matching precision.
Built With
- chroma
- clinicaltrials.gov-api
- fastapi
- google-maps
- heygen
- kimi-k2
- langgraph
- medix-r1-30b
- next.js-15
- openai-sdk
- pdf2image
- pubmed-api
- pydantic
- python
- reportlab
- sentence-transformers
- sse
- tailwind-css
- typescript