BINA — Biomedical Intelligent Network for Assessment 🩺
> [!NOTE]
> **Hackathon Prototype:** This project was developed exclusively for hackathon purposes to demonstrate the capabilities of multimodal agentic workflows in healthcare.
BINA is a high-performance, multimodal clinical triage assistant designed for professional healthcare environments. It leverages Gemini 2.5 Flash/Pro to synthesize patient notes, diagnostic imaging, and telemetry (Labs/Vitals) into a unified, agentic diagnostic assessment.
🚀 Vision
BINA isn't just a chatbot; it's a clinical reasoning engine. By fusing disparate data types (text, images, and time-series data), BINA provides a holistic view of patient health, validated by integrated physiological scoring tools.
⚠️ The Problem
Modern clinical triage is frequently hampered by data silos and cognitive overload. Clinicians are forced to manually synthesize fragmented information from disparate sources (radiology images, laboratory reports, and real-time bedside vitals) within seconds, in high-pressure environments. This fragmentation increases the risk of diagnostic delays, clinical errors, and provider burnout.
🩺 Key Features
- Multimodal Fusion: Processes Patient Notes, Radiology (X-rays), Laboratory Reports, and Vital Sign Trends in a single agentic pass.
- Clinical Data Builders: Advanced interactive tables for real-time manual entry of Lab results and Vital Signs with automatic pre-processing.
- Agentic Verification: Integrated Python-based tools for Sepsis Risk Scoring (qSOFA-inspired) and Vitals Trend Visualization; a sketch of the trend-rendering tool follows this list.
- Strict Schema Adherence: Guarantees structured JSON output for integration into EMR systems.
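The vitals-trend tool can be as small as a server-side Matplotlib render returned to the UI as a base64 PNG. The function below is a minimal sketch of that idea, not the repo's actual code; the function name, signature, and chosen vitals are illustrative.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless, server-side rendering
import matplotlib.pyplot as plt


def render_vitals_trend(timestamps, heart_rate, spo2):
    """Plot heart rate and SpO2 over time and return the chart as a base64 PNG."""
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(timestamps, heart_rate, marker="o", label="Heart rate (bpm)")
    ax.plot(timestamps, spo2, marker="s", label="SpO2 (%)")
    ax.set_xlabel("Time")
    ax.set_title("Vital sign trend")
    ax.legend(loc="best")
    fig.tight_layout()

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return base64.b64encode(buf.getvalue()).decode("ascii")
```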
🧠 Diagnostic Schema
BINA adheres to a strict clinical output contract:
- `triage_urgency`: RED (Critical) | YELLOW (Urgent) | GREEN (Stable)
- `differential_diagnosis`: Top 3 competing clinical hypotheses.
- `confidence_score`: 0.0 to 1.0.
- `evidence_summary`: Unified reasoning across all input modalities.
- `tool_verification_data`: Raw scoring and visualization assets from agentic tools.
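For illustration, an assessment that satisfies this contract might look like the dictionary below. All clinical values are invented for demonstration; only the field names come from the contract above.

```python
# Hypothetical example of a BINA assessment (all clinical values invented).
example_assessment = {
    "triage_urgency": "RED",
    "differential_diagnosis": [
        "Community-acquired pneumonia with early sepsis",
        "Acute respiratory distress syndrome",
        "Decompensated heart failure",
    ],
    "confidence_score": 0.78,
    "evidence_summary": (
        "Right lower lobe infiltrate on X-ray, rising lactate on labs, and a "
        "qSOFA of 2 (RR 24, SBP 95 mmHg) together point to sepsis from a "
        "pulmonary source."
    ),
    "tool_verification_data": {
        "qsofa_score": 2,
        "vitals_trend_png": "<base64-encoded chart>",
    },
}
```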
🛠️ Tech Stack
- LLM Core: Google Gemini 2.5 API (Flash/Pro)
- Backend: Flask (Python 3.10+)
- Frontend: Vanilla JS, Modern CSS (Glassmorphism, Responsive Grid)
- Visualization: Matplotlib (Server-side rendering for clinician review)
- Environment: `python-dotenv` for secure configuration.
🏁 Quick Start
1. Environment Setup
```bash
# Clone the repository
git clone <repo-url>
cd bina

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
2. Configure API Key
Create a `.env` file in the root directory:
```
GOOGLE_API_KEY=your_gemini_api_key_here
```
3. Run the Application
```bash
python app.py
```
Visit http://127.0.0.1:5000 to access the dashboard.
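Internally, `app.py` is expected to load that key with `python-dotenv` and hand it to the Gemini client. A minimal sketch of that wiring, assuming the `google-generativeai` SDK (the actual SDK calls and model name in the repo may differ):

```python
import os

from dotenv import load_dotenv
import google.generativeai as genai

load_dotenv()  # reads GOOGLE_API_KEY from the local .env file
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-2.5-flash")
```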
🧪 Simulation Mode
For demonstration purposes, BINA includes a Simulation Toggle. When enabled, it populates the system with high-fidelity mock data (Respiratory Failure/Sepsis case) to showcase the tool-use and multimodal capacity without requiring manual entry.
Inspiration
I've spent enough time around hospitals to know what 3 AM in the ER actually looks like. There's this moment—when a patient rolls in and the attending is staring at three different screens, flipping between a blurry chest X-ray, a wall of lab values, and some rushed handwritten notes from EMS—where you can see the cognitive load just crushing them. Every second counts, but the information is scattered everywhere.
That's what stuck with me. Not the fancy AI demos or the research papers, but that exhausted doctor trying to piece together a clinical picture from fragments while the monitor is alarming and the family is asking questions. I kept thinking: we have models that can look at images, read text, analyze trends—why are we still making clinicians do this mental gymnastics manually?
BINA started as a way to fix that specific moment. Not to replace doctors (that's a terrible idea), but to hand them a unified view instantly. One system that actually understands what a chest X-ray is showing AND what those rising lactate levels mean AND how the vital signs have trended over the last hour. Something that thinks through the data the way a sharp resident would, but faster.
What it does
BINA takes everything about a patient—clinical notes, imaging (like X-rays), lab results, vital sign trends—and runs it through a multimodal reasoning pipeline. It's not just reading each piece separately; it's synthesizing them the way a clinician would, building a differential diagnosis by connecting the dots across modalities.
The output is structured and actionable: a triage urgency level (RED/YELLOW/GREEN), the top three diagnostic possibilities, a confidence score, and the reasoning chain that got there. I built in agentic tools that actually verify things—calculating sepsis risk scores, generating vital sign trend graphs, catching patterns humans might miss when they're overloaded.
The interface shows you the AI's thinking in real-time, so you're not just getting a black-box answer. You see why it's flagging something as critical, which pieces of evidence mattered most. For this hackathon demo, I added a simulation mode that loads realistic patient data so judges can see the full workflow without needing to manually input everything.
How we built it
I went with Gemini 2.5 because it genuinely handles multimodal fusion well—I needed something that could look at a chest X-ray and not just caption it, but reason about what infiltrates or effusions mean in the context of lab values and clinical history. The backend is Flask because I needed something I could iterate on fast without fighting a framework.
The frontend was deliberately clinical—I looked at actual EMR interfaces and medical dashboards, borrowed the glassmorphic aesthetic that's everywhere now but kept it professional. Built custom data entry tables for labs and vitals because copying numbers from a lab report is tedious, and clinicians needed an easy way to input data in the format they're used to.
The agentic tools were the interesting part. I implemented Python functions for qSOFA scoring (sepsis assessment) and matplotlib-based vitals visualization, then hooked them into the model's function-calling capability. The model decides when to use them, which means it's not just answering questions—it's actively verifying its own reasoning with quantitative tools.
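A qSOFA-style scorer is a small pure function, which is exactly what makes it a good function-calling tool. The sketch below follows the standard qSOFA criteria (respiratory rate, systolic blood pressure, mentation); the name, signature, and return shape are illustrative rather than the repo's exact code.

```python
def compute_qsofa_score(respiratory_rate: float, systolic_bp: float, gcs: int) -> dict:
    """qSOFA-inspired sepsis screen: one point per criterion, a score >= 2 flags high risk.

    Criteria: respiratory rate >= 22/min, systolic BP <= 100 mmHg,
    altered mentation (Glasgow Coma Scale < 15).
    """
    score = int(respiratory_rate >= 22) + int(systolic_bp <= 100) + int(gcs < 15)
    return {"qsofa_score": score, "high_risk": score >= 2}
```

With the `google-generativeai` SDK, a plain Python callable like this can be passed in the model's `tools` list so the model can invoke it when it decides a sepsis check is relevant; the exact registration in BINA may differ.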
Schema enforcement was critical. I needed the output to always be structured JSON that could theoretically plug into an EMR system, not freeform text that's useless for automation.
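One way to get that guarantee, beyond prompt instructions, is to request JSON output from the SDK and validate the required keys before anything reaches the UI. This is a sketch of the pattern, assuming a `google-generativeai`-style `generation_config` (e.g. passing `generation_config=JSON_CONFIG` to `generate_content`), not necessarily BINA's exact code:

```python
import json

# Fields every assessment must contain (mirrors the Diagnostic Schema section).
REQUIRED_KEYS = {
    "triage_urgency",
    "differential_diagnosis",
    "confidence_score",
    "evidence_summary",
    "tool_verification_data",
}

# Ask the model for JSON directly instead of free text.
JSON_CONFIG = {"response_mime_type": "application/json"}


def parse_assessment(raw_text: str) -> dict:
    """Parse the model's reply and fail loudly if the output contract is violated."""
    assessment = json.loads(raw_text)
    missing = REQUIRED_KEYS - assessment.keys()
    if missing:
        raise ValueError(f"Assessment missing required fields: {sorted(missing)}")
    return assessment
```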
Challenges we ran into
Getting the model to reliably use tools was harder than expected. Early on, it would just... ignore them, even when they were obviously relevant. I had to tune the system prompt heavily to make it understand when tool use was appropriate versus when it should reason directly. There's this balance where you want the model to be agentic without it overthinking simple cases.
Multimodal context management was a mess initially. Gemini lets you pass images and text together, but making sure the model weighted them correctly—not ignoring the X-ray because the text was more detailed, or vice versa—took iteration. I ended up restructuring how I formatted the combined inputs so each modality got appropriate emphasis.
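Concretely, the restructuring was about how the request parts are laid out: each modality gets its own clearly labeled section, and the image sits next to an instruction tying it back to the rest. A sketch of that assembly, assuming `google-generativeai`'s interleaved text-and-image parts (labels and ordering are illustrative):

```python
from PIL import Image


def build_prompt_parts(notes: str, labs_table: str, vitals_summary: str, xray_path: str) -> list:
    """Interleave labeled text sections with the X-ray so no modality gets buried."""
    return [
        "## CLINICAL NOTES\n" + notes,
        "## LABORATORY RESULTS\n" + labs_table,
        "## VITAL SIGN TRENDS\n" + vitals_summary,
        "## CHEST X-RAY (interpret in the context of the data above)",
        Image.open(xray_path),
    ]


# response = model.generate_content(build_prompt_parts(...))
```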
The frontend's real-time "thought trace" display was tricky because the backend processing isn't actually streaming—I had to fake a progressive reveal to make it feel responsive without misleading users about what was happening server-side.
And honestly? The hardest part was resisting the urge to add more features. I had ideas for ECG analysis, medication interaction checking, predictive modeling—but with hackathon time constraints, I had to focus on making the core multimodal triage loop rock-solid instead of building a fragile system that did ten things poorly.
Accomplishments that we're proud of
The thing I'm genuinely proud of is that BINA doesn't feel like a toy. I've seen a lot of "AI healthcare" projects that are basically chatbots with medical keywords sprinkled in. This actually reasons through clinical scenarios in a structured way, uses real scoring systems, and produces output that a clinician could look at and say "okay, this is useful."
Getting the agentic verification working—where the model independently decides to calculate a sepsis score or generate a trend graph—feels like the start of something powerful. It's not just answering queries; it's thinking through problems with tools the way a human would.
The multimodal fusion works better than I expected. Watching it correctly connect a chest X-ray finding with lab values and vitals to land on the right diagnosis, then explain its reasoning clearly, was the moment I knew this was worth building.
And the simulation mode, honestly. It lets anyone demo the full system without needing real patient data or spending ten minutes entering numbers. That's going to matter when judges have two minutes to evaluate this.
What we learned
Medical AI is hard not because the models aren't capable, but because the standards for correctness are brutal. A chatbot can get away with being 80% right; a clinical tool can't. I learned to obsess over edge cases and schema validation in ways I never had to with previous projects.
Agentic systems need tight guardrails. Giving a model tools is powerful, but you have to define when and how to use them precisely, or you get chaos. The difference between "use this tool when relevant" and "use this tool when these specific conditions are met" was huge.
Multimodal models are genuinely game-changing for healthcare because medicine is inherently multimodal. You can't diagnose effectively from just notes or just images or just labs—you need all of it. Building a system that handles that fusion well required rethinking how I structured data and prompts from the ground up.
I also learned that in hackathons, focus beats features every time. I cut half my original scope, and the project is better for it.
What's next for BINA
If I keep working on this, the next step is adding ECG interpretation—cardiac monitoring generates massive amounts of data that's often underutilized because reading rhythms takes time. A multimodal system could flag concerning patterns instantly.
I'd want to implement proper temporal reasoning, where BINA tracks how a patient's condition changes over hours or days, not just analyzing a single snapshot. Real triage involves understanding trajectories, not just states.
Integration with real EMR systems would be huge—pulling data directly from FHIR APIs, pushing assessments back into the clinical workflow. Right now it's a standalone demo; making it genuinely useful means fitting into how hospitals actually work.
Long term? I think there's potential for this architecture to handle pre-hospital triage, where paramedics could get real-time diagnostic support before the patient even arrives. The data is messier, the environment is worse, but the need is just as critical.
But honestly, the biggest next step is validation. This needs to be tested on diverse, realistic cases by actual clinicians. Not to replace their judgment, but to see where it helps and where it falls short. Because if it's not making real doctors' lives easier in real scenarios, then it's just an impressive demo—and medicine deserves better than that.
> [!IMPORTANT]
> **Medical Disclaimer:** BINA is a proof-of-concept developed for hackathon demonstration purposes only. It is not a substitute for professional medical judgment and is not intended for use in clinical practice or real-world diagnostic scenarios.