SalusOffline

## Inspiration

Across many rural and peri-urban communities in Nigeria and the wider African continent, primary healthcare centers face a severe double-bind: a critical shortage of medical doctors and deeply unreliable internet connectivity. When a patient arrives with a complex set of symptoms, community health workers are often left to make high-stakes triage decisions completely isolated from specialized clinical knowledge or cloud-based reference tools.

We built SalusOffline to break this infrastructure dependency. We wanted to prove that you do not need an expensive cloud infrastructure, high-end GPUs, or a stable fiber-optic connection to deploy world-class clinical intelligence. By bringing "infrastructure-grade" AI straight to a standard, affordable 8 GB commodity laptop, we can empower local healthcare workers to safely triage patients and save lives right at the edge.

## What it does

SalusOffline is a 100% offline edge-AI medical triage assistant and clinical decision support tool designed for community health workers.

Patient Intake & Vital Monitoring: It takes basic patient parameters—age, gender, temperature, blood pressure, heart rate—along with a description of their primary complaint.
Local Language Voice Support: Recognizing regional linguistic realities, users can speak their symptoms aloud. The system transcribes the speech locally without needing an internet data packet.
Safety-First Local Triage: The app processes the patient profile against local, indexed medical guidelines to classify the patient into standard emergency color codes (Red: Emergency, Yellow: Urgent, Green: Non-urgent) and outputs localized, verified immediate care steps.

## How we built it

SalusOffline is carefully engineered to extract maximum performance out of zero-bandwidth, constrained hardware environments. The architecture is split into three main local layers:

The Core Inference Engine: We bypassed heavy, cloud-reliant APIs and implemented llama-cpp-python to run a heavily quantized Phi-3-mini (3.8B parameter) model natively on the laptop's CPU. The model was converted to a Q4_K_M GGUF format, shrinking its memory footprint down to roughly $\approx 2.2 \text{ GB}$ of RAM.
Offline Knowledge Grounding (RAG): To prevent the language model from hallucinating dangerous medical dosages, we built a file-based ChromaDB vector database. We chunked and indexed official national clinical guidelines completely offline using the ultra-lightweight all-MiniLM-L6-v2 embedding model ($\approx 90 \text{ MB}$). When symptoms are entered, the system pulls the exact clinical rulebook text locally and injects it into the LLM context window.
Voice Translation Layer: We integrated an offline Whisper-Tiny model ($\approx 75 \text{ MB}$) directly into the application pipeline to handle local spoken dialects, converting raw audio into standardized text context natively.

## Challenges we ran into

The biggest hurdle was fighting the harsh 8 GB RAM memory budget. During early testing, running an unoptimized text model alongside the operating system pushed memory consumption past the physical limit, forcing the system into disk-swapping and causing inference speeds to drop to an unusable crawl.

We solved this through rigorous math and component resource optimization. We carefully tracked our memory usage parameters:

$$\text{RAM}{\text{total}} = \text{RAM}{\text{OS}} + \text{RAM}{\text{LLM}} + \text{RAM}{\text{VectorDB}} + \text{RAM}_{\text{UI}}$$

By aggressively quantizing the LLM weights down to 4-bit integers and building the user interface using a lightweight native runtime rather than resource-heavy framework wrappers, we successfully capped the entire application's memory overhead at under $3.2 \text{ GB}$. This kept our token generation speed crisp and functional entirely on a standard CPU.

## Accomplishments that we're proud of

**True 100% Offline Autonomy: We successfully built a pipeline where an LLM provides highly technical clinical assistance without making a single cloud API call, proving that "deep tech" can work without an internet connection.
Extreme Memory Optimization: Keeping a modern voice transcription model, a vector database, and an LLM running concurrently under a $3.2 \text{ GB}$ RAM ceiling on consumer laptop hardware.
Zero-Configuration Portability: Packaging the tool so it can run via a simple local command or single executable script without requiring complex environment setups from local healthcare operators.

## What we learned

This project completely redefined how we view software engineering. We learned that "Deep Tech" isn't about using the biggest, most expensive cloud models; it’s about deep optimization, model quantization, and understanding how to squeeze every bit of performance out of local silicon. More importantly, we proved that localized, offline AI can be engineered responsibly, safely, and impactfully to solve real-world African problems today.

## What's next for SalusOffline

Our immediate next milestone is expanding our offline RAG knowledge base to include more localized medical guidelines from different African regions. We also plan to optimize our local context window handling to support multi-turn clinical chat sessions without exceeding the 8 GB RAM boundary, moving SalusOffline from a prototype to a deployment-ready system for rural primary healthcare clinics.

Built With

c++
chromadb
cmake
customtkinter
faiss
git
gnu-compiler-collection-(gcc)
hugging-face-transformers
llama-cpp-python
make
phi-3-mini
python
whisper-tiny

Updates

Muhammad Sani Abdullahi started this project — Jun 21, 2026 02:16 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.