ACTG 175 Treatment Policy Storyboard

The Story Behind the Data

In 1991, with the AIDS epidemic having already claimed more than 100,000 lives in the United States alone, a group of researchers asked a simple question that would change medicine forever: What if we combined drugs instead of using them alone?

This project brings that pivotal moment in medical history to life through interactive data storytelling.

Inspiration

The ACTG 175 dataset doesn't look like much at first glance — patient IDs, CD4 counts, treatment codes. But then I calculated the outcomes:

ZDV monotherapy: CD4 cells declined by 17. Event rate: 34%.

ZDV + ddI combination: CD4 cells increased by 54. Event rate: 20%.

Patients on the "standard of care" were getting worse. The combination therapy patients were thriving. This was not just data; it was the moment combination antiretroviral therapy was proven, a discovery that would eventually save millions of lives worldwide.

What We Learned

The Science

CD4/CD8 ratios below 1.0 indicate immune suppression — 98% of trial patients had abnormal ratios
Synergy is real: ZDV+ddI didn't just add benefits, it multiplied them (the actual effect exceeded the predicted additive effect by 50 cells)
Equity matters: given proper care, IV drug users actually had better outcomes (19% vs 25%), undercutting the stigma attached to them

The Tech

Gemini 2.0 Flash can generate medically accurate, context-aware narration in real time
ElevenLabs voice synthesis makes data accessible to visually impaired users
Plotly's 3D surfaces reveal patterns invisible in 2D — the "death valley" where low CD4 meets high age

The Craft

Data storytelling isn't about showing all the data — it's about showing the right data at the right moment
AI narration works best when grounded in specific statistics, not vague summaries
Interactive visualizations should invite exploration, not overwhelm

How We Built It

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Streamlit Frontend                    │
├─────────────┬─────────────┬─────────────┬───────────────┤
│  9 Chapters │ 13 Viz Pages│  RAG Chat   │ Risk Calc     │
├─────────────┴─────────────┴─────────────┴───────────────┤
│                   Feature Engineering                    │
│            (49 derived features from 23 raw)            │
├─────────────────────────────────────────────────────────┤
│  Gemini 2.0 Flash  │  ElevenLabs TTS  │  Plotly/MPL    │
└─────────────────────────────────────────────────────────┘

Feature Engineering

From 23 raw clinical variables, I engineered 49 features:

$$\text{CD4 Change} = \text{CD4}_{20\text{wk}} - \text{CD4}_{\text{baseline}}$$

$$\text{Immune Ratio} = \frac{\text{CD4}}{\text{CD8}}$$

$$\text{Risk Score} = \sum_{i} w_i \cdot \mathbb{1}[\text{risk factor}_i]$$
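
A minimal sketch of how these three features might be computed for one patient record. The field names (`cd40`, `cd420`, `cd80`, `symptom`), the risk factors, and the weights are all assumptions made for illustration; the project's actual definitions are not given here.

```python
from typing import Mapping

# Hypothetical weights for illustration only; the project's real
# risk factors and weights are not stated in this write-up.
RISK_WEIGHTS = {
    "symptomatic": 2.0,
    "low_cd4": 1.5,
    "inverted_ratio": 1.0,
}


def engineer_features(patient: Mapping[str, float]) -> dict:
    """Derive CD4 change, immune ratio, and a weighted risk score."""
    cd4_change = patient["cd420"] - patient["cd40"]   # 20-week minus baseline
    immune_ratio = patient["cd40"] / patient["cd80"]  # baseline CD4/CD8

    # One boolean indicator per risk factor (assumed definitions).
    indicators = {
        "symptomatic": patient["symptom"] == 1,
        "low_cd4": patient["cd40"] < 200,
        "inverted_ratio": immune_ratio < 1.0,
    }
    # Risk score = sum of the weights whose indicator is true.
    risk_score = sum(RISK_WEIGHTS[name] for name, hit in indicators.items() if hit)

    return {
        "cd4_change": cd4_change,
        "immune_ratio": immune_ratio,
        "risk_score": risk_score,
    }


example = {"cd40": 350, "cd420": 404, "cd80": 700, "symptom": 0}
features = engineer_features(example)
```

A real pipeline would also need to guard against a CD8 count of zero or missing values before dividing.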

Response categories were computed as:

Super Responder: $\Delta \text{CD4} > 150$
Improved: $0 < \Delta \text{CD4} \leq 150$
Stable: $-50 < \Delta \text{CD4} \leq 0$
Declined: $\Delta \text{CD4} \leq -50$
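
The thresholds above translate directly into a small classifier. This is a sketch, and the function name `response_category` is hypothetical:

```python
def response_category(delta_cd4: float) -> str:
    """Map a CD4 change (in cells) to one of the four response categories."""
    if delta_cd4 > 150:
        return "Super Responder"
    if delta_cd4 > 0:
        return "Improved"
    if delta_cd4 > -50:
        return "Stable"
    return "Declined"  # delta_cd4 <= -50


labels = [response_category(d) for d in (200, 54, 0, -17, -50)]
```

Checking the boundaries from the top down means each value lands in exactly one category, with no gaps at 0 or -50.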

Visualization Philosophy

Each visualization serves a narrative purpose:

| Chapter | Visualization | Story Beat |
|---|---|---|
| Prologue | Animated counter | Scale of the crisis |
| Demographics | Parallel coordinates | Every line is a life |
| Results | Survival curves | Watch the treatments diverge |
| Deep Dive | 3D surface | Find the "death valley" |
| Equity | Dumbbell chart | Measure the gaps |
| Winners | Radar chart | Crown the champion |

AI Integration

Gemini prompts are grounded in computed statistics:

prompt = f"""
You are narrating the ACTG 175 trial results.
ZDV+ddI: {zdv_ddi_rate:.1f}% event rate, {zdv_ddi_cd4:+.0f} CD4
ZDV only: {zdv_rate:.1f}% event rate, {zdv_cd4:+.0f} CD4
Explain what this means for patients in 2-3 sentences.
"""

This reduces the risk of hallucination while still allowing natural, contextual narration.
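
One way the grounding step could be packaged is a small prompt builder that takes computed statistics and embeds them verbatim. This is a sketch only: `ArmStats` and `build_narration_prompt` are hypothetical names, and no model call is made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmStats:
    """Computed statistics for one treatment arm (hypothetical container)."""
    label: str
    event_rate_pct: float  # percent of patients with an event
    cd4_change: float      # mean CD4 change in cells


def build_narration_prompt(arms: list[ArmStats]) -> str:
    """Embed computed statistics verbatim so the model has no numbers to invent."""
    lines = [
        f"{a.label}: {a.event_rate_pct:.1f}% event rate, {a.cd4_change:+.0f} CD4"
        for a in arms
    ]
    return (
        "You are narrating the ACTG 175 trial results.\n"
        + "\n".join(lines)
        + "\nUse only the numbers above. "
        "Explain what this means for patients in 2-3 sentences."
    )


prompt = build_narration_prompt([
    ArmStats("ZDV+ddI", 20.0, 54),
    ArmStats("ZDV only", 34.0, -17),
])
```

The `+` format flag prints the sign for both gains and losses, so a decline is never rendered as a gain by accident.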

Challenges I Faced

Challenge 1: Making Statistics Emotional

Raw numbers don't move people. "34% event rate" means nothing until you realize that's 1 in 3 patients developing AIDS or dying. I solved this by:

Leading with human context ("more than 100,000 lives lost")
Using the strip plot where every dot is a patient
Adding the quote: "The lives saved by this study number in the millions"

Challenge 2: AI Narration Consistency

Early Gemini outputs were inconsistent — sometimes too technical, sometimes too casual. I solved this by:

Providing explicit statistics in every prompt
Using system prompts that establish voice and expertise level
Caching responses to ensure chapter-to-chapter consistency
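
A rough sketch of the caching idea, assuming a fixed system prompt and a cache keyed on the full prompt text. The system prompt wording, the `NarrationCache` class, and the `fake_model` stand-in are all hypothetical; a real version would call the Gemini client where `fake_model` is used.

```python
import hashlib
from typing import Callable

# Hypothetical wording, for illustration only.
SYSTEM_PROMPT = (
    "You are a clinical-trial narrator writing for a general audience. "
    "Be precise, warm, and avoid jargon."
)


class NarrationCache:
    """Return the same narration for the same prompt across reruns."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        self._generate = generate  # (system_prompt, user_prompt) -> text
        self._store: dict[str, str] = {}

    def narrate(self, user_prompt: str) -> str:
        # Key on system + user prompt so a change to either invalidates the entry.
        raw = (SYSTEM_PROMPT + "\n" + user_prompt).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._store:
            self._store[key] = self._generate(SYSTEM_PROMPT, user_prompt)
        return self._store[key]


# Stand-in for the real model call so the sketch runs offline.
calls = []


def fake_model(system_prompt: str, user_prompt: str) -> str:
    calls.append(user_prompt)
    return f"Narration #{len(calls)}"


cache = NarrationCache(fake_model)
first = cache.narrate("ZDV only: 34.0% event rate")
second = cache.narrate("ZDV only: 34.0% event rate")
```

Because the second request is served from the cache, the model is called once and the wording cannot drift between views of the same chapter.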

Challenge 3: The 3D Visualization Performance

Rendering a 3D survival surface with 2,139 patients was slow. I solved this by:

Binning into a 15×15 grid (age × CD4)
Pre-computing survival rates per cell
Using WebGL rendering via Plotly
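
The binning and pre-computation steps might look roughly like the sketch below. It assumes each patient is an `(age, cd4, event)` tuple with `event` equal to 1 for an event and 0 otherwise, and it treats the "survival rate" of a cell as the event-free fraction; both are assumptions, and `survival_grid` is a hypothetical name.

```python
def survival_grid(patients, n_bins=15):
    """Bin (age, cd4, event) records into an n_bins x n_bins grid.

    Each cell holds the event-free fraction of its patients,
    or None if no patient falls in that cell.
    """
    patients = list(patients)
    ages = [p[0] for p in patients]
    cd4s = [p[1] for p in patients]
    age_lo, age_hi = min(ages), max(ages)
    cd4_lo, cd4_hi = min(cd4s), max(cd4s)

    def bin_index(value, lo, hi):
        if hi == lo:
            return 0
        # Clamp so the maximum value falls in the last bin.
        return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)

    totals = [[0] * n_bins for _ in range(n_bins)]
    event_free = [[0] * n_bins for _ in range(n_bins)]
    for age, cd4, event in patients:
        i = bin_index(age, age_lo, age_hi)
        j = bin_index(cd4, cd4_lo, cd4_hi)
        totals[i][j] += 1
        event_free[i][j] += 1 - event

    return [
        [event_free[i][j] / totals[i][j] if totals[i][j] else None
         for j in range(n_bins)]
        for i in range(n_bins)
    ]


patients = [(20, 100, 1), (20, 100, 0), (70, 600, 0)]
grid = survival_grid(patients)
```

The renderer then only has to draw 225 cells regardless of how many patients are in the trial, which is where the speed-up comes from.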

Challenge 4: Voice Synthesis Latency

ElevenLabs API calls took 3-5 seconds, breaking narrative flow. I solved this by:

Making voice optional (text always visible)
Adding loading spinners with "Generating voice..." feedback
Caching audio in session state
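
A sketch of the opt-in, cache-first audio lookup, assuming the session state behaves like a dictionary (as Streamlit's `st.session_state` does). `get_audio` and `fake_synthesize` are hypothetical; the real synthesizer would wrap the ElevenLabs call.

```python
from typing import Callable, MutableMapping, Optional


def get_audio(
    state: MutableMapping,
    chapter_id: str,
    text: str,
    synthesize: Callable[[str], bytes],
    voice_enabled: bool,
) -> Optional[bytes]:
    """Return cached audio for a chapter, synthesizing at most once per session."""
    if not voice_enabled:
        return None  # text is always shown; audio is opt-in
    key = f"audio::{chapter_id}"
    if key not in state:
        state[key] = synthesize(text)  # slow call happens only on a cache miss
    return state[key]


# Offline stand-ins: a plain dict for session state, a fake synthesizer for the API.
session = {}
synth_calls = []


def fake_synthesize(text: str) -> bytes:
    synth_calls.append(text)
    return text.encode("utf-8")


a = get_audio(session, "results", "Watch the treatments diverge.", fake_synthesize, True)
b = get_audio(session, "results", "Watch the treatments diverge.", fake_synthesize, True)
c = get_audio({}, "results", "Watch the treatments diverge.", fake_synthesize, False)
```

With this shape, the 3-5 second delay is paid only the first time a chapter's audio is requested, and never when voice is switched off.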

What's Next

Knowledge Graph: Neo4j integration to explore treatment-outcome-demographic relationships
Cohort Builder: Let users define custom patient subgroups and compare outcomes
Policy Simulator: "What if we had started combination therapy 2 years earlier?"
Multi-language: Translate narration for global health education

The Impact

ACTG 175 wasn't just a clinical trial. It was the turning point that transformed HIV from a death sentence into a manageable chronic condition. Today, 29 million people are on antiretroviral therapy worldwide.

This project exists because I believe data has stories to tell — stories that deserve to be heard, understood, and remembered.