ACTG 175 Treatment Policy Storyboard
The Story Behind the Data
In 1991, the AIDS epidemic had already claimed more than 100,000 lives in the United States alone. That year, a group of researchers asked a simple question that would change medicine forever: What if we combined drugs instead of using them alone?
This project brings that pivotal moment in medical history to life through interactive data storytelling.
Inspiration
The ACTG 175 dataset doesn't look like much at first glance — patient IDs, CD4 counts, treatment codes. But then I calculated the outcomes:
ZDV monotherapy: CD4 cells declined by 17. Event rate: 34%.
ZDV + ddI combination: CD4 cells increased by 54. Event rate: 20%.
Patients on the "standard of care" were getting worse. The combination therapy patients were thriving. This was not just data; it was the moment combination antiretroviral therapy proved itself, a discovery that would eventually save millions of lives worldwide.
What We Learned
The Science
- CD4/CD8 ratios below 1.0 indicate immune suppression — 98% of trial patients had abnormal ratios
- Synergy is real: the benefit of ZDV+ddI was more than additive (the observed effect exceeded the predicted additive effect by 50 cells)
- Equity matters: given proper care, IV drug users actually had better outcomes (19% vs 25%), which undercuts the stigma around them
The Tech
- Gemini 2.0 Flash can generate medically accurate, contextually aware narration in real time
- ElevenLabs voice synthesis makes data accessible to visually impaired users
- Plotly's 3D surfaces reveal patterns invisible in 2D — the "death valley" where low CD4 meets high age
The Craft
- Data storytelling isn't about showing all the data — it's about showing the right data at the right moment
- AI narration works best when grounded in specific statistics, not vague summaries
- Interactive visualizations should invite exploration, not overwhelm
How We Built It
Architecture
┌─────────────────────────────────────────────────────────┐
│ Streamlit Frontend │
├─────────────┬─────────────┬─────────────┬───────────────┤
│ 9 Chapters │ 13 Viz Pages│ RAG Chat │ Risk Calc │
├─────────────┴─────────────┴─────────────┴───────────────┤
│ Feature Engineering │
│ (49 derived features from 23 raw) │
├─────────────────────────────────────────────────────────┤
│ Gemini 2.0 Flash │ ElevenLabs TTS │ Plotly/MPL │
└─────────────────────────────────────────────────────────┘
Feature Engineering
From 23 raw clinical variables, I engineered 49 features:
$$\text{CD4 Change} = \text{CD4}_{20\text{wk}} - \text{CD4}_{\text{baseline}}$$
$$\text{Immune Ratio} = \frac{\text{CD4}}{\text{CD8}}$$
$$\text{Risk Score} = \sum_{i} w_i \cdot \mathbb{1}[\text{risk factor}_i]$$
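A minimal sketch of how these three formulas might be computed for a single patient. The field names, the risk factor names, and the weights are illustrative assumptions rather than the project's actual schema, and the immune ratio is taken at baseline, which the formula above does not specify.

```python
from dataclasses import dataclass

# Hypothetical risk factors and weights, chosen only for illustration.
RISK_WEIGHTS = {"low_cd4": 2.0, "symptomatic": 1.5, "age_over_50": 1.0}


@dataclass
class Patient:
    cd4_baseline: float
    cd4_20wk: float
    cd8_baseline: float
    risk_factors: frozenset = frozenset()


def cd4_change(p: Patient) -> float:
    """CD4 at week 20 minus CD4 at baseline."""
    return p.cd4_20wk - p.cd4_baseline


def immune_ratio(p: Patient) -> float:
    """CD4/CD8 ratio at baseline (assumed time point); NaN if CD8 is zero."""
    if p.cd8_baseline == 0:
        return float("nan")
    return p.cd4_baseline / p.cd8_baseline


def risk_score(p: Patient) -> float:
    """Sum of the weights of the risk factors present for this patient."""
    return sum(w for name, w in RISK_WEIGHTS.items() if name in p.risk_factors)


# Made-up example patient, not a record from the dataset.
example = Patient(350, 404, 700, frozenset({"low_cd4", "age_over_50"}))
change = cd4_change(example)
ratio = immune_ratio(example)
score = risk_score(example)
```

The indicator function in the risk score becomes a simple membership test: a weight only contributes when its risk factor is present.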
Response categories were computed as:
- Super Responder: $\Delta \text{CD4} > 150$
- Improved: $0 < \Delta \text{CD4} \leq 150$
- Stable: $-50 < \Delta \text{CD4} \leq 0$
- Declined: $\Delta \text{CD4} \leq -50$
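The thresholds above translate directly into a small function. This is a sketch; the function name `response_category` is hypothetical, and only the cut-offs come from the list above.

```python
def response_category(delta_cd4: float) -> str:
    """Map a CD4 change to one of the four response categories."""
    if delta_cd4 > 150:
        return "Super Responder"
    if delta_cd4 > 0:
        return "Improved"
    if delta_cd4 > -50:
        return "Stable"
    return "Declined"


# Boundary values land in the lower category, matching the inequalities above.
labels = [response_category(d) for d in (151, 150, 0, -49, -50)]
```

Checking the conditions from the top down means each branch only needs its lower bound.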
Visualization Philosophy
Each visualization serves a narrative purpose:
| Chapter | Visualization | Story Beat |
|---|---|---|
| Prologue | Animated counter | Scale of the crisis |
| Demographics | Parallel coordinates | Every line is a life |
| Results | Survival curves | Watch the treatments diverge |
| Deep Dive | 3D surface | Find the "death valley" |
| Equity | Dumbbell chart | Measure the gaps |
| Winners | Radar chart | Crown the champion |
AI Integration
Gemini prompts are grounded in computed statistics:
```python
# zdv_ddi_rate, zdv_ddi_cd4, zdv_rate and zdv_cd4 are computed from the dataset beforehand
prompt = f"""
You are narrating the ACTG 175 trial results.
ZDV+ddI: {zdv_ddi_rate:.1f}% event rate, +{zdv_ddi_cd4:.0f} CD4
ZDV only: {zdv_rate:.1f}% event rate, {zdv_cd4:.0f} CD4
Explain what this means for patients in 2-3 sentences.
"""
```
This reduces the risk of hallucination while enabling natural, contextual narration.
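To illustrate the grounding step, here is a rough sketch of how per-arm statistics could be computed and then placed into the prompt. The helper names `arm_summary` and `build_prompt`, the record layout, and the toy records are all assumptions made for this example; they are not trial data.

```python
def arm_summary(records):
    """records: list of (cd4_change, had_event) pairs for one treatment arm.

    Returns (event rate in percent, mean CD4 change).
    """
    n = len(records)
    event_rate = 100.0 * sum(1 for _, had_event in records if had_event) / n
    mean_change = sum(change for change, _ in records) / n
    return event_rate, mean_change


def build_prompt(combo_records, mono_records):
    """Build a narration prompt that contains only computed numbers."""
    combo_rate, combo_cd4 = arm_summary(combo_records)
    mono_rate, mono_cd4 = arm_summary(mono_records)
    return (
        "You are narrating the ACTG 175 trial results.\n"
        f"ZDV+ddI: {combo_rate:.1f}% event rate, {combo_cd4:+.0f} CD4\n"
        f"ZDV only: {mono_rate:.1f}% event rate, {mono_cd4:+.0f} CD4\n"
        "Explain what this means for patients in 2-3 sentences."
    )


# Toy records invented for this example.
combo = [(60, False), (40, False), (70, False), (50, False), (50, True)]
mono = [(-20, True), (-10, False), (-21, False)]
prompt_text = build_prompt(combo, mono)
```

The `+` flag in the format spec prints the sign for both gains and losses, so the template does not need a hard-coded plus sign.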
Challenges I Faced
Challenge 1: Making Statistics Emotional
Raw numbers don't move people. "34% event rate" means nothing until you realize that's 1 in 3 patients developing AIDS or dying. I solved this by:
- Leading with human context ("100,000 lives lost annually")

- Using the strip plot where every dot is a patient
- Adding the quote: "The lives saved by this study number in the millions"
Challenge 2: AI Narration Consistency
Early Gemini outputs were inconsistent — sometimes too technical, sometimes too casual. I solved this by:
- Providing explicit statistics in every prompt
- Using system prompts that establish voice and expertise level
- Caching responses to ensure chapter-to-chapter consistency
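The caching step could look roughly like the sketch below. The `NarrationCache` class and the `generate` callable are hypothetical; in the app, `generate` would wrap the Gemini call, and here a stand-in function takes its place so the example runs on its own.

```python
import hashlib


class NarrationCache:
    """Return the same narration for the same system prompt and prompt."""

    def __init__(self, generate):
        self._generate = generate  # callable: (system_prompt, prompt) -> str
        self._store = {}

    def get(self, system_prompt, prompt):
        raw = f"{system_prompt}\n---\n{prompt}".encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._store:
            # Only call the model the first time this exact input is seen.
            self._store[key] = self._generate(system_prompt, prompt)
        return self._store[key]


# Stand-in for the real model call, used to show that repeats hit the cache.
calls = []


def fake_generate(system_prompt, prompt):
    calls.append(prompt)
    return f"narration #{len(calls)}"


SYSTEM_PROMPT = "You are a clear, compassionate medical narrator."
cache = NarrationCache(fake_generate)
first = cache.get(SYSTEM_PROMPT, "Chapter 3 stats: 20.0% vs 34.0%")
second = cache.get(SYSTEM_PROMPT, "Chapter 3 stats: 20.0% vs 34.0%")
```

Keying on both the system prompt and the statistics means a change to either one produces fresh narration, while an unchanged chapter always reads the same.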
Challenge 3: The 3D Visualization Performance
Rendering a 3D survival surface with 2,139 patients was slow. I solved this by:
- Binning into a 15×15 grid (age × CD4)
- Pre-computing survival rates per cell
- Using WebGL rendering via Plotly
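A sketch of the binning and pre-computation steps in plain Python. The function name, the tuple layout, and the equal-width binning are assumptions; the three sample patients are made up.

```python
def survival_grid(patients, n_bins=15):
    """Bin patients into an n_bins x n_bins (age x CD4) grid.

    patients: iterable of (age, cd4, survived) tuples.
    Returns a nested list of survival rates, with None for empty cells.
    """
    patients = list(patients)
    ages = [age for age, _, _ in patients]
    cd4s = [cd4 for _, cd4, _ in patients]
    age_lo, age_hi = min(ages), max(ages)
    cd4_lo, cd4_hi = min(cd4s), max(cd4s)

    def bin_index(value, lo, hi):
        if hi == lo:
            return 0
        idx = int((value - lo) / (hi - lo) * n_bins)
        return min(idx, n_bins - 1)  # the maximum value goes in the last bin

    counts = [[0] * n_bins for _ in range(n_bins)]
    survivors = [[0] * n_bins for _ in range(n_bins)]
    for age, cd4, survived in patients:
        i = bin_index(age, age_lo, age_hi)
        j = bin_index(cd4, cd4_lo, cd4_hi)
        counts[i][j] += 1
        survivors[i][j] += 1 if survived else 0

    return [
        [survivors[i][j] / counts[i][j] if counts[i][j] else None
         for j in range(n_bins)]
        for i in range(n_bins)
    ]


# Made-up patients on a 2x2 grid to keep the result easy to read.
grid = survival_grid([(20, 100, True), (20, 100, False), (60, 500, True)], n_bins=2)
```

With a 15×15 grid the plot only has 225 cells to draw regardless of how many patients sit behind them, and the resulting nested list could serve as the z values of a Plotly surface trace.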
Challenge 4: Voice Synthesis Latency
ElevenLabs API calls took 3-5 seconds, breaking narrative flow. I solved this by:
- Making voice optional (text always visible)
- Adding loading spinners with "Generating voice..." feedback
- Caching audio in session state
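The audio cache could be sketched as follows. `get_audio` and `synthesize` are hypothetical names; `state` is any dict-like object, so in the app it would be Streamlit's `st.session_state`, while a plain dict and a fake synthesizer are used here to keep the example self-contained.

```python
import hashlib


def get_audio(state, text, synthesize):
    """Return audio bytes for text, calling synthesize at most once per text.

    state: dict-like store that survives reruns (e.g. st.session_state).
    synthesize: callable text -> bytes (in the app, the ElevenLabs call).
    """
    if "audio_cache" not in state:
        state["audio_cache"] = {}
    cache = state["audio_cache"]
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = synthesize(text)  # the slow step, done once
    return cache[key]


# Stand-in synthesizer that records how often it is called.
tts_calls = []


def fake_synthesize(text):
    tts_calls.append(text)
    return text.encode("utf-8")


session = {}
audio_a = get_audio(session, "Chapter 1 narration", fake_synthesize)
audio_b = get_audio(session, "Chapter 1 narration", fake_synthesize)
```

Revisiting a chapter then plays the stored bytes immediately instead of waiting on another API call.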
What's Next
- Knowledge Graph: Neo4j integration to explore treatment-outcome-demographic relationships
- Cohort Builder: Let users define custom patient subgroups and compare outcomes
- Policy Simulator: "What if we had started combination therapy 2 years earlier?"
- Multi-language: Translate narration for global health education
The Impact
ACTG 175 wasn't just a clinical trial. It was the turning point that transformed HIV from a death sentence into a manageable chronic condition. Today, 29 million people are on antiretroviral therapy worldwide.
This project exists because I believe data has stories to tell — stories that deserve to be heard, understood, and remembered.
"ACTG 175 didn't just test drugs — it tested our commitment to evidence-based medicine. The lives saved by this study number in the millions."
Built with: Python, Streamlit, Plotly, Gemini API, ElevenLabs API