ACTG 175 Treatment Policy Storyboard

The Story Behind the Data

In 1991, while the world watched the AIDS epidemic claim over 100,000 lives annually in the United States alone, a group of researchers asked a simple question that would change medicine forever: What if we combined drugs instead of using them alone?

This project brings that pivotal moment in medical history to life through interactive data storytelling.


Inspiration

The ACTG 175 dataset doesn't look like much at first glance — patient IDs, CD4 counts, treatment codes. But then I calculated the outcomes:

ZDV monotherapy: CD4 cells declined by 17. Event rate: 34%.

ZDV + ddI combination: CD4 cells increased by 54. Event rate: 20%.

Patients on the "standard of care" were getting worse, while the combination therapy patients were thriving. This was not just data; it was the moment combination antiretroviral therapy was proven, a discovery that would eventually save millions of lives worldwide.

What We Learned

The Science

  • CD4/CD8 ratios below 1.0 indicate immune suppression — 98% of trial patients had abnormal ratios
  • Synergy is real: ZDV+ddI did more than add benefits; the observed effect exceeded the predicted additive effect by 50 cells
  • Equity matters: when given proper care, IV drug users actually had better outcomes (19% vs 25%), shattering stigma

The Tech

  • Gemini 2.0 Flash can generate medically accurate, contextually aware narration in real time
  • ElevenLabs voice synthesis makes data accessible to visually impaired users
  • Plotly's 3D surfaces reveal patterns invisible in 2D — the "death valley" where low CD4 meets high age

The Craft

  • Data storytelling isn't about showing all the data — it's about showing the right data at the right moment
  • AI narration works best when grounded in specific statistics, not vague summaries
  • Interactive visualizations should invite exploration, not overwhelm

How We Built It

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Streamlit Frontend                    │
├─────────────┬─────────────┬─────────────┬───────────────┤
│  9 Chapters │ 13 Viz Pages│  RAG Chat   │ Risk Calc     │
├─────────────┴─────────────┴─────────────┴───────────────┤
│                   Feature Engineering                    │
│            (49 derived features from 23 raw)            │
├─────────────────────────────────────────────────────────┤
│  Gemini 2.0 Flash  │  ElevenLabs TTS  │  Plotly/MPL    │
└─────────────────────────────────────────────────────────┘

Feature Engineering

From 23 raw clinical variables, I engineered 49 features:

$$\text{CD4 Change} = \text{CD4}_{\text{20wk}} - \text{CD4}_{\text{baseline}}$$

$$\text{Immune Ratio} = \frac{\text{CD4}}{\text{CD8}}$$

$$\text{Risk Score} = \sum_{i} w_i \cdot \mathbb{1}[\text{risk factor}_i]$$
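
As a rough illustration of the three formulas above, here is a minimal plain-Python sketch for a single patient record. The field names (`cd4_baseline`, `cd4_20wk`, `cd8_baseline`), the choice of risk factors, their thresholds, and their weights are all hypothetical placeholders, not the project's actual schema or weighting.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names; the real dataset columns may differ.
    age: int
    cd4_baseline: float
    cd4_20wk: float
    cd8_baseline: float


# Illustrative weights only (assumed, not taken from the project).
RISK_WEIGHTS = {"low_cd4": 2.0, "inverted_ratio": 1.0, "older_age": 1.0}


def cd4_change(p: Patient) -> float:
    """CD4 at 20 weeks minus CD4 at baseline."""
    return p.cd4_20wk - p.cd4_baseline


def immune_ratio(p: Patient) -> float:
    """CD4/CD8 ratio at baseline (assumes a nonzero CD8 count)."""
    return p.cd4_baseline / p.cd8_baseline


def risk_score(p: Patient) -> float:
    """Weighted sum of indicator variables, one per risk factor."""
    indicators = {
        "low_cd4": p.cd4_baseline < 200,          # assumed threshold
        "inverted_ratio": immune_ratio(p) < 1.0,  # ratio below 1.0
        "older_age": p.age >= 50,                 # assumed threshold
    }
    return sum(RISK_WEIGHTS[name] for name, present in indicators.items() if present)


patient = Patient(age=52, cd4_baseline=180, cd4_20wk=234, cd8_baseline=900)
print(cd4_change(patient), immune_ratio(patient), risk_score(patient))
```

Keeping each feature as its own small function makes it easy to unit-test the definitions separately before applying them across the full cohort.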

Response categories were computed as:

  • Super Responder: $\Delta \text{CD4} > 150$
  • Improved: $0 < \Delta \text{CD4} \leq 150$
  • Stable: $-50 < \Delta \text{CD4} \leq 0$
  • Declined: $\Delta \text{CD4} \leq -50$
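
The thresholds above translate directly into a small classifier. This is a sketch of the rule as written in the list, with a hypothetical function name; checking the cut-offs from highest to lowest keeps the boundaries (150, 0, and -50) on the same side as in the definitions.

```python
def response_category(delta_cd4: float) -> str:
    """Map a CD4 change to the response category defined in the list above."""
    if delta_cd4 > 150:
        return "Super Responder"
    if delta_cd4 > 0:
        return "Improved"
    if delta_cd4 > -50:
        return "Stable"
    return "Declined"


for delta in (200, 54, -17, -80):
    print(delta, response_category(delta))
```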

Visualization Philosophy

Each visualization serves a narrative purpose:

| Chapter | Visualization | Story Beat |
|---|---|---|
| Prologue | Animated counter | Scale of the crisis |
| Demographics | Parallel coordinates | Every line is a life |
| Results | Survival curves | Watch the treatments diverge |
| Deep Dive | 3D surface | Find the "death valley" |
| Equity | Dumbbell chart | Measure the gaps |
| Winners | Radar chart | Crown the champion |

AI Integration

Gemini prompts are grounded in computed statistics:

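# The four statistics are computed from the trial data before the prompt is built.
# The "+" format flag prints an explicit sign for both gains and losses.
prompt = f"""
You are narrating the ACTG 175 trial results.
ZDV+ddI: {zdv_ddi_rate:.1f}% event rate, {zdv_ddi_cd4:+.0f} CD4
ZDV only: {zdv_rate:.1f}% event rate, {zdv_cd4:+.0f} CD4
Explain what this means for patients in 2-3 sentences.
"""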

Because every figure the model needs is already in the prompt, this reduces the risk of hallucinated statistics while still allowing natural, contextual narration.


Challenges I Faced

Challenge 1: Making Statistics Emotional

Raw numbers don't move people. "34% event rate" means nothing until you realize that's 1 in 3 patients developing AIDS or dying. I solved this by:

  • Leading with human context ("100,000 lives lost annually")
  • Using the strip plot where every dot is a patient
  • Adding the quote: "The lives saved by this study number in the millions"

Challenge 2: AI Narration Consistency

Early Gemini outputs were inconsistent — sometimes too technical, sometimes too casual. I solved this by:

  • Providing explicit statistics in every prompt
  • Using system prompts that establish voice and expertise level
  • Caching responses to ensure chapter-to-chapter consistency

Challenge 3: The 3D Visualization Performance

Rendering a 3D survival surface with 2,139 patients was slow. I solved this by:

  • Binning into a 15×15 grid (age × CD4)
  • Pre-computing survival rates per cell
  • Using WebGL rendering via Plotly
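
The binning step can be sketched in plain Python as follows. This is an illustrative version under assumed inputs: each patient is an `(age, cd4, survived)` tuple, bins are equal-width between the observed minimum and maximum, and empty cells are returned as `None`. The function names are hypothetical.

```python
def bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Equal-width bin index for value in [lo, hi], clamped to a valid bin."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def survival_grid(patients, n_bins: int = 15):
    """Pre-compute the survival rate per (age bin, CD4 bin) cell.

    patients: iterable of (age, cd4, survived) tuples (assumed layout).
    Returns an n_bins x n_bins list of rates, with None for empty cells.
    """
    patients = list(patients)
    ages = [p[0] for p in patients]
    cd4s = [p[1] for p in patients]
    counts = [[0] * n_bins for _ in range(n_bins)]
    survived = [[0] * n_bins for _ in range(n_bins)]
    for age, cd4, ok in patients:
        i = bin_index(age, min(ages), max(ages), n_bins)
        j = bin_index(cd4, min(cd4s), max(cd4s), n_bins)
        counts[i][j] += 1
        survived[i][j] += int(ok)
    return [
        [survived[i][j] / counts[i][j] if counts[i][j] else None for j in range(n_bins)]
        for i in range(n_bins)
    ]


# Tiny worked example with a 2x2 grid instead of 15x15.
sample = [(20, 100, True), (20, 100, False), (70, 500, True)]
grid = survival_grid(sample, n_bins=2)
print(grid)
```

With 2,139 patients reduced to at most 225 cells, the plotting layer only has to render the grid rather than every individual record; how empty cells are displayed is a separate choice to make before plotting.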

Challenge 4: Voice Synthesis Latency

ElevenLabs API calls took 3-5 seconds, breaking narrative flow. I solved this by:

  • Making voice optional (text always visible)
  • Adding loading spinners with "Generating voice..." feedback
  • Caching audio in session state
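
The caching idea can be sketched as a small helper that only calls the synthesizer when the text has not been seen before. Everything here is an assumption for illustration: `cached_audio` is a hypothetical name, `synthesize` stands in for the real ElevenLabs call, and a plain dict stands in for the cache.

```python
import hashlib


def cached_audio(text: str, synthesize, cache) -> bytes:
    """Return audio for text, calling synthesize only on a cache miss."""
    key = "audio:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = synthesize(text)  # the slow call happens once per text
    return cache[key]


# Stand-in synthesizer that records how often it is called.
calls = []


def fake_synthesize(text: str) -> bytes:
    calls.append(text)
    return text.encode("utf-8")


cache = {}
first = cached_audio("Chapter 1 narration", fake_synthesize, cache)
second = cached_audio("Chapter 1 narration", fake_synthesize, cache)
print(len(calls))
```

In a Streamlit app, `st.session_state` supports the same dict-style membership test and item assignment, so it could be passed in as `cache` to keep audio for the lifetime of the user's session.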

What's Next

  • Knowledge Graph: Neo4j integration to explore treatment-outcome-demographic relationships
  • Cohort Builder: Let users define custom patient subgroups and compare outcomes
  • Policy Simulator: "What if we had started combination therapy 2 years earlier?"
  • Multi-language: Translate narration for global health education

The Impact

ACTG 175 wasn't just a clinical trial. It was the turning point that transformed HIV from a death sentence into a manageable chronic condition. Today, 29 million people are on antiretroviral therapy worldwide.

This project exists because I believe data has stories to tell — stories that deserve to be heard, understood, and remembered.

"ACTG 175 didn't just test drugs — it tested our commitment to evidence-based medicine. The lives saved by this study number in the millions."


Built with: Python, Streamlit, Plotly, Gemini API, ElevenLabs API
