Inspiration

I was reading about the Vioxx scandal, where a popular painkiller caused thousands of heart attacks before it was finally pulled from the market. The tragedy wasn't that the data didn't exist; it was buried in mountains of paperwork that humans couldn't analyze fast enough.

Today, the FDA receives over 2 million Adverse Drug Event (ADE) reports annually. Pharmacovigilance teams are drowning in data, often taking weeks or months to detect a safety signal. In that gap, patients are at risk, and regulatory fines can reach billions.

I asked myself: what if we could build an AI system that reads every single report in real time and uses statistical reasoning to catch the next Vioxx before it becomes a crisis? That is PHAROS.

What it does

PHAROS (Pharmacovigilance Autonomous Reasoning and Oversight System) is a four-agent AI system that automates the entire drug safety surveillance pipeline. It detects safety signals up to 100x faster than manual review.

The system consists of four specialized agents:

SENTINEL (The Watcher): Ingests adverse event reports from the FDA FAERS API and monitors for volume anomalies.

ANALYST (The Statistician): Executes ES|QL queries to calculate the Proportional Reporting Ratio (PRR), a standard disproportionality measure in pharmacovigilance for identifying drug-event combinations that are reported more often than expected.

SCRIBE (The Author): Auto-generates regulatory-compliant clinical case narratives (MedWatch 3500A / PSURs) citing specific evidence.

HERALD (The Messenger): Dispatches tiered alerts (Slack, Jira, Email) based on signal severity.

How we built it

We built PHAROS on Elasticsearch Serverless and the Elastic Agent Builder.

The core intelligence relies on a Hybrid Search architecture:

We use ELSER (Sparse Encoder) to semantically match clinical narratives (e.g., finding that "cardiac arrest" and "myocardial infarction" are related).

We use ES|QL (Elasticsearch Query Language) for the heavy statistical lifting.

The Signal Detection Math

The most critical part of the project is the Proportional Reporting Ratio (PRR) calculation. Instead of just counting reports, ANALYST calculates this formula dynamically in Elasticsearch:

\[
\mathrm{PRR} = \frac{a / (a + b)}{c / (c + d)}
\]

Where:

\( a \) = Cases with the specific drug and the specific event

\( b \) = Cases with the specific drug but other events

\( c \) = Cases with other drugs and the specific event

\( d \) = Cases with other drugs and other events

If \( \mathrm{PRR} \ge 2 \) and the chi-squared statistic \( \chi^2 \ge 4 \), PHAROS flags a safety signal.
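To make the arithmetic concrete, here is a minimal sketch of the PRR rule in plain Python. It is not the ANALYST implementation, which runs as ES|QL inside Elasticsearch. The names `ContingencyTable`, `table_from_reports`, `prr`, `chi_squared`, and `is_signal` are hypothetical, each report is simplified to a single (drug, event) pair, and the chi-squared statistic is assumed to be the Pearson form without continuity correction, since the variant isn't specified above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ContingencyTable:
    a: int  # specific drug, specific event
    b: int  # specific drug, other events
    c: int  # other drugs, specific event
    d: int  # other drugs, other events


def table_from_reports(
    reports: Iterable[Tuple[str, str]], drug: str, event: str
) -> ContingencyTable:
    """Build the 2x2 table from (drug, event) pairs (a simplifying assumption)."""
    a = b = c = d = 0
    for report_drug, report_event in reports:
        if report_drug == drug:
            if report_event == event:
                a += 1
            else:
                b += 1
        elif report_event == event:
            c += 1
        else:
            d += 1
    return ContingencyTable(a, b, c, d)


def prr(t: ContingencyTable) -> float:
    """PRR = [a / (a + b)] / [c / (c + d)]."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def chi_squared(t: ContingencyTable) -> float:
    """Pearson chi-squared for a 2x2 table, no continuity correction (assumed)."""
    n = t.a + t.b + t.c + t.d
    numerator = n * (t.a * t.d - t.b * t.c) ** 2
    denominator = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    return numerator / denominator


def is_signal(
    t: ContingencyTable, prr_min: float = 2.0, chi2_min: float = 4.0
) -> bool:
    """Apply the PRR >= 2 and chi-squared >= 4 rule from the text."""
    # Both statistics are undefined when these counts are zero;
    # this sketch treats that case as "no signal".
    if t.c == 0 or (t.a + t.b) == 0 or (t.b + t.d) == 0:
        return False
    return prr(t) >= prr_min and chi_squared(t) >= chi2_min


# Illustrative counts only, not real FAERS data.
example = ContingencyTable(a=20, b=980, c=100, d=98900)
print(round(prr(example), 1))  # (20/1000) / (100/99000) = 19.8
print(is_signal(example))      # True
```

With these made-up counts, 2% of the drug's reports mention the event versus roughly 0.1% for all other drugs, which is why the ratio lands near 20.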

Challenges we ran into

ES|QL Complexity: Translating the standard 2x2 contingency table for PRR into a single efficient ES|QL query was difficult. We had to perform aggregations on millions of records in real time.

Hallucinations: Early versions of the SCRIBE agent would invent patient details. We fixed this by forcing the LLM to use only the "Context" retrieved from our Elasticsearch vector store, implementing a Retrieval-Augmented Generation (RAG) pipeline that cites its sources.

Agent Orchestration: Getting four agents to pass data to each other (SENTINEL -> ANALYST -> SCRIBE) without dropping context required a robust state management system in Python.

Accomplishments that we're proud of

Real-time Detection: We reduced the signal detection time from weeks to seconds.

Regulatory Compliance: The reports generated by SCRIBE are not just text; they follow the structure required by 21 CFR Part 314, making them actually useful for compliance officers.

The Dashboard: Seeing live signals pop up on the Kibana dashboard as "Critical" was a huge moment of validation.

What we learned

I learned that Elastic Agent Builder is incredibly powerful when combined with structured data analytics. Most people use LLMs for chat, but using them to drive statistical analysis (via ES|QL) creates a much more reliable system. I also gained a deep appreciation for the complexity of pharmacovigilance and drug safety regulations.
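For readers curious about the agent hand-off mentioned under the orchestration challenge, here is one way such state passing could look in Python. This is a sketch under assumptions, not our actual state manager: `PipelineState`, `run_pipeline`, and the four stub functions are hypothetical names, and the numbers the stubs write are placeholders rather than real results.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PipelineState:
    """Single object handed from agent to agent so no context is dropped."""
    drug: str
    event: str
    report_count: int = 0
    prr: Optional[float] = None
    narrative: Optional[str] = None
    alerts: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)


Agent = Callable[[PipelineState], PipelineState]


def sentinel(state: PipelineState) -> PipelineState:
    # Placeholder: a real SENTINEL would ingest FAERS reports here.
    state.report_count = 120  # illustrative value only
    return state


def analyst(state: PipelineState) -> PipelineState:
    # Placeholder: a real ANALYST would compute PRR via ES|QL.
    state.prr = 3.5  # illustrative value only
    return state


def scribe(state: PipelineState) -> PipelineState:
    # Writes only from fields already in the state, never invented details.
    state.narrative = (
        f"{state.report_count} reports link {state.drug} to "
        f"{state.event} (PRR {state.prr:.1f})."
    )
    return state


def herald(state: PipelineState) -> PipelineState:
    # Uses the PRR >= 2 threshold from the signal rule; tiering is omitted.
    if state.prr is not None and state.prr >= 2:
        state.alerts.append(f"slack: signal for {state.drug}/{state.event}")
    return state


def run_pipeline(state: PipelineState, agents: List[Agent]) -> PipelineState:
    """Run agents in order, recording which ones touched the state."""
    for agent in agents:
        state = agent(state)
        state.trace.append(agent.__name__)
    return state


final = run_pipeline(
    PipelineState(drug="drug_x", event="event_y"),
    [sentinel, analyst, scribe, herald],
)
print(final.trace)
print(final.narrative)
```

Keeping every intermediate result on one typed object means a later agent can always see what earlier agents produced, and the `trace` list gives a cheap audit trail of the hand-offs.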

What's next for PHAROS

Electronic Health Records (EHR) Integration: Integrating real-world evidence from hospital systems.

Social Media Listening: Using NLP to scrape Reddit and Twitter for early warning signs before official FDA reports are even filed.

Multi-language Support: Processing EMA (European) reports in French and German.