Inspiration

Healthcare fraud costs the U.S. over $100 billion annually — roughly 3–10% of total healthcare spending. Meanwhile, safety alerts for dangerous drugs often go unnoticed by clinicians overwhelmed with thousands of journal articles, and clinical trials scattered across registries make it nearly impossible for researchers to identify promising treatments efficiently. We kept asking ourselves: why does a hospital compliance officer need five different browser tabs open just to check whether a provider is flagged, whether a drug has been recalled, and whether a clinical trial is recruiting? The fragmentation of health intelligence data across disconnected systems — claims databases, FDA portals, ClinicalTrials.gov, PubMed, LLM observability dashboards — creates dangerous blind spots. HealthGuard AI was born from the conviction that these critical signals belong in a single, real-time dashboard that any healthcare stakeholder can navigate in seconds.

What it does

HealthGuard AI is a unified health intelligence platform built as a single-page Next.js dashboard with seven specialized tabs, each powered by a distinct backend integration:

- Overview: Executive KPI cards showing total claims processed, flag rate, total billed amount, and average risk score, alongside a 12-month claims trend area chart, fraud-by-specialty horizontal bar chart, risk distribution donut, and compliance status bars.
- Claims Fraud: A searchable, filterable table of 500 claims records with per-claim risk scores computed by 10 custom ClickHouse UDFs. Fraud indicators include Benford's law deviation on billed amounts ($D(x) = \sum |P_x - P_{Benford}|$), upcoding detection (pairing expensive CPT codes with minor ICD-10 diagnoses), phantom billing flags, and duplicate claim detection. A risk threshold slider lets analysts tune sensitivity in real time.
- Provider Risk: A split-panel view listing 80 providers with a detail drawer showing NPI, specialty, risk gauge, compliance status, and license status. The composite risk score is computed via $\text{risk}_{\text{composite}} = 0.5 \cdot r + 0.3 \cdot \frac{f}{c} + 0.2 \cdot \frac{f}{n}$ where $r$ is average risk score, $f$ is flagged claims, $c$ is total claims, and $n$ is the specialty-wide average.
- Drug Safety: Severity-coded event cards for 60 drug events (recalls, safety alerts, FDA warnings, price changes) sourced from FDA databases via the Nimble SDK. Filterable by event type, severity level, and source.
- Clinical Intelligence: A ClinicalTrials.gov trial browser with 40 trials showing phase, status, enrollment, sponsor, and location. Includes phase distribution charts and filters, all populated via Nimble web scraping agents.
- Compliance: Datadog Lapdog LLM observability monitoring: total traces, average latency, token usage, estimated cost, model distribution pie chart, and a recent traces table showing every LLM API call with input/output tokens and latency.
- Research Map: AI-powered academic paper search via RapidReview Papyrus, enabling semantic search across arXiv with relevance scoring, citation counts, and one-click "Find Related" to discover connected research. Links to interactive 2D citation maps on rapidreview.io.

The entire dashboard connects to live ClickHouse Cloud when credentials are configured, and all seven tabs execute UDF-powered SQL queries in real time. If ClickHouse is unreachable, it seamlessly falls back to realistic HIPAA-compliant synthetic data so the UI is always functional for demos and development.

How we built it

The architecture follows a server-side proxy pattern to keep all API keys secure on the server. Each integration has its own Next.js API route:

- ClickHouse Cloud (/api/clickhouse): An OLAP + vector database storing claims, providers, drug events, and clinical trials with Vector(768) embeddings. We created 10 custom SQL UDFs (compositeRiskScore, benfordsLawScore, claimVelocity, upcodingDetect, providerAnomalyScore, maskPHI, icd10Category, cptExpectedCost, drugRecallRisk, riskLevelLabel) that push computation into the database layer. The schema uses MergeTree engines with monthly partitioning on claim dates and vector embeddings for semantic search.
- Nimble SDK (/api/nimble): Agentic web scraping that targets FDA recall databases, ClinicalTrials.gov, PubMed, and GoodRx for drug pricing. Structured scrape results are fed directly into ClickHouse tables.
- RapidReview Papyrus (/api/papyrus): An academic research SDK that enables semantic paper search, related paper discovery, and citation mapping across arXiv. The Research Map tab calls this API for live results and links to interactive 2D visualization maps.
- Datadog Lapdog (/api/lapdog): A local LLM observability tool that traces AI-powered fraud detection prompts, token counts, latency, and costs at localhost:8126.

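
As an illustration of the server-side proxy pattern described above, a route handler could look roughly like the sketch below. It is not the project's actual code: the query allowlist, the claim_id column, the ClickHouseConfig shape, and the "source" field in the response are assumptions made for the example. The only ClickHouse-specific details relied on are the HTTP interface's X-ClickHouse-User and X-ClickHouse-Key headers and the FORMAT JSON response shape.

```typescript
// Hypothetical allowlist: the browser sends a query name, never raw SQL.
const NAMED_QUERIES = new Map<string, string>([
  [
    "flaggedClaims",
    "SELECT claim_id, upcodingDetect(procedure_code, diagnosis_code) AS upcoding_risk " +
      "FROM claims LIMIT 100 FORMAT JSON",
  ],
]);

// Assumed config shape; in a real deployment these come from server-only env vars.
interface ClickHouseConfig {
  url: string;
  user: string;
  password: string;
}

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

function jsonResponse(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Builds the handler; passing config = null models "credentials not configured".
function createClickHouseHandler(config: ClickHouseConfig | null, fetchImpl: FetchLike = fetch) {
  return async function handle(request: Request): Promise<Response> {
    let queryName: unknown;
    try {
      queryName = ((await request.json()) as { queryName?: unknown }).queryName;
    } catch {
      return jsonResponse({ error: "invalid JSON body" }, 400);
    }
    const sql = typeof queryName === "string" ? NAMED_QUERIES.get(queryName) : undefined;
    if (sql === undefined) return jsonResponse({ error: "unknown query" }, 400);

    // No credentials: tell the client to use synthetic data instead.
    if (config === null) return jsonResponse({ source: "synthetic", rows: [] });

    try {
      // The credentials are attached here, on the server, and never reach the browser.
      const upstream = await fetchImpl(config.url, {
        method: "POST",
        headers: { "X-ClickHouse-User": config.user, "X-ClickHouse-Key": config.password },
        body: sql,
      });
      if (!upstream.ok) throw new Error(`ClickHouse responded with HTTP ${upstream.status}`);
      const payload = (await upstream.json()) as { data: unknown[] };
      return jsonResponse({ source: "clickhouse", rows: payload.data });
    } catch {
      // Unreachable or failing database: signal the synthetic fallback.
      return jsonResponse({ source: "synthetic", rows: [] });
    }
  };
}
```

In a Next.js App Router project, the returned function would be exported as POST from the route file for /api/clickhouse. Injecting fetchImpl keeps the handler testable without a live database.
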
The frontend uses Next.js 16 (App Router), TypeScript, Tailwind CSS 4, shadcn/ui components (built on @base-ui/react), and Recharts for all data visualizations. A custom useApiData hook abstracts the ClickHouse-or-synthetic fallback pattern: it tries the live API first, and if the response indicates synthetic fallback, it generates fresh randomized data on the client side. The entire dashboard is designed with a dark slate theme (slate-950 background) and emerald accent colors for a professional operations-center feel.
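
The fallback logic inside a hook like useApiData can be sketched as a plain async function, shown below without the React wiring. The names loadWithFallback, ApiPayload, and the "source" field are hypothetical; the source only states that the hook tries the live API first and generates synthetic data client-side when the response indicates fallback.

```typescript
// Assumed response envelope: the API route reports where its rows came from.
interface ApiPayload<T> {
  source: "clickhouse" | "synthetic";
  rows: T[];
}

// Tries the live API first; generates data locally if the API reports a
// synthetic fallback or the request fails outright.
async function loadWithFallback<T>(
  fetchLive: () => Promise<ApiPayload<T>>,
  generateSynthetic: () => T[],
): Promise<ApiPayload<T>> {
  try {
    const payload = await fetchLive();
    if (payload.source === "clickhouse") return payload;
  } catch {
    // Network or parsing failure: fall through to the synthetic generator.
  }
  return { source: "synthetic", rows: generateSynthetic() };
}
```

A React hook would call this inside an effect and store the result in state, so each tab only has to supply its own fetcher and generator.
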

Challenges we ran into

The biggest challenge was getting @base-ui/react (the new Base UI library from MUI that shadcn v4 uses under the hood) to work with patterns we were used to from Radix UI. Specifically, the asChild prop, a Radix convention for rendering a component as its child element, doesn't exist in Base UI. Our ResearchMapTab originally used asChild buttons to wrap <a> tags for external links, and the mistake went unnoticed until TypeScript flagged the unknown prop. We had to refactor all link buttons into styled <a> elements with Tailwind classes matching the button design tokens.

Another significant challenge was designing the synthetic data generator to be realistic enough for demos. Healthcare fraud patterns follow specific statistical distributions. For example, the leading digits of legitimate billing amounts should roughly follow Benford's law ($P(d) = \log_{10}(1 + 1/d)$), so our synthetic generator intentionally makes flagged claims deviate from this distribution. Provider risk scores needed to correlate with specialty (cardiology naturally has higher average billing than family medicine), and drug event severity needed to align with real FDA classification patterns.

The ClickHouse UDF development was also non-trivial. Writing SQL functions that operate across MergeTree tables with Vector(768) columns required careful attention to type casting (ClickHouse's Float32 vs Float64 vs UInt8 enum handling) and to ensuring that UDFs could be composed. For example, compositeRiskScore takes the avg_risk_score and flagged_claims columns as inputs, and those columns may themselves be derived from other UDFs in subqueries.
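
To make the Benford's law check concrete, here is a small sketch that computes the expected leading-digit probabilities from $P(d) = \log_{10}(1 + 1/d)$ and sums the absolute deviations over digits 1 to 9. This is one reading of the deviation formula given earlier, not the benfordsLawScore UDF itself; the function names are made up for the example.

```typescript
// Expected share of amounts whose first significant digit is d (1..9).
function benfordExpected(d: number): number {
  return Math.log10(1 + 1 / d);
}

// First significant digit of an amount, or null for 0 and non-finite values.
function leadingDigit(amount: number): number | null {
  if (!Number.isFinite(amount) || amount === 0) return null;
  // Exponential notation always starts with the first significant digit.
  return Number(Math.abs(amount).toExponential().charAt(0));
}

// Sum over digits 1..9 of |observed share - Benford share|.
// 0 means a perfect match; the value grows as the sample departs from Benford.
function benfordDeviation(amounts: number[]): number {
  const counts: number[] = new Array<number>(10).fill(0);
  let total = 0;
  for (const amount of amounts) {
    const digit = leadingDigit(amount);
    if (digit === null) continue;
    counts[digit] = (counts[digit] ?? 0) + 1;
    total += 1;
  }
  if (total === 0) return 0;
  let deviation = 0;
  for (let d = 1; d <= 9; d++) {
    deviation += Math.abs((counts[d] ?? 0) / total - benfordExpected(d));
  }
  return deviation;
}
```

A synthetic generator could use the same function as a sanity check: amounts drawn uniformly on a log scale should score near zero, while a batch of flagged claims clustered on one leading digit should score much higher.
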

Accomplishments that we're proud of

We're most proud of the 10 custom ClickHouse UDFs that push complex fraud detection logic directly into the database query layer. Instead of fetching raw data and computing risk scores in application code, queries like SELECT *, upcodingDetect(procedure_code, diagnosis_code) AS upcoding_risk FROM claims return fully scored results in a single round trip. Because scoring happens where the data lives, the dashboard can scale to millions of claims without a separate application-side scoring pass.

The seamless fallback architecture is another highlight. The useApiData hook means the dashboard works identically whether connected to live ClickHouse Cloud or running in offline demo mode. Every tab renders with realistic data, correct distributions, and proper relationships (providers have consistent claim histories, drugs have plausible manufacturers and severities).

Finally, we're proud of the seven-tab unified experience. HealthGuard AI brings together data sources that healthcare professionals normally access through 5+ separate systems and presents them in a cohesive, dark-themed operations dashboard with consistent design language, responsive layouts, and real-time interactivity.
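
For reference, the composite provider score from the Provider Risk tab, $0.5 \cdot r + 0.3 \cdot \frac{f}{c} + 0.2 \cdot \frac{f}{n}$, can be mirrored in TypeScript as a cross-check for UDF output. This is a sketch under assumptions: the write-up does not say what quantity $n$ averages, so it is treated here as the specialty-wide average flagged-claim count, and the zero-denominator guards are an addition of the example. The real compositeRiskScore UDF may differ.

```typescript
// Inputs named after the symbols in the formula.
interface ProviderStats {
  avgRiskScore: number; // r: average per-claim risk score
  flaggedClaims: number; // f: number of flagged claims
  totalClaims: number; // c: total claims for the provider
  specialtyAverage: number; // n: specialty-wide average (assumed: flagged-claim count)
}

// Weighted sum of the three terms; a zero denominator contributes 0
// rather than producing NaN or Infinity.
function compositeRisk(stats: ProviderStats): number {
  const flagRate = stats.totalClaims > 0 ? stats.flaggedClaims / stats.totalClaims : 0;
  const relativeToSpecialty =
    stats.specialtyAverage > 0 ? stats.flaggedClaims / stats.specialtyAverage : 0;
  return 0.5 * stats.avgRiskScore + 0.3 * flagRate + 0.2 * relativeToSpecialty;
}
```

For example, with r = 0.6, f = 12, c = 100, and n = 8, the three terms are 0.3, 0.036, and 0.3, giving a composite score of about 0.636.
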

What we learned

We learned that database-level computation can be much faster than application-level computation for analytical workloads. Moving Benford's law scoring and upcoding detection into ClickHouse UDFs reduced our per-query response time from ~800ms (fetch + compute in Node.js) to ~120ms (single SQL query with UDFs).

We also learned that shadcn v4's migration from Radix UI to Base UI has real breaking changes beyond what the migration guide covers. The asChild pattern is deeply embedded in shadcn component examples, and replacing it requires understanding both the Radix rendering model (asChild, which merges a component's props onto its child element) and the Base UI rendering model (a render prop that specifies which element to render).

On the data side, we learned that healthcare fraud has surprisingly specific statistical fingerprints. Upcoding, which means billing for a more expensive procedure than was actually performed, clusters heavily in a few procedure-diagnosis pairs (e.g., CPT 99215 "Level 5 office visit" paired with ICD-10 Z00.00 "General adult medical examination"). Building realistic synthetic data required understanding these patterns rather than just randomizing numbers.

What's next for HealthGuard AI

The next phase focuses on three areas:

- Real-time streaming: Currently, ClickHouse queries are on-demand. We plan to add live updates so that new claims, drug alerts, and screening results appear on the dashboard the moment they're ingested. ClickHouse has no INSERT triggers as such, so the likely route is materialized views (which run on every insert) feeding a push channel such as WebSockets or Server-Sent Events.
- AI-powered anomaly narrative: Instead of just showing risk scores, we want an LLM to generate natural-language explanations for why a claim was flagged ("This claim is suspicious because the provider billed a Level 5 office visit for a routine checkup, and this provider's upcoding rate is 3.2x the specialty average"). We plan to use the token and cost tracking from Lapdog to keep this cost-efficient.
- Multi-tenant deployment: The current dashboard is single-instance. We're designing a multi-tenant architecture where different hospital systems can connect their own ClickHouse instances and see their own data, with row-level security and per-tenant API key isolation through Next.js middleware.

We also plan to add an export system that generates HIPAA-compliant PDF reports from any tab, and a collaborative annotation layer where compliance teams can add notes and flag decisions directly on claims and providers within the dashboard.
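
If the real-time streaming work goes the Server-Sent Events route, each update would be written to the response stream as a text frame. The sketch below shows only that framing step, following the standard SSE wire format (id, event, and data fields, terminated by a blank line). The function name, the event name, and the payload fields are hypothetical.

```typescript
// Serializes one Server-Sent Events frame. JSON.stringify never emits raw
// newlines, so the payload always fits on a single "data:" line.
function formatSseEvent(eventName: string, payload: Record<string, unknown>, id?: string): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${eventName}`);
  lines.push(`data: ${JSON.stringify(payload)}`);
  // A blank line marks the end of the event.
  return lines.join("\n") + "\n\n";
}
```

On the client, an EventSource listener registered for the same event name would parse the data field with JSON.parse and add the new row to the relevant tab.
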

Built With

clickhouse
nimble
papyrus
typescript

Updates

Shyam Desigan started this project — May 23, 2026 03:59 PM EDT
