ClearWater

Contamination Risk Map: contaminated sectors
Work Order Dispatch: operator tasks pending approval
Alerts: AI-detected anomalies
Platform Audit & Event Logs: records of all water safety reports, agent assignments, and operation dispatches
Data Connection Pipeline: integrations, sync connections, and sync logs

Inspiration

Water doesn't wait for a shift change.

When a contamination signal appears, a utility operator today opens SCADA in one tab, LIMS in another, calls someone about nearby maintenance, checks the billing portal for complaints, checks the weather, then synthesizes all of it by hand. That process takes 45 to 90 minutes. During that window, the water is still flowing. That gap is what ClearWater closes.

What It Does

ClearWater is a water quality operations intelligence agent. It monitors five historically disconnected data sources — SCADA sensors, LIMS lab results, CMMS maintenance records, billing complaints, and weather — and uses Gemini to detect contamination risk signatures no single system could catch.

When a cross-system pattern crosses the threshold, the agent doesn't act. It surfaces a pre-built evidence chain and draft actions, then waits for a human to approve.

90 seconds from signal to operator decision. Not 90 minutes.

Capability	What it does
Cross-system anomaly detection	Correlates sensor readings, lab failures, maintenance events, complaint volume, and weather risk into one view. A pH drop alone isn't an alert — a pH drop plus two lab failures, nearby maintenance, and four complaint tickets in the same sector is.
Evidence-backed risk scoring	Every assessment includes the exact data points that triggered it, a confidence score, and a human-readable reasoning trace.
Human-in-the-loop workflows	Work orders, incident reports, and regulatory notifications are drafted by the agent and locked behind explicit operator approval. Nothing goes out without a sign-off.
Full audit log	Every agent decision and human action is logged and retained for at least 7 years, the retention floor this project assumes for EPA water quality record-keeping.
Pipeline health dashboard	Real-time Fivetran connector status for all five sources. If SCADA stops syncing, the agent flags it before it becomes a blind spot.
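
The "a pH drop alone isn't an alert" rule from the table can be sketched as a simple corroboration check. This is a minimal illustration, not the project's actual logic: the `SectorSnapshot` fields, the thresholds (two lab failures, four complaints, taken from the example above), and the `min_sources` default are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectorSnapshot:
    """One sector's signals for a single time window (all fields hypothetical)."""
    ph_drop: bool              # SCADA: pH fell outside the normal band
    lab_failures: int          # LIMS: failed lab results in the window
    maintenance_nearby: bool   # CMMS: maintenance logged near the sector
    complaint_count: int       # billing: complaint tickets in the window


def corroborating_sources(s: SectorSnapshot) -> list[str]:
    """Return the names of the sources that currently show a signal."""
    sources = []
    if s.ph_drop:
        sources.append("scada")
    if s.lab_failures >= 2:       # assumed threshold
        sources.append("lims")
    if s.maintenance_nearby:
        sources.append("cmms")
    if s.complaint_count >= 4:    # assumed threshold
        sources.append("billing")
    return sources


def should_alert(s: SectorSnapshot, min_sources: int = 3) -> bool:
    """Alert only when several independent sources agree (min_sources is assumed)."""
    return len(corroborating_sources(s)) >= min_sources
```

Returning the list of corroborating sources, not just a boolean, is what lets an assessment carry its own evidence.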

How I Built It

Four layers, each load-bearing.

Fivetran: the data backbone. Five connectors sync the operational sources into BigQuery: the Custom Connector SDK for SCADA (simulated via TimescaleDB), database connectors for LIMS and CMMS, a webhook connector for billing complaints, and a REST API connector for Open-Meteo weather. Without Fivetran normalizing five sources with different schemas, auth methods, and update frequencies, the agent would be reasoning over stale, inconsistent data: an expensive chatbot with a confidence problem.

BigQuery: the cross-correlation engine. A single SQL view (water_quality_correlation) joins all five tables and computes a risk classification (NORMAL, LOW_RISK, MEDIUM_RISK, or HIGH_RISK) per distribution sector per hour. Auditable SQL, not a black box.
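
The per-sector, per-hour bucketing can be illustrated outside SQL. The sketch below is a Python stand-in for the idea, not the water_quality_correlation view itself: the `Event` tuple shape and the mapping from "number of distinct sources with a signal" to a risk class are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Each anomaly event is (source, sector_id, timestamp); this shape is hypothetical.
Event = tuple[str, str, datetime]

# Assumed mapping: distinct sources with a signal -> risk class.
# Three or more sources fall through to HIGH_RISK.
RISK_BY_SOURCE_COUNT = {1: "LOW_RISK", 2: "MEDIUM_RISK"}


def classify_by_sector_hour(events: list[Event]) -> dict[tuple[str, datetime], str]:
    """Group anomaly events by (sector, hour) and classify each bucket.

    Buckets with no events are simply absent and should be read as NORMAL.
    """
    sources_seen: dict[tuple[str, datetime], set[str]] = defaultdict(set)
    for source, sector_id, ts in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        sources_seen[(sector_id, hour)].add(source)
    return {
        bucket: RISK_BY_SOURCE_COUNT.get(len(sources), "HIGH_RISK")
        for bucket, sources in sources_seen.items()
    }
```

Counting distinct sources, not raw events, keeps ten noisy readings from one sensor from outweighing agreement across systems.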

Google Cloud Agent Builder (Gemini): the reasoning layer. Four agent tools: risk_scorer, incident_reporter, work_order_creator, and alert_drafter. Three human approval gates are baked into the agent's instruction set: it is explicitly told to stop and present before any consequential action.
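
The gates described here live in the agent's instructions. The same idea can also be enforced in code, so that a draft cannot be dispatched without a recorded sign-off. The sketch below shows that code-level variant; it is an assumption about how such a guard might look, and `DraftAction`, `Status`, and `ApprovalRequired` are hypothetical names, not part of the project.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    DISPATCHED = "dispatched"


class ApprovalRequired(Exception):
    """Raised when a dispatch is attempted without operator sign-off."""


@dataclass
class DraftAction:
    kind: str                              # e.g. "work_order" (illustrative)
    payload: dict
    status: Status = Status.DRAFT
    approved_by: Optional[str] = None
    audit: list = field(default_factory=list)

    def approve(self, operator_id: str) -> None:
        """Record an explicit operator sign-off on a draft."""
        if self.status is not Status.DRAFT:
            raise ValueError("only drafts can be approved")
        self.status = Status.APPROVED
        self.approved_by = operator_id
        self.audit.append(f"approved by {operator_id}")

    def dispatch(self) -> None:
        """Send the action out; refuses unless an operator approved it first."""
        if self.status is not Status.APPROVED:
            raise ApprovalRequired(f"{self.kind} needs operator approval")
        self.status = Status.DISPATCHED
        self.audit.append("dispatched")
```

A guard like this complements the prompt-level gate: even if the model ignores an instruction, the dispatch path still fails closed.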

FastAPI + React: the operator interface. A Cloud Run backend with Pub/Sub event streaming, and a React dashboard with four views: Alert Feed, Approvals, Data Sources, and Audit Log. The Fivetran MCP server handles pipeline health checks and data freshness verification, so the agent can report not just what the data says but whether the pipelines feeding it are healthy.
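
A freshness check of the kind described above might look like the following. This is a sketch only: it does not call Fivetran or its MCP server, and it assumes the last successful sync time per source has already been fetched from somewhere. The SCADA and LIMS intervals follow the sync cadences mentioned in this write-up (15 minutes and hourly); the other intervals and the `grace` multiplier are assumptions.

```python
from datetime import datetime, timedelta

# Expected sync interval per source. Only the SCADA and LIMS values come from
# the write-up; the remaining three are assumed for illustration.
EXPECTED_INTERVAL = {
    "scada": timedelta(minutes=15),
    "lims": timedelta(hours=1),
    "cmms": timedelta(hours=1),
    "billing": timedelta(hours=1),
    "weather": timedelta(hours=1),
}


def stale_sources(
    last_sync: dict[str, datetime], now: datetime, grace: float = 2.0
) -> list[str]:
    """Return sources whose last sync is older than grace x the expected interval.

    A source with no recorded sync at all is treated as stale.
    """
    stale = []
    for source, interval in EXPECTED_INTERVAL.items():
        synced_at = last_sync.get(source)
        if synced_at is None or now - synced_at > interval * grace:
            stale.append(source)
    return sorted(stale)
```

Treating a missing sync record as stale is deliberate: a source nobody has heard from is exactly the blind spot the dashboard is meant to surface.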

Challenges I Ran Into

OAuth wall in Agent Builder's MCP tool UI. Agent Builder's built-in MCP interface doesn't support OAuth-based services cleanly. Workaround: sync BigQuery into a Vertex AI Data Store, which the agent queries natively. The Fivetran MCP server is used separately for connector health.

GCP org policy restrictions. Some environments enforce the iam.disableServiceAccountKeyCreation org policy constraint by default, which blocks service account key creation. Sorting out the right IAM approach for the demo environment without violating org constraints took longer than expected.

Prompt engineering for specificity. Early Gemini assessments came back too vague — the kind of output that makes an operator distrust the system. Getting structured JSON with explicit field-level evidence (which_sensors_triggered, maintenance_proximity_km, complaint_count) required real iteration on the system prompt and few-shot examples.
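
One way to hold the model to that structure is to reject any response that lacks the evidence fields. The sketch below assumes the assessment arrives as a JSON string; the three field names are the ones listed above, while the expected types and the `parse_assessment` helper are assumptions for illustration.

```python
import json

# Field names come from the write-up; the expected types are assumptions.
REQUIRED_FIELDS = {
    "which_sensors_triggered": list,
    "maintenance_proximity_km": (int, float),
    "complaint_count": int,
}


def parse_assessment(raw: str) -> dict:
    """Parse a model response; reject it if evidence fields are missing or mistyped."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("assessment must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing evidence field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"wrong type for evidence field: {name}")
    return data
```

A vague answer then fails loudly at the boundary, before it ever reaches an operator.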

Scoping "real-time" honestly. Water utilities aren't actually real-time environments — SCADA syncs every 15 minutes, labs report hourly. Showing what's genuinely fast (the agent's 90-second detection window) without overselling the underlying sync frequency was a design and communication challenge.

Accomplishments

The thing I'm most proud of isn't the stack. It's the scope discipline.

It would have been easy to build a water utility chatbot and call it an agent. ClearWater doesn't chat. It monitors, correlates, scores, drafts, and waits for a human. Every feature traces back to a real operator workflow — the night shift supervisor pulling status from four systems by hand, the compliance officer writing regulatory reports under time pressure.

The cross-correlation view is auditable SQL. The reasoning trace is human-readable and logged. The approval gates aren't UX polish — they're the core safety guarantee.

What I Learnt

Fivetran is load-bearing, not optional. The reason ClearWater can reason across five systems is that Fivetran handles schema normalization, sync scheduling, and connector reliability across five fundamentally incompatible sources. A custom ETL would have cost two weeks and been worse.

Agents fail on data quality before model capability. The agent is only as trustworthy as the pipeline behind it. Bad join keys don't surface until the agent starts confidently reporting the wrong thing.
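
A cheap pre-flight check catches this class of problem before the agent sees the data. The sketch below compares sector IDs from two sources after normalizing them; the normalization convention (trim, uppercase, drop separators) and both function names are assumptions, not something the project documents.

```python
def normalize_sector_id(raw: str) -> str:
    """Canonicalize a sector ID: trim, uppercase, drop separators (assumed convention)."""
    return "".join(ch for ch in raw.strip().upper() if ch.isalnum())


def unmatched_sector_ids(left_ids: list[str], right_ids: list[str]) -> dict[str, list[str]]:
    """Report sector IDs that appear on only one side of a join."""
    left = {normalize_sector_id(i) for i in left_ids}
    right = {normalize_sector_id(i) for i in right_ids}
    return {
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
    }
```

Anything in either list is a row that would silently drop out of an inner join, or show up with nulls in an outer one.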

Human-in-the-loop is a product feature, not a disclaimer. Operators don't trust systems that act without them. Making the approval gates visible and hard to skip — rather than a buried checkbox — is what makes this something a real operator would use on a night shift.

Built With

bigquery
fastapi
fivetran
gemini
gemini-cloud-agent-builder
google-bigquery
open-meteo
python
react
typescript
vertex-ai

Updates

Favour Chuks started this project — Jun 11, 2026 07:04 AM EDT
