Inspiration
Every marketer knows the cold sweat after hitting Send to 200,000 people. Was the audience fresh? Did the sync break overnight? Are there duplicates? Did someone who unsubscribed last week slip back in?
Marketing teams run on data they didn't build and can't easily inspect. The data team has warehouses, tests, and lineage; the marketing team has a send button and hope. SendGuard closes that gap: an AI agent that stands between the campaign and the send button, and only clears the send when the data can be trusted.
Fivetran moves the data. SendGuard makes sure you can trust it before you hit send.
What it does
Given "Campaign X is scheduled to send to audience DE campaign_audience — validate and clear it", SendGuard works through a strict validation doctrine, narrating every step:
- Freshness — when did Fivetran last sync SFMC to BigQuery? If stale, it triggers a sync and waits.
- Parity — SFMC says 200,000 rows; does BigQuery agree? Divergence means pipeline loss or post-sync edits.
- Integrity — duplicate subscriber keys, null emails, and audience members who are unsubscribed in the subscribers table (a compliance violation waiting to happen).
- Verdict — on PASS it releases the send. On FAIL it holds the send immediately (holding is the only action it takes without human approval — it's always the safe direction), explains every defect in plain marketer language ("8,000 people would get this email twice"), then — with approval — builds a repaired audience in BigQuery, pushes it back into SFMC via Fivetran Activations, verifies the landing with a live row count, and only then releases.
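The four-step doctrine can be sketched as a single ordered function. This is a hypothetical illustration, not SendGuard's implementation. Three things are assumptions: the 24-hour staleness threshold, the resync callback (assumed to trigger a Fivetran sync and block until it finishes), and the defect counts, which are passed in precomputed. The checks always run in the same order, each defect is worded for a marketer, and any defect becomes a hold.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical threshold: the write-up does not define how old is "stale".
MAX_SYNC_AGE = timedelta(hours=24)


@dataclass
class Verdict:
    passed: bool
    held: bool  # a hold is the only action taken without human approval
    defects: list[str] = field(default_factory=list)


def run_doctrine(last_sync: datetime, resync: Callable[[], datetime],
                 sfmc_rows: int, bq_rows: int, duplicates: int,
                 null_emails: int, unsubscribed: int,
                 now: datetime) -> Verdict:
    """Run the checks in a fixed order and return a single verdict."""
    defects: list[str] = []

    # 1. Freshness: if stale, trigger a sync, wait for it, and re-check once.
    if now - last_sync > MAX_SYNC_AGE:
        last_sync = resync()  # assumed to block until the sync finishes
        if now - last_sync > MAX_SYNC_AGE:
            defects.append("The audience data is still stale after a resync.")

    # 2. Parity: SFMC and BigQuery must report the same row count.
    if sfmc_rows != bq_rows:
        defects.append(f"{abs(sfmc_rows - bq_rows):,} rows differ between "
                       "SFMC and BigQuery.")

    # 3. Integrity: each defect is phrased for a marketer, not an engineer.
    if duplicates:
        defects.append(f"{duplicates:,} people would get this email twice.")
    if null_emails:
        defects.append(f"{null_emails:,} audience rows have no email address.")
    if unsubscribed:
        defects.append(f"{unsubscribed:,} unsubscribed people would be emailed.")

    # 4. Verdict: any defect holds the send; a clean run releases it.
    passed = not defects
    return Verdict(passed=passed, held=not passed, defects=defects)
```

Keeping the order fixed in code, and not leaving it to the model's judgment, means a parity or integrity result is never reported against data that has not been checked for freshness first.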
In our test environment it catches exactly what we seeded: 8,000 duplicates and 5,000 unsubscribed members in a 200,000-row audience, repairs it to a clean 187,000, and round-trips the fix into SFMC in about three minutes.
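The arithmetic behind the clean 187,000 works out as follows: 200,000 rows minus 8,000 duplicate rows leaves 192,000 unique subscribers, and excluding 5,000 unsubscribed members leaves 187,000. Below is a minimal in-memory sketch of the seeding and the repair rule. It rests on several assumptions: rows are dicts with hypothetical subscriber_key and email fields, seeding uses Python's random module in place of Faker, and "repair" means keeping the first row per subscriber key and dropping unsubscribed keys. The project itself builds the repaired audience in BigQuery.

```python
import random


def seed_audience(n: int, n_dupes: int, n_unsub: int, seed: int = 7):
    """Build n rows: n - n_dupes unique subscribers plus n_dupes copies."""
    rng = random.Random(seed)
    rows = [{"subscriber_key": f"SK{i:07d}", "email": f"user{i}@example.com"}
            for i in range(n - n_dupes)]
    # Duplicate distinct existing rows so each copy adds exactly one extra row.
    dupes = [dict(r) for r in rng.sample(rows, n_dupes)]
    # Mark some audience members as unsubscribed in the subscribers table.
    unsubscribed = {r["subscriber_key"] for r in rng.sample(rows, n_unsub)}
    audience = rows + dupes
    rng.shuffle(audience)
    return audience, unsubscribed


def count_defects(audience, unsubscribed):
    """Return (extra duplicate rows, unsubscribed members present)."""
    keys = [r["subscriber_key"] for r in audience]
    unique = set(keys)
    return len(keys) - len(unique), len(unique & unsubscribed)


def repair(audience, unsubscribed):
    """Keep the first row per subscriber key and drop unsubscribed keys."""
    seen: set[str] = set()
    clean = []
    for row in audience:
        key = row["subscriber_key"]
        if key in seen or key in unsubscribed:
            continue
        seen.add(key)
        clean.append(row)
    return clean
```

In BigQuery, the same rule would more likely be written as a ROW_NUMBER() window partitioned by subscriber key and filtered to the first row, then anti-joined against the subscribers table.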
How we built it
- Agent: Python + Google Agent Development Kit (ADK), powered by Gemini 3.1 Pro, deployed on Cloud Run with the ADK web UI as the hosted interface.
- Fivetran control plane: a fork of the official fivetran-mcp server attached to the agent as an MCP toolset (connection status, sync triggers, schema config). We extended the fork with three new tools for Fivetran Activations (list / trigger / wait), which the upstream server lacked.
- Data plane: SFMC data extensions → Fivetran SFMC connector (SFTP mode) → BigQuery; repaired audiences flow back through BigQuery → Activations → SFMC.
- Native tools: parameterized BigQuery SQL, SFMC REST/SOAP (row counts, DE schemas, hold/release send flags), and a server-side blocking wait for Activations runs.
- Test data: a Faker-based generator producing 1M subscribers, 1.5M engagement events, and a 200k audience seeded with realistic data-quality defects — 2.7M rows moved through the real pipeline.
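The server-side blocking wait moves the polling loop inside the tool. The model makes one call and gets back one final answer, with no way to re-invoke a status tool over and over. Below is a minimal sketch. It assumes a hypothetical get_status callable that returns a run-state string; the state names, interval, and timeout are illustrative, not the Activations API's actual values. Sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from collections.abc import Callable

# Hypothetical terminal states; real Activations run states may differ.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_run(get_status: Callable[[], str], timeout_s: float = 600.0,
                 interval_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Block until the run reaches a terminal state or the timeout expires."""
    deadline = clock() + timeout_s
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in TERMINAL_STATES:
            return {"status": status, "polls": polls, "timed_out": False}
        if clock() >= deadline:
            # Report the timeout honestly instead of implying success.
            return {"status": status, "polls": polls, "timed_out": True}
        sleep(interval_s)
```

Returning an explicit timed_out flag gives the agent something concrete to report when a run stalls, so it does not have to guess.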
Challenges we ran into
This project was a guided tour of every sharp edge in enterprise marketing infrastructure:
- SFMC's SFTP is wonderfully ancient. The server only offers ssh-rsa host keys (modern clients refuse them), and our FTP user was silently created as key-pair-only, so password auth failed with a misleading "invalid credentials." Diagnosing that required speaking raw SSH to the server.
- Data extensions sync on a daily window. The Fivetran connector does full daily re-imports of DEs (SFMC's APIs offer no incremental path). Our audience DE was enabled after the day's export window, so no amount of sync triggering would move it. We solved the backfill by studying the connector's file contract on the SFTP server and staging the export file in exactly that format, letting Fivetran ingest it through its own pipeline.
- Activations is secretly the Census API. No endpoint on api.fivetran.com touches Activations; the real API lives at app.getcensus.com with "Bearer secret-token:" auth. The MCP fork now encodes that knowledge.
- LLM operational hygiene. Gemini happily polled a sync-status tool 35 times in a row; the fix was a blocking server-side wait tool. And gemini-3-pro-preview was retired by Google mid-hackathon: the model listing endpoint still advertises it, but generation requests return 404.
- The dress rehearsal earned its keep: it caught the agent verifying the wrong data extension after repair, and Cloud Run's scale-to-zero silently destroying in-memory sessions.
Accomplishments that we're proud of
- A complete closed loop on real infrastructure: SFMC → Fivetran → BigQuery → validation → repair → Activations → SFMC, with 2.7M rows moved through the genuine pipeline — no mocks anywhere.
- The agent caught every seeded defect to the exact row: 8,000 duplicates, 5,000 consent violations, and a live 10,135-row parity break we created by deleting warehouse rows mid-session.
- Safety-first agent design that held up: hard human-approval gates on every write, hold-before-ask semantics, and honest error reporting — when an API call failed mid-run, the agent reported it and asked instead of hallucinating success.
- Extending the official Fivetran MCP with a working Activations toolset, turning a read-mostly server into a full pipeline control plane.
What we learned
The deepest lesson is the project's own thesis: bulk marketing data still moves on files, not APIs. SFMC's rowset API pages at ~2,500 rows with no change tracking and no snapshot isolation, so paging a live 2.5M-row table for half an hour produces silent duplicates and gaps. A file export is atomic: complete and consistent, or nothing. When Fivetran, whose entire business is API extraction, chose SFTP file exports for this object type, that was the strongest evidence available. An agent that validates data has to understand how the data actually moves.
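The duplicates-and-gaps failure is easy to reproduce in miniature. This sketch assumes offset-style paging over a key-ordered table, which is a simplification and not SFMC's documented paging mechanics. A concurrent write lands between page requests. When a row is deleted from an already-read page, an unread row slides back behind the offset and is never read. When a row is inserted ahead of the offset, an already-read row slides forward and is read twice.

```python
from collections.abc import Callable


def page_with_drift(table: list[int], page_size: int,
                    mutations: dict[int, Callable[[list[int]], None]]):
    """Offset-page a live table; mutations[i] runs right after page i is read."""
    table = sorted(table)
    seen: list[int] = []
    offset, page = 0, 0
    while offset < len(table):
        seen.extend(table[offset:offset + page_size])
        offset += page_size
        if page in mutations:
            mutations[page](table)  # a concurrent write between requests
            table.sort()
        page += 1
    return seen, table


def audit(seen: list[int], final: list[int]) -> tuple[set[int], set[int]]:
    """Rows read more than once, and rows in the final table never read."""
    dupes = {x for x in seen if seen.count(x) > 1}
    gaps = set(final) - set(seen)
    return dupes, gaps
```

For example, paging keys 1 to 9 three at a time, deleting 2 after the first page and inserting 0 after the second, reads [1, 2, 3, 5, 6, 7, 7, 8, 9]. Row 7 is read twice, and row 4, which existed the whole time, is never read. An atomic file export rules out both failures.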
We also learned that agent reliability is mostly doctrine engineering: a strict, ordered checklist in the system prompt, hard approval gates on every write, "hold is the only unprompted action," and honest error reporting turned a chatty LLM into something you could plausibly put between a Fortune 500 marketing team and their send button.
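A hard approval gate is most reliable when it is enforced outside the model. This minimal sketch uses a hypothetical ApprovalGate class, not ADK's API. Approvals are recorded through a channel the model cannot call. Each approval permits one write, and an unapproved write returns a status that tells the agent to ask. The hold tool is ungated because holding is always the safe direction.

```python
import functools
from collections.abc import Callable


class ApprovalGate:
    """Hypothetical gate: write tools run only after a human approves them."""

    def __init__(self) -> None:
        self._approved: set[str] = set()

    def approve(self, action: str) -> None:
        # Called from the human-facing UI; never exposed to the model as a tool.
        self._approved.add(action)

    def write_tool(self, fn: Callable[..., dict]) -> Callable[..., dict]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> dict:
            if fn.__name__ not in self._approved:
                # Tell the model to ask, instead of letting it assume success.
                return {"status": "needs_approval", "action": fn.__name__}
            self._approved.discard(fn.__name__)  # one approval covers one write
            return fn(*args, **kwargs)
        return wrapper


gate = ApprovalGate()


def hold_send(campaign: str) -> dict:
    # The only unprompted action: holding is always the safe direction.
    return {"status": "held", "campaign": campaign}


@gate.write_tool
def release_send(campaign: str) -> dict:
    return {"status": "released", "campaign": campaign}
```

An approval flag passed as a tool argument would offer no protection, because the model could simply set it. Keeping approve() off the tool list puts the gate beyond the model's reach.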
What's next for SendGuard
Scheduled pre-send validation for every campaign, anomaly baselines from engagement history (the 1.5M events are already synced), Slack approvals instead of chat, and richer repair strategies — re-permissioning flows for lapsed consent rather than just exclusion.
Built With
- fivetran
- google-bigquery
- sfmc