Foreman

Inspiration

A wrong figure in a due-diligence memo isn't a bug. It's a lawsuit, a bad deal, a breach of trust. Research agents today are fluent but unaccountable: they'll state a number with total confidence and no source. We'd already built Renji, a conscience layer that refuses harmful actions and ungrounded claims. MongoDB, where the evidence actually lives, was the right place to make "can't lie" literal: a finding is allowed only if it cites the documents it came from. So we built Foreman: an analyst that is fast and that you could put in front of a regulator.

What it does

Foreman is an autonomous research analyst for high-stakes work where a made-up number is catastrophic: financial due diligence, compliance, and investigations. It runs multi-step research over a MongoDB knowledge base, gathers evidence, reasons over it, and drafts findings whose citations are verified before anything is written.

The key behavior is trust:

Every finding must cite source documents that actually exist in MongoDB.
A claim whose citations cannot be verified is refused before it is written.
Unfiltered updateMany and deleteMany operations are blocked to prevent accidental whole-collection rewrites or deletions.
Writes are held for human approval, so nothing reaches the knowledge base until someone explicitly says yes.
In private mode, confidential records are masked before they reach the cloud model, then restored locally after reasoning.
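
The approval hold in the list above can be pictured as a small queue that sits between the agent and the database. This is a minimal sketch under stated assumptions, not Foreman's actual code: ApprovalQueue, PendingWrite, and the apply callback are hypothetical names introduced for illustration, and the real write would go through the MongoDB MCP server rather than a Python callback.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Callable, Dict, Iterator, List


@dataclass(frozen=True)
class PendingWrite:
    """A write that has been proposed but not yet applied (hypothetical type)."""
    write_id: int
    operation: str              # e.g. "insertOne"
    payload: Dict[str, Any]


@dataclass
class ApprovalQueue:
    """Holds proposed writes until a human explicitly approves or rejects them."""
    apply: Callable[[PendingWrite], None]   # performs the real write once approved
    _pending: Dict[int, PendingWrite] = field(default_factory=dict, init=False)
    _ids: Iterator[int] = field(default_factory=lambda: count(1), init=False)

    def propose(self, operation: str, payload: Dict[str, Any]) -> int:
        """Record a write without applying it; return its id for later review."""
        write_id = next(self._ids)
        self._pending[write_id] = PendingWrite(write_id, operation, payload)
        return write_id

    def approve(self, write_id: int) -> PendingWrite:
        """Apply a pending write. Raises KeyError for an unknown or already-decided id."""
        write = self._pending.pop(write_id)
        self.apply(write)
        return write

    def reject(self, write_id: int) -> PendingWrite:
        """Drop a pending write without ever applying it."""
        return self._pending.pop(write_id)

    def pending(self) -> List[PendingWrite]:
        """Everything still waiting for a decision."""
        return list(self._pending.values())


if __name__ == "__main__":
    applied: List[PendingWrite] = []
    queue = ApprovalQueue(apply=applied.append)   # stand-in for the real write path
    wid = queue.propose("insertOne", {"claim": "example finding", "citations": ["doc-1"]})
    print(len(applied), len(queue.pending()))     # nothing applied, one write waiting
    queue.approve(wid)
    print(len(applied), len(queue.pending()))     # applied only after explicit approval
```

Keeping the queue as the only caller of apply is what makes the hold a property of the control flow: there is no code path from propose to the database that skips approve.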

How we built it

Brain — Gemini 3. Plans the research, reasons over the evidence, and drafts the final findings.
Framework — Google ADK (Agent Development Kit). The agent is built as an ADK LlmAgent, using the open-source foundation behind Vertex AI Agent Builder and Agent Engine.
Hands & memory — MongoDB MCP server. The agent reads the knowledge base, retrieves evidence, and writes findings through MongoDB's Model Context Protocol server.
Conscience — Renji. Before a write, mongo_state.py verifies that cited sources resolve in MongoDB and refuses dangerous mass mutations; the harm check and human approval gate follow after that.
Privacy layer. In private mode, PII is masked before anything reaches Gemini and restored locally afterward.
Visible trail. The app shows what was cited, what was refused, and what is still waiting for approval.
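
The state check that runs first in the conscience step could look roughly like the sketch below. It is an illustration only, not the contents of mongo_state.py: the check_write function, the Decision type, the citations and filter field names, and the find_existing_ids callback are all assumptions made for the example. Only the decided_by value mongodb-state and the updateMany and deleteMany operation names come from the project itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Set

MASS_MUTATIONS = {"updateMany", "deleteMany"}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    decided_by: str
    reason: str


def check_write(
    operation: str,
    args: Dict[str, Any],
    find_existing_ids: Callable[[Iterable[str]], Set[str]],
) -> Decision:
    """Decide whether an operation may continue to the harm check and approval gate.

    find_existing_ids is a hypothetical lookup that returns the subset of the
    given _id values that actually exist in the knowledge base.
    """
    # 1. A missing or empty filter on a *Many operation matches every document.
    if operation in MASS_MUTATIONS and not args.get("filter"):
        return Decision(False, "mongodb-state", f"{operation} without a filter")

    # 2. A finding must cite at least one document, and every cited _id must resolve.
    if operation == "insertOne":
        cited = list(args.get("document", {}).get("citations", []))
        if not cited:
            return Decision(False, "mongodb-state", "finding has no citations")
        missing = sorted(set(cited) - set(find_existing_ids(cited)))
        if missing:
            return Decision(False, "mongodb-state", f"unresolved citations: {missing}")

    # 3. Everything else, including reads, passes this check.
    return Decision(True, "mongodb-state", "passed state check")


if __name__ == "__main__":
    known = {"doc-1", "doc-2"}                      # stand-in for the real collection
    lookup = lambda ids: {i for i in ids if i in known}
    print(check_write("insertOne", {"document": {"citations": ["doc-1", "doc-9"]}}, lookup))
    print(check_write("deleteMany", {"filter": {}}, lookup))
    print(check_write("find", {"filter": {}}, lookup))
```

Passing the lookup in as a function keeps the check testable against mocked responses, which matches how the project describes its unit tests.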

Challenges we ran into

Making "can't lie" enforceable instead of aspirational, by refusing findings unless their citations resolve to real MongoDB documents.
Catching MongoDB-specific mass mutation risks, like unfiltered updateMany and deleteMany, that a generic safety check would miss.
Keeping confidential records private while still letting a cloud model reason on stand-ins.
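
The stand-in idea from the last challenge can be sketched as a mask step before the model call and a restore step afterward. This is a simplified illustration, not Foreman's privacy layer: the mask and restore functions, the placeholder format, and the two regular expressions are assumptions, and real PII detection would need far broader coverage than e-mail addresses and US-style SSNs.

```python
import re
from typing import Dict, Tuple

# Assumption: only these two kinds of value are treated as sensitive in this sketch.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap sensitive values for stand-ins.

    Returns the masked text plus a placeholder-to-original mapping
    that is meant to stay on the local machine.
    """
    mapping: Dict[str, str] = {}    # placeholder -> original value
    reverse: Dict[str, str] = {}    # original value -> placeholder

    for label, pattern in PATTERNS.items():
        def substitute(match: "re.Match[str]", label: str = label) -> str:
            value = match.group(0)
            if value not in reverse:
                # Repeated values share one stand-in so the model can still
                # tell that two mentions refer to the same thing.
                placeholder = f"<{label}_{len(reverse) + 1}>"
                reverse[value] = placeholder
                mapping[placeholder] = value
            return reverse[value]

        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back into text that contains stand-ins."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


if __name__ == "__main__":
    record = "Contact jane.doe@example.com, SSN 123-45-6789."
    masked, local_mapping = mask(record)
    print(masked)                           # what the cloud model would see
    print(restore(masked, local_mapping))   # what the user sees locally
```

Only the masked text would be sent to the model; the mapping never leaves the local process, so the model reasons on stand-ins and the originals are reattached to its output afterward.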

Accomplishments that we're proud of

The grounding and source-verification checks are unit-tested against mocked MongoDB responses in test_guard.py.
A finding that cites a missing _id is refused, while a finding that cites a real document can proceed to approval.
Unfiltered deleteMany and updateMany calls are refused before they can run.
Private mode keeps sensitive values out of the model while preserving the original data locally.

What we learned

Trust has to be enforced in the control flow, not promised in the prompt.
A research agent becomes much more credible when every finding can be traced back to real source documents.
Data safety and privacy need separate guardrails: one for writes, one for what gets sent to the model.

What's next for Foreman

Retrieval via Atlas Vector Search for semantic evidence gathering.
Schema-aware write validation layered on top of the existing citation check.
Deployment to Vertex AI Agent Engine.
Extending the conscience layer to other partners' MCP servers with the same trust guarantees.

Verified: the grounding and verification checks and the mass-mutation guard are unit-tested against mocked MongoDB responses (test_guard.py). A finding citing a non-existent _id is refused (decided_by=mongodb-state), a finding citing a real one passes to approval, an unfiltered deleteMany or updateMany is refused, and reads pass through unrestricted.
Target: an end-to-end run against a live cluster, which has not been done yet (it needs MDB_MCP_CONNECTION_STRING).