Inspiration

There are 207 million active creators in the world. Three million sit in a brutal middle ground: big enough to attract real brand deals, not big enough for a real talent manager. A creator with 80,000 Instagram followers can earn ₹60 lakh a year from brand partnerships — and spend 12 hours every week managing the chaos manually.

No agency will sign them. No tool does the work for them.

We watched a mid-tier creator lose ₹3.2 lakh in one quarter — not because deals weren't there, but because she responded too slowly, missed a follow-up, and signed a contract with an unlimited exclusivity clause that blocked two competing deals. Same root cause every time: no operational system. No institutional memory. No one watching her back.

We asked one question: what if the system just did the work?

That question is ThreadComb. Every brand deal lives in a thread. ThreadComb reads them all.


What it does

ThreadComb is a three-agent AI system built on Google Cloud ADK, Gemini 2.5, and MongoDB Atlas that replaces the operational layer of a creator's brand deal business. Every agent reads from and writes to a living MongoDB Atlas database — the creator's Skills Map — that gets smarter with every email processed and every deal closed.

🧠 Agent 1 — DNA Reader

Runs at onboarding. Builds institutional memory.

Connects to the creator's Gmail and reads 6 months of brand deal threads. For each thread, Gemini 2.5 Flash extracts structured signals: brand name, deal amount, payment status, response time, contract terms — all written to MongoDB Atlas as richly nested documents.

Once ingestion completes, three MongoDB Aggregation Pipelines fire:

  • Revenue leakage: brand deal inquiries the creator never replied to
  • Payment reliability: brands ranked by how late they pay
  • Rate gap: creator's accepted rates vs. market P50 for their niche and follower tier

The output is a Skills Audit Report — a PDF with specific rupee numbers. Not AI guessing. Every figure is sourced from a MongoDB aggregation query against the creator's own data.
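As a rough illustration of the payment-reliability pipeline listed above, the ranking could be expressed as a pymongo-style stage list. This is a sketch only: the field names (creator_id, payment_status, brand_name, days_late) are assumptions made for the example, not the project's actual schema.

```python
from typing import Any


def payment_reliability_pipeline(creator_id: str) -> list[dict[str, Any]]:
    """Rank brands by average payment delay for one creator.

    All field names here are hypothetical placeholders.
    """
    return [
        # Only deals that were actually paid carry a meaningful delay.
        {"$match": {"creator_id": creator_id, "payment_status": "paid"}},
        # One row per brand: average days late and number of paid deals.
        {"$group": {
            "_id": "$brand_name",
            "avg_days_late": {"$avg": "$days_late"},
            "paid_deals": {"$sum": 1},
        }},
        # Slowest payers first.
        {"$sort": {"avg_days_late": -1}},
    ]


pipeline = payment_reliability_pipeline("creator_123")
print([next(iter(stage)) for stage in pipeline])
```

The list can be passed unchanged to a collection's aggregate() call in pymongo or Motor, so the report numbers come from the database rather than from the model.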

MongoDB used: Document ingestion; $group, $match, $lookup, and $sort aggregation stages; Atlas Search; document upsert with running payment stats.

⚡ Agent 2 — Deal Chief

Fires within 30 seconds of a new brand deal email. Draft ready in 60 seconds.

When a new brand deal email arrives via Gmail push webhook, Deal Chief performs four MongoDB queries before writing a single word:

  1. brands.findOne — this sender's payment history: avg days to pay, reliability score, overdue count
  2. Atlas Vector Search — find the 5 most similar historical deals using $vectorSearch with RETRIEVAL_QUERY task type (asymmetric from indexing — critical for quality)
  3. skills_map query — creator's confirmed PREFER/AVOID preferences at confidence ≥ 0.70
  4. niche_graph aggregation — P25/P50/P75 rates for creator's niche + follower tier + deal type
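
The second query in the list above might look roughly like the following $vectorSearch pipeline. The index name, the embedding field, the projected fields, and the oversampling factor are all assumptions for illustration. The query vector is assumed to have been embedded with the RETRIEVAL_QUERY task type and L2-normalised already.

```python
from typing import Any


def similar_deals_pipeline(
    query_vector: list[float], creator_id: str, k: int = 5
) -> list[dict[str, Any]]:
    """Build a pipeline returning the k most similar historical deals."""
    return [
        {"$vectorSearch": {
            "index": "deal_embedding_index",  # hypothetical index name
            "path": "embedding",              # hypothetical vector field
            "queryVector": query_vector,
            "numCandidates": 20 * k,          # oversampling factor is an assumption
            "limit": k,
            # creator_id must be declared as a filter field in the vector index.
            "filter": {"creator_id": {"$eq": creator_id}},
        }},
        {"$project": {
            "_id": 0,
            "brand_name": 1,                  # hypothetical field
            "amount_inr": 1,
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]


pipeline = similar_deals_pipeline([0.0] * 768, "creator_123")
print(pipeline[0]["$vectorSearch"]["limit"])
```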

Armed with four MongoDB context layers, Gemini 2.5 Flash generates a voice-calibrated reply. A separate independent Gemini call (Call B) evaluates voice compliance without knowing it's evaluating Call A output — the two-model evaluator pattern. If score < 0.75: regenerate.
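
The generate-then-evaluate loop can be sketched as below. The two Gemini calls are passed in as plain callables so the control flow is visible on its own. The retry cap and the fallback to the best-scoring draft are assumptions added for the sketch; the source only states the 0.75 threshold and the regenerate step.

```python
from typing import Callable

COMPLIANCE_THRESHOLD = 0.75  # threshold stated in the text


def draft_with_voice_check(
    generate: Callable[[int], str],    # stands in for Call A (draft generation)
    evaluate: Callable[[str], float],  # stands in for Call B (sees only the text)
    max_attempts: int = 3,             # assumed cap, not from the source
) -> tuple[str, float, int]:
    """Return (draft, score, attempts_used)."""
    best_draft, best_score = "", -1.0
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)
        score = evaluate(draft)
        if score > best_score:
            best_draft, best_score = draft, score
        if score >= COMPLIANCE_THRESHOLD:
            return draft, score, attempt
    # No draft passed: hand back the best one so a human can decide.
    return best_draft, best_score, max_attempts


scores = {"draft v1": 0.60, "draft v2": 0.80}
print(draft_with_voice_check(lambda n: f"draft v{n}", lambda d: scores[d]))
# ('draft v2', 0.8, 2)
```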

Creator sees: draft + flag panel (exclusivity too long, rate below P50, brand slow payer) + brand reliability score. One tap: email sends. MongoDB deal document inserted. Calendar follow-up event created.

MongoDB used: Document lookup, Atlas Vector Search (cosine similarity 768d, self-match = 1.0000 verified), aggregation rate benchmarking, document insert, $lookup brands join.

📊 Agent 3 — Revenue Guardian

Runs daily. Chases every overdue invoice with the right tone.

The urgency score and recommended tone are computed inside a MongoDB aggregation pipeline — not Python, not Gemini:

db.invoices.aggregate([
  { $match: { status: { $in: ["pending","overdue"] }, days_overdue: { $gt: 0 } } },
  { $lookup: { from: "brands", localField: "brand_id", foreignField: "_id", as: "brand" } },
  { $unwind: "$brand" },
  { $addFields: {
    urgency_score: { $add: [
      { $multiply: ["$days_overdue", 0.6] },
      { $multiply: [{ $subtract: [1, "$brand.payment_intelligence.payment_reliability"] }, 40] }
    ]},
    recommended_tone: { $switch: { branches: [
      { case: { $lte: ["$days_overdue", 14] }, then: "gentle" },
      { case: { $lte: ["$days_overdue", 45] }, then: "firm" }
    ], default: "final_notice" }}
  }},
  { $sort: { urgency_score: -1 } }
])
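
For unit-testing the pipeline's arithmetic, a small Python mirror of the same formula can serve as a test oracle. It is not part of the production path, where the score is computed inside MongoDB. The function names are hypothetical.

```python
def urgency_score(days_overdue: float, payment_reliability: float) -> float:
    """Same arithmetic as the $addFields stage: 0.6 * days + 40 * (1 - reliability)."""
    return days_overdue * 0.6 + (1 - payment_reliability) * 40


def recommended_tone(days_overdue: float) -> str:
    """Same branch order as the $switch stage."""
    if days_overdue <= 14:
        return "gentle"
    if days_overdue <= 45:
        return "firm"
    return "final_notice"


# Worked example: 30 days overdue, brand reliability 0.5
# 30 * 0.6 = 18, (1 - 0.5) * 40 = 20, total 38
print(urgency_score(30, 0.5), recommended_tone(30))  # 38.0 firm
```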

The tone is a function of real brand payment behaviour, not a template. A MongoDB Change Stream watches for invoices whose status changes to "paid" and automatically updates the brand's payment_intelligence.avg_payment_days using a running average. The brand's profile gets sharper with every closed deal.
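
A running average can be updated incrementally without re-reading past invoices. The sketch below shows the arithmetic only. It assumes the brand document also stores a count of paid invoices, which the source does not name.

```python
def update_running_average(
    current_avg: float, paid_count: int, new_days_to_pay: float
) -> tuple[float, int]:
    """Fold one new payment into the average: avg += (x - avg) / n."""
    new_count = paid_count + 1
    new_avg = current_avg + (new_days_to_pay - current_avg) / new_count
    return new_avg, new_count


# Brand averaged 30 days over 3 invoices; a fourth is paid after 50 days.
print(update_running_average(30.0, 3, 50.0))  # (35.0, 4)
```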

MongoDB used: Aggregation pipeline with $addFields, computed fields, $switch, Change Streams for reactive brand intelligence update.


How we built it

The Skills Map architecture: MongoDB Atlas is not a data store for ThreadComb — it IS the Skills Map. The entire thesis (institutional knowledge made executable) lives in MongoDB documents. Agents read from it before every action. Agents write back after every outcome. It compounds.

Embedding pipeline: gemini-embedding-2 at 768 dimensions (MRL truncation from 3072d — saves 4x storage on Atlas M0 free tier). 768d vectors are NOT pre-normalised by the API — we apply L2 normalisation before storing. Verified: np.linalg.norm(vector) = 1.000000. RETRIEVAL_DOCUMENT at index time, RETRIEVAL_QUERY at search time — asymmetric task typing improves retrieval quality.
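
The normalisation step is small enough to show in full. This sketch uses only the standard library; np.linalg.norm, as mentioned above, would give the same norm. The function name is hypothetical.

```python
import math


def l2_normalise(vector: list[float]) -> list[float]:
    """Scale a vector to unit length so cosine similarity behaves as expected."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


unit = l2_normalise([3.0, 4.0])
print(unit)  # [0.6, 0.8]
print(f"{math.sqrt(sum(x * x for x in unit)):.6f}")  # 1.000000
```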

No hallucinated amounts: Gemini extraction uses a structured Pydantic schema with amount_ambiguity_flag: bool. If an email says "budget is flexible" or "50 hazaar" — amount_ambiguity_flag=True, amount_inr=None. Always. The Audit Report's total_recoverable_value is the exact arithmetic sum of non-ambiguous deals only. Verified: ₹75,000 sum matched precisely, zero LLM math.
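
The ambiguity rule can be sketched as follows. A standard-library dataclass stands in for the Pydantic schema so the example has no dependencies. Only amount_ambiguity_flag, amount_inr, and total_recoverable_value are names from the text; the class name and brand_name field are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractedDeal:
    brand_name: str
    amount_inr: Optional[int]
    amount_ambiguity_flag: bool

    def __post_init__(self) -> None:
        # Invariant from the text: an ambiguous amount never carries a number.
        if self.amount_ambiguity_flag and self.amount_inr is not None:
            raise ValueError("ambiguous amount must have amount_inr=None")


def total_recoverable_value(deals: list[ExtractedDeal]) -> int:
    """Plain arithmetic sum over non-ambiguous deals; no LLM involved."""
    return sum(
        d.amount_inr
        for d in deals
        if not d.amount_ambiguity_flag and d.amount_inr is not None
    )


deals = [
    ExtractedDeal("Brand A", 40000, False),
    ExtractedDeal("Brand B", 20000, False),
    ExtractedDeal("Brand C", None, True),  # e.g. "budget is flexible"
]
print(total_recoverable_value(deals))  # 60000
```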

Two-model voice compliance: Draft generation (Call A) and voice evaluation (Call B) are separate Gemini calls in separate contexts. The evaluator never knows it's evaluating Call A output. This prevents sycophantic self-scoring. If compliance < 0.75: regenerate with tighter constraints. Confirmed two independent API calls in application logs.

Human-in-the-loop as architecture: The ACTION_POLICY table is Python code, not a prompt. send_gmail_reply() is structurally callable only from the /deals/approve/{deal_id} handler and from send_invoice_followup(), each of which runs only in response to an explicit creator approval. No autonomous sending. Verified via codebase grep.
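
A policy table of this kind might look like the sketch below. The source confirms only that ACTION_POLICY exists as Python code. The entries, the Policy enum, and the authorise() guard are hypothetical.

```python
from enum import Enum


class Policy(Enum):
    AUTO = "auto"
    REQUIRES_APPROVAL = "requires_approval"


# Hypothetical contents: reading and drafting are automatic, sending is not.
ACTION_POLICY = {
    "read_thread": Policy.AUTO,
    "generate_draft": Policy.AUTO,
    "send_gmail_reply": Policy.REQUIRES_APPROVAL,
    "send_invoice_followup": Policy.REQUIRES_APPROVAL,
}


class ApprovalRequired(Exception):
    """Raised when an action is attempted without creator approval."""


def authorise(action: str, creator_approved: bool) -> None:
    # Unknown actions fail closed: they are treated as requiring approval.
    policy = ACTION_POLICY.get(action, Policy.REQUIRES_APPROVAL)
    if policy is Policy.REQUIRES_APPROVAL and not creator_approved:
        raise ApprovalRequired(action)


authorise("generate_draft", creator_approved=False)   # allowed
authorise("send_gmail_reply", creator_approved=True)  # allowed after approval
```

Keeping the rule in code rather than in a prompt means a model output cannot talk its way past it.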


Challenges we ran into

Making MongoDB the source of truth, not the LLM. Every financial figure in the Audit Report must be sourced from a MongoDB aggregation — not from Gemini's training data. If MongoDB doesn't have the data, the agent says "insufficient data" rather than hallucinating a number. Building the system to consistently reject its own LLM knowledge in favour of MongoDB-grounded facts required explicit schema design: amount_ambiguity_flag, extraction_confidence thresholds, and HITL routing for anything below 0.70 confidence.

Atlas Vector Search as analogical reasoning, not search. Most implementations answer "what document is most similar?" ThreadComb uses it differently: before drafting a negotiation reply, find what terms this creator specifically accepted for similar deals in the past. That is precedent retrieval — a fundamentally different use of vector similarity that required careful embedding text design and asymmetric task typing.

Voice calibration across Indian language contexts. Hindi-English code-switching is the dominant communication style for Indian creators. A beauty creator from UP writes "yaar this deal looks great but exclusivity ke liye 30 din max chalega" — the voice profiler must capture this pattern and the draft generator must reproduce it naturally. The solution: explicit hindi_english_ratio field in the voice profile, a Stage 0 language detection gate, and a separate Hindi extraction prompt variant.
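
One deliberately simple way to estimate a hindi_english_ratio for romanised text is a marker-word count, sketched below. The word list, the definition of the ratio as the share of Hindi tokens, and the gate threshold are all assumptions for illustration. The source does not describe how the field is actually computed.

```python
import re

# Tiny hypothetical lexicon of romanised Hindi words; a real one would be far larger.
HINDI_MARKERS = frozenset({"yaar", "ke", "liye", "din", "chalega", "hai", "nahi", "kya"})
HINGLISH_GATE = 0.30  # assumed threshold for choosing the Hindi prompt variant


def hindi_english_ratio(text: str) -> float:
    """Share of alphabetic tokens found in the Hindi marker list (romanised only)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in HINDI_MARKERS) / len(words)


def prompt_variant(text: str) -> str:
    """Stage 0 gate: pick which extraction prompt to use."""
    return "hindi" if hindi_english_ratio(text) >= HINGLISH_GATE else "english"


sample = "yaar this deal looks great but exclusivity ke liye 30 din max chalega"
print(round(hindi_english_ratio(sample), 3), prompt_variant(sample))  # 0.417 hindi
```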


Accomplishments we're proud of

The aggregation pipeline as the product. The financial figures in the Audit Report, every rupee amount, come from MongoDB aggregation queries against real operational data. AI narrates. MongoDB calculates. This distinction is the architectural decision we're proudest of.

Atlas Vector Search enabling analogical reasoning. The Deal Chief doesn't draft a reply based on general knowledge. It drafts based on what this creator specifically accepted in the past for similar deals. MongoDB surfaces that history in milliseconds.

The agent_actions audit log as a trust primitive. Every agent action — every query fired, every draft generated, every email sent — is logged as an immutable append-only document in MongoDB. The creator can see exactly what the agent saw, what it decided, and why. In a domain where agents make consequential decisions about someone's business, transparency is not optional. It is the product.


What we learned

The bottleneck in AI agent performance is not the model — it is the context layer.

A Gemini 2.5 Flash agent with no Skills Map context writes a generic brand deal reply. The same model with a MongoDB Skills Map — brand payment history, creator preferences, historical rate benchmarks, voice profile — writes a reply that is specific, calibrated, and defensible. The model didn't get smarter. The data got richer.

MongoDB Atlas is not an integration layer for ThreadComb. It is the product. The agents are the interface. The Skills Map is the value. And MongoDB is the infrastructure that makes the Skills Map queryable, searchable, and reactive in real time.


What's next for ThreadComb

Immediate (30 days): Complete the 30-day free Skills Audit pilot with 10 mid-tier Indian creators. Validate the ₹8,300/month subscription price. Launch the fan management agent (Instagram DM classification; App Review submitted).

3–6 months: Fivetran connectors (YouTube Studio analytics + Stripe → MongoDB Atlas for richer Skills Map). Arize Phoenix LLM observability (trace every agent decision, measure deal closure rates vs. agent recommendations). Agency tier (5–50 creator roster management on one MongoDB database).

Year 1–2: The creator economy is the beachhead. The thesis — AI agents grounded in MongoDB institutional knowledge replacing entire operational layers — extends beyond creators: healthcare admin, EdTech operations, small business finance. The pricing evolves from subscription to outcome-based: 5% of recovered revenue. Not software that helps people do the work. Software that does the work, billed per outcome.

Built With

  • fastapi
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-pro
  • gemini-embedding-2
  • gmail-api
  • google-calendar-api
  • google-cloud
  • google-cloud-adk
  • google-cloud-pub/sub
  • google-cloud-run
  • google-cloud-scheduler
  • google-cloud-tasks
  • google-secret-manager
  • mongodb-aggregation-pipelines
  • mongodb-atlas
  • mongodb-atlas-vector-search
  • mongodb-change-streams
  • motor
  • next.js-15
  • pydantic
  • python
  • react-19
  • reportlab
  • tailwind