MindMesh

Inspiration

Mental health crises don't announce themselves. By the time someone reaches out for help, the warning signs have often been visible for days: in how fast they type, how often they delete what they've written, and whether they're journaling at 2 AM. Most digital wellness tools wait for the user to explicitly ask for help, which is exactly when they're least likely to. We wanted to build something that listens continuously, decides autonomously, and intervenes before things escalate, without becoming intrusive when everything is fine.

The hardest design problem in agentic systems isn't getting the agents to act. It's getting them to know when not to act. That challenge (building agents with restraint, observability, and clinically grounded safety floors) is what pulled us into this project.

What it does

MindMesh is a multi-agent wellness orchestrator. It ingests behavioral signals from a user (typing cadence, deletion frequency, pause patterns, journal text, inactivity windows) and runs them through a graph of four specialized LLM agents that collaborate rather than execute in sequence:

Emotion Agent: identifies mood, stress, and anxiety from text and behavior.

Risk Agent: scores severity from low to critical and raises escalation flags.

Intervention Agent: picks the right response, such as a grounding exercise, a breathing protocol, or a professional referral.

Reflection Agent: surfaces behavioral patterns so the user understands themselves better.

The graph adapts dynamically. Low-risk entries skip heavy intervention; high-risk entries auto-escalate the monitoring level; crisis-level vocabulary always triggers a professional referral, with the 988 hotline as a hard-coded safety floor. Every event is enriched with behavioral context, persisted for trend detection, and instrumented for end-to-end observability.
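
As a rough illustration of the safety floor, the sketch below applies a hard-coded rule after the Risk Agent has returned its result, so the referral does not depend on what the LLM says. The names (CRISIS_TERMS, RiskResult's fields, apply_safety_floor), the intermediate risk levels, and the phrase list are all assumptions made for illustration; they are not the project's actual code and not a clinically reviewed vocabulary.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical phrase list; a real deployment would need a clinically reviewed vocabulary.
CRISIS_TERMS = ("suicide", "kill myself", "end my life", "self-harm")
HOTLINE = "988"


@dataclass(frozen=True)
class RiskResult:
    """Assumed shape of the Risk Agent output (field names are illustrative)."""
    level: str  # e.g. "low", "moderate", "high", "critical"
    escalate: bool = False
    referral: Optional[str] = None


def apply_safety_floor(text: str, llm_result: RiskResult) -> RiskResult:
    """Force a critical rating and a hotline referral when crisis vocabulary appears.

    The rule runs after the LLM, so it overrides whatever level the model chose.
    """
    lowered = text.lower()
    if any(term in lowered for term in CRISIS_TERMS):
        return replace(llm_result, level="critical", escalate=True, referral=HOTLINE)
    return llm_result
```

Keeping the rule as a plain function outside the prompt means it can be unit-tested without calling a model.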

How we built it

Frontend — Next.js 14 + TypeScript + Tailwind. Real-time journal entry, live dashboard cards for each agent's output, a workflow visualization showing the graph executing in real time, and sponsor integration pills.

Backend — FastAPI with REST + WebSocket endpoints. The core is a LangGraph state machine where each agent is a node, Pydantic models are the contract between nodes, and conditional edges implement the dynamic routing. An in-memory history manager feeds the last ten events back into every agent so trend detection actually works.
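
A minimal, dependency-free sketch of two ideas from the backend: a bounded history that keeps only the last ten events, and a routing function of the kind a conditional edge would call. PipelineState, HistoryManager, route_after_risk, the node names, and the choice to send low-risk entries straight to reflection are assumptions for illustration; the real project uses Pydantic models and LangGraph rather than plain dataclasses.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class PipelineState:
    """Assumed stand-in for the typed state passed between graph nodes."""
    journal_text: str
    risk_level: str = "low"
    history: List[Dict] = field(default_factory=list)


class HistoryManager:
    """Keeps the most recent events in memory; older ones are dropped automatically."""

    def __init__(self, window: int = 10) -> None:
        self._events: Deque[Dict] = deque(maxlen=window)

    def add(self, event: Dict) -> None:
        self._events.append(event)

    def recent(self) -> List[Dict]:
        # Oldest first, newest last.
        return list(self._events)


def route_after_risk(state: PipelineState) -> str:
    """Return the name of the next node based on a typed state field."""
    if state.risk_level in ("high", "critical"):
        return "intervention"
    # Assumption: low-risk entries skip the intervention step entirely.
    return "reflection"
```

In LangGraph, a function like route_after_risk would typically be registered through StateGraph.add_conditional_edges; that wiring is left out here so the sketch runs without extra packages.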

LLM layer — OpenAI GPT-4o for all four agents, with a shared retry/JSON-parse utility and a per-agent prompt file we tuned iteratively.
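
One way a shared retry/JSON-parse utility could look is sketched below. The function name, its parameters, and the decision to accept only JSON objects are assumptions; the model call is passed in as a plain callable so the sketch does not depend on any particular client library.

```python
import json
import time
from typing import Any, Callable, Dict, Optional


def call_with_json_retry(
    call_llm: Callable[[], str],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> Dict[str, Any]:
    """Call the model until it returns a parseable JSON object, or give up."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        raw = call_llm()
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict):
                return parsed
            last_error = ValueError("expected a JSON object")
        except json.JSONDecodeError as exc:
            last_error = exc
        if attempt < max_attempts:
            # Linear backoff between attempts; zero by default.
            time.sleep(backoff_seconds * attempt)
    raise ValueError(f"no valid JSON object after {max_attempts} attempts") from last_error
```

Each agent could wrap its own prompt in a small closure and hand it to this helper, so parse failures are handled in one place.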

Sponsor integrations — three of them, all wired into the pipeline with graceful degradation if credentials are absent:

ClickHouse Cloud: every analyzed event lands in a wellness_events table (14 columns, JSON columns for full agent outputs), backed by a wellness_hourly_agg materialized view that pre-aggregates stress, anxiety, and risk distribution per hour. Four REST endpoints (/analytics/timeline, /correlation, /interventions, /summary) expose this for any downstream consumer: clinician dashboards, parent monitoring apps, HR wellness portals.

Datadog LLM Observability: every pipeline run produces a full trace tree in lapdog.datadoghq.com, made up of a mindmesh.pipeline workflow span with four nested mindmesh agent spans, each annotated with input/output snapshots, and an auto-captured openai.request span underneath each agent containing the exact prompt, completion, tokens, latency, and cost. We ran it in agentless mode so it works from any laptop without installing the Datadog Agent.
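
The graceful-degradation pattern for the integrations can be sketched as a small factory that checks for credentials at startup and falls back to a local implementation with the same interface. Everything named here (CLICKHOUSE_HOST, InMemorySink, ClickHouseSink, build_sink) is hypothetical, and the ClickHouse sink is a placeholder rather than a working client.

```python
import logging
import os
from typing import Dict, List, Mapping, Optional

logger = logging.getLogger("mindmesh.integrations")


class InMemorySink:
    """Fallback used when no ClickHouse credentials are configured."""

    def __init__(self) -> None:
        self.events: List[Dict] = []

    def write(self, event: Dict) -> None:
        self.events.append(event)


class ClickHouseSink:
    """Placeholder for a sink that would insert rows into wellness_events."""

    def __init__(self, host: str) -> None:
        self.host = host

    def write(self, event: Dict) -> None:
        raise NotImplementedError("wire up a real ClickHouse client here")


def build_sink(env: Optional[Mapping[str, str]] = None):
    """Pick a sink from the environment and log the decision loudly."""
    env = os.environ if env is None else env
    host = env.get("CLICKHOUSE_HOST")  # hypothetical variable name
    if host:
        logger.info("ClickHouse integration enabled (host=%s)", host)
        return ClickHouseSink(host)
    logger.warning("ClickHouse credentials absent; using in-memory fallback")
    return InMemorySink()
```

Because both sinks expose the same write method, the rest of the pipeline does not need to know which one it received.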

Challenges we ran into

Module name collision with ddtrace. Our local mindmesh/agents/ package shadowed the openai-agents library that Datadog's LLM Observability tries to auto-instrument, causing LLMObs.enable() to silently fail with ModuleNotFoundError: No module named 'agents.tracing'. We patched it by setting DD_PATCH_MODULES=openai_agents:false programmatically before enabling LLMObs.

The "API key is missing" 403 mystery. Datadog's UI shows two columns that look almost identical: Key ID (a UUID with dashes) and Key (32 hex chars, revealed behind an eye icon). We spent an hour debugging a working integration that was sending requests with the wrong identifier.

ClickHouse schema initialization. Our SQL parser was stripping comments too aggressively and skipping entire CREATE TABLE statements, leaving the database silently empty until we tried to query it. Rewriting the comment-stripping to operate line-by-line before the semicolon split fixed it.

uvicorn --reload does not reload .env. This burned us multiple times during sponsor integration: code reloads, but environment variables are cached from process start. We added explicit startup logs for every integration so the failure mode is now loud, not silent.

State loss between LangGraph nodes. Transient context fields were being dropped between nodes, so we patched the orchestrator to forward them explicitly.

Divergent git branches. Working in parallel on frontend and backend produced merge conflicts on app/page.tsx that we resolved by carefully integrating both sides rather than picking a winner.

Accomplishments that we're proud of

Three sponsor integrations that actually work end-to-end, not just badges on a slide. ClickHouse persists every event with a materialized view, and Datadog LLM Observability traces every prompt and completion in production-grade detail.

Graceful degradation everywhere. Pull any sponsor's API key and the system stays online with heuristic fallbacks that match the same data contracts. The graph never breaks.

A clinically grounded safety floor. Crisis-level vocabulary always escalates to professional referral, regardless of what the LLM returns. Three layers of defense: Pydantic schema validation, JSON parse retry, and a hard-coded escalation rule.

A trace tree that tells the full story. In lapdog.datadoghq.com, every analyze call is one workflow with four agent children and four OpenAI calls underneath: visible, queryable, monitorable. We can spot drift, evaluate quality, and alert on regressions per agent.

The system knows when to stay quiet. The hardest engineering decision was wiring conditional routing so the Intervention agent can decide not to act on low-risk entries. That restraint is what makes it usable.
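
The ClickHouse schema-initialization fix described under Challenges (strip comments line by line, then split on semicolons) might look roughly like the sketch below. The function name is hypothetical, and the sketch deliberately handles only full-line "--" comments; it does not cope with trailing comments, block comments, or semicolons inside string literals.

```python
from typing import List


def split_sql_statements(script: str) -> List[str]:
    """Split a schema script into statements without losing CREATE TABLE blocks."""
    kept_lines = []
    for line in script.splitlines():
        # Drop only lines that are entirely a comment; keep everything else verbatim.
        if line.strip().startswith("--"):
            continue
        kept_lines.append(line)
    cleaned = "\n".join(kept_lines)
    # Split after comment removal so a comment can never swallow a statement.
    return [stmt.strip() for stmt in cleaned.split(";") if stmt.strip()]
```

Filtering comments per line before splitting means a comment that precedes a statement cannot cause the whole chunk to be discarded.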

What we learned

Observability is not optional for LLM systems. Until we plugged in Datadog LLM Observability, we were debugging by print(). After plugging it in, we could see exactly which agent's prompt was producing weird output and iterate in minutes instead of hours. LLM apps without trace-level visibility are unshippable.

Analytics persistence changes what's possible. ClickHouse turned a stateless chatbot pattern into a system with memory. The same ten-event window that powers trend detection in the agents could power clinician dashboards, weekly user reports, or population-level research, all from one source of truth.

Pydantic-as-contract scales. Defining BehavioralSignal, EmotionResult, RiskResult, etc. early meant every agent, every integration, and every API endpoint spoke the same language. Refactors that would normally break things were trivial.

LangGraph's conditional edges are the unlock. Without them, multi-agent systems collapse into either rigid pipelines or unpredictable chat loops. Routing on a typed state field gives us deterministic, explainable behavior.

The Datadog and ClickHouse UIs both have copy-the-wrong-thing footguns. Document for your team which icon to click. Seriously.

What's next for MindMesh

Real continuous ingest. Today the frontend posts on each journal entry; next we'll stream keystroke-level telemetry over the existing WebSocket so the system reacts within seconds, not after a deliberate submission.

Per-user evaluations in Datadog LLM Observability. Enable Datadog's built-in evaluators (Topic Relevancy, Sentiment, Failure to Answer) on every agent span, then build per-user quality dashboards on top of the trace data.

ClickHouse-powered cohort analytics. Use the existing wellness_hourly_agg view to publish anonymized population trends, useful for university wellness centers, HR teams, and clinical research.

HIPAA-track deployment. Swap GPT-4o for an in-VPC model (Llama 3 70B or Azure OpenAI HIPAA), keep ClickHouse self-hosted, and we have a deployable architecture for healthcare partners. The Pydantic contracts mean none of the agent code changes.

A clinician handoff layer. Surface the ClickHouse trend data and Datadog trace history as a read-only clinician portal, so when MindMesh escalates to professional referral, the professional walks into the conversation already understanding the pattern.

Mobile-first capture. Pair the existing dashboard with a thin React Native app that captures the same behavioral signals from the device keyboard.