Gallery images:
- Marmalade landing page
- Marmalade post-session page
- Deterministic safety boundaries enforced outside of model prompting.
- Server-side orchestration showing Mini and Counselor agents and their execution order.
- Gather recent turns and session state, then generate a strict JSON SOAP-style report via Vertex.
- High-level system flow from client to server, agent orchestration, persistence, and reporting.
Marmalade: Agentic Emotional Continuity via Speculative Execution
Real-time companionship that compiles lived experience into clinical SOAP records.
Inspiration
Before software, I worked on a factory floor.
The noise was constant. Procedures were rigid. The isolation was quiet, but heavy. Most days were not crises. They were long stretches where nothing was wrong enough to justify speaking, yet carrying everything alone slowly accumulated weight.
What was missing was not advice or solutions. It was continuity. A witness that could listen without urgency and remember without distortion.
Marmalade exists to bridge isolation and being known.
What Marmalade Is
Marmalade is an agentic emotional companion built to preserve narrative continuity across time.
Unlike conventional chat systems that reset context every session, Marmalade tracks emotional trajectories, recurring states, and personal anchors across weeks and months. It does not attempt to fix emotions or prescribe outcomes. Its role is presence and memory.
Presence over prescription. Memory over immediacy.
End-to-End Flow

System Architecture
Marmalade deliberately separates speed and depth.

Real-Time Pipeline (Gemini 2.5 Flash)
The real-time pipeline is optimized for voice responsiveness using parallel speculative execution.
MiniBrain
- Sub-second turn analysis
- Classifies mood, state deltas, and risk level (0–4)
- Acts as a gated orchestrator determining whether deep reasoning is required
- Reduces high-performance LLM compute usage by 40–60% on average and keeps conversational latency under 800 ms while maintaining safety guardrails
FirstResponse
- Immediate acknowledgement stream
- Starts before full reasoning completes
- Prevents perceived silence
Counselor
- Produces the substantive response
- Begins context hydration and RAG retrieval speculatively while MiniBrain validates safety
- Flushed to the stream only when MiniBrain determines that deep reasoning is required
Latency never blocks correctness.
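The gating decision described above can be sketched as a small pure function. This is a minimal illustration, not Marmalade's actual code: the `TurnAnalysis` fields, the `Route` names, and both thresholds are assumptions made for the example.

```typescript
// Hypothetical shape of a MiniBrain turn analysis; field names are assumptions.
type RiskLevel = 0 | 1 | 2 | 3 | 4;

interface TurnAnalysis {
  mood: string;       // e.g. "flat", "anxious"
  stateDelta: number; // signed change in emotional state since the previous turn
  risk: RiskLevel;    // 0 (none) to 4 (crisis)
}

type Route = "first_response_only" | "counselor" | "safety_override";

// Assumed thresholds, chosen only for illustration.
const CRISIS_RISK = 4;
const ELEVATED_RISK = 2;
const DEEP_REASONING_DELTA = 0.3;

// Decide whether the speculative Counselor output should be used at all.
function routeTurn(analysis: TurnAnalysis): Route {
  if (analysis.risk >= CRISIS_RISK) return "safety_override";
  const needsDepth =
    analysis.risk >= ELEVATED_RISK ||
    Math.abs(analysis.stateDelta) >= DEEP_REASONING_DELTA;
  return needsDepth ? "counselor" : "first_response_only";
}
```

Under these assumed rules, a calm turn with a small state delta never pays for the larger model, which is where the compute savings would come from.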
Reflective Pipeline (Gemini 2.5 Pro)
After a session ends, priority shifts from speed to reasoning depth.

The full conversation is synthesized into a professional-grade S.O.A.P. record:
- Subjective
- Objective
- Assessment
- Plan
This is not a summary. Marmalade automates a core administrative burden of therapy by transforming raw conversation into a structured longitudinal clinical record. It bridges daily lived experience with future professional intervention, allowing therapists to focus on interpretation and change rather than reconstruction.
Each reflection becomes a long-term memory anchor, enabling continuity across weeks and months rather than episodic recall.
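As a rough sketch of what a strict S.O.A.P. record could look like in code, the snippet below parses model output as JSON and rejects anything that is not exactly the four sections. The lowercase key names and the "exactly four non-empty strings" rule are assumptions for illustration; the real report schema may carry more fields.

```typescript
// Hypothetical record shape: one free-text field per S.O.A.P. section.
interface SoapRecord {
  subjective: string;
  objective: string;
  assessment: string;
  plan: string;
}

const SOAP_KEYS = ["subjective", "objective", "assessment", "plan"] as const;

// Parse raw model output; return null unless it is a JSON object with
// exactly the four sections, each a non-empty string.
function parseSoapRecord(raw: string): SoapRecord | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const obj = data as Record<string, unknown>;
  if (Object.keys(obj).length !== SOAP_KEYS.length) return null; // no extra keys
  for (const key of SOAP_KEYS) {
    const value = obj[key];
    if (typeof value !== "string" || value.trim() === "") return null;
  }
  return {
    subjective: String(obj["subjective"]),
    objective: String(obj["objective"]),
    assessment: String(obj["assessment"]),
    plan: String(obj["plan"]),
  };
}
```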
ElevenLabs Integration
Rather than accept the default LLM integration, we exposed a custom OpenAI-compatible endpoint so that the ElevenLabs Agent uses Marmalade's Brain (Gemini + Memory) instead of a generic LLM. This combines ElevenLabs' low-latency audio with our custom clinical safety architecture.
Capabilities:
- Streaming and non-streaming responses
- Persistent session mapping via x-user-id and x-session-id
- Full server-side orchestration, memory, and safety retained
Audio streams flow directly between client and ElevenLabs via WebRTC. Marmalade maintains the stateful cognitive layer.
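A framework-free sketch of the two pieces such an endpoint needs: resolving the session from the x-user-id and x-session-id headers, and wrapping the backend's reply in the non-streaming chat completion shape that OpenAI-compatible clients expect. The function names and the `chatcmpl-` id scheme are hypothetical, and the actual route handler (for example in Hono) is omitted.

```typescript
interface SessionKey {
  userId: string;
  sessionId: string;
}

// Header names are case-insensitive, so normalize before lookup.
function resolveSession(headers: Record<string, string | undefined>): SessionKey | null {
  const lower: Record<string, string | undefined> = {};
  for (const name of Object.keys(headers)) {
    lower[name.toLowerCase()] = headers[name];
  }
  const userId = lower["x-user-id"];
  const sessionId = lower["x-session-id"];
  if (!userId || !sessionId) return null; // cannot map the turn to a session
  return { userId, sessionId };
}

// Wrap the backend's reply text in a chat.completion-shaped object.
function toChatCompletion(model: string, reply: string, createdUnixSeconds: number) {
  return {
    id: `chatcmpl-${createdUnixSeconds}`, // hypothetical id scheme
    object: "chat.completion",
    created: createdUnixSeconds,
    model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: reply },
        finish_reason: "stop",
      },
    ],
  };
}
```

A request missing either header resolves to null here, so the caller can reject it before any memory or model work happens.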
Deterministic Coordination
Before any large model is invoked, Marmalade decides what kind of response is appropriate.
The Turn Coordinator computes:
- Response class (understanding, reflection, anchoring, grounding)
- Grounding eligibility
- Language plan (sentence length, directness, metaphor density)
Models receive instructions, not autonomy.
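One way to picture the Turn Coordinator is as a deterministic function from turn signals to a plan, which is then rendered into instructions for the model. The sketch below is illustrative only: the signal fields, every threshold, and the instruction wording are assumptions, not the project's actual rules.

```typescript
type ResponseClass = "understanding" | "reflection" | "anchoring" | "grounding";

interface TurnSignals {
  risk: number;               // 0-4, as classified upstream
  stateDelta: number;         // signed shift in emotional state (assumed scale -1..1)
  hasRelevantMemory: boolean; // whether retrieval surfaced a personal anchor
}

interface LanguagePlan {
  maxWordsPerSentence: number;
  directness: "low" | "medium" | "high";
  metaphorDensity: "none" | "sparse";
}

interface TurnPlan {
  responseClass: ResponseClass;
  groundingEligible: boolean;
  language: LanguagePlan;
}

// Pure and deterministic: the same signals always yield the same plan.
function planTurn(s: TurnSignals): TurnPlan {
  const groundingEligible = s.risk >= 2; // assumed threshold
  let responseClass: ResponseClass = "understanding";
  if (s.risk >= 3) responseClass = "grounding";
  else if (s.hasRelevantMemory) responseClass = "anchoring";
  else if (Math.abs(s.stateDelta) >= 0.3) responseClass = "reflection";

  // Under elevated risk, prefer short, direct, literal language.
  const language: LanguagePlan = groundingEligible
    ? { maxWordsPerSentence: 12, directness: "high", metaphorDensity: "none" }
    : { maxWordsPerSentence: 22, directness: "medium", metaphorDensity: "sparse" };
  return { responseClass, groundingEligible, language };
}

// The model receives the plan as explicit instructions.
function renderInstructions(p: TurnPlan): string {
  return [
    `Response class: ${p.responseClass}.`,
    `Grounding techniques: ${p.groundingEligible ? "allowed" : "not allowed"}.`,
    `Keep sentences under ${p.language.maxWordsPerSentence} words.`,
    `Directness: ${p.language.directness}. Metaphors: ${p.language.metaphorDensity}.`,
  ].join("\n");
}
```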

Persistence and Memory
Each session updates:
- Conversation state
- Risk logs
- Voice session metrics
- Message history
- Embedded memory documents
At session end, a summarized memory document is indexed for future retrieval. Continuity compounds. Nothing resets.
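To make the indexing step concrete, here is a minimal in-memory sketch of storing session summaries with embeddings and retrieving the closest ones by cosine similarity. It assumes embeddings are computed elsewhere; the `MemoryDoc` fields and the `MemoryIndex` class are hypothetical, and a production system would more likely use a vector index inside the database.

```typescript
interface MemoryDoc {
  sessionId: string;
  summary: string;
  embedding: number[]; // assumed to be produced by an embedding model elsewhere
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class MemoryIndex {
  private docs: MemoryDoc[] = [];

  // Called once per session, after the summary document is produced.
  add(doc: MemoryDoc): void {
    this.docs.push(doc);
  }

  // Return the k documents most similar to the query embedding.
  search(query: number[], k: number): MemoryDoc[] {
    return this.docs
      .map((doc) => ({ doc, score: cosine(query, doc.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((hit) => hit.doc);
  }
}
```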
Technical Challenges
Building an agentic system like Marmalade exposed failure modes absent in conventional assistants. The primary difficulty was coordinating latency, safety, memory, and determinism under real-time constraints.
1. The Safety–Latency Paradox
Mental health systems require risk evaluation on every turn. A naïve secondary LLM call for safety adds 500–1200 ms of latency, breaking real-time voice interaction.
Marmalade resolves this via parallel speculative execution. MiniBrain performs high-speed risk classification while the Counselor pipeline hydrates context and prepares a response. If MiniBrain detects a crisis (r ≥ 4, the top of the 0–4 risk scale), generative output is terminated before emission and replaced with a deterministic safety response. No unsafe token is ever spoken.
Safety is enforced structurally, not rhetorically.
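The structural guarantee can be illustrated with a small gate that sits between the speculative generator and the output stream. This is a sketch under assumptions: the `SpeculativeGate` class, the placeholder safety text, and the synchronous `push`/`resolve` interface are invented for the example, and real streaming would also need to cancel the upstream generation.

```typescript
type Verdict = "pending" | "safe" | "crisis";

const CRISIS_THRESHOLD = 4; // top of the 0-4 risk scale
// Placeholder wording; the real deterministic safety response is not shown in the source.
const SAFETY_RESPONSE = "I'm concerned about your safety. Please reach out to local emergency services.";

class SpeculativeGate {
  private buffer: string[] = [];
  private verdict: Verdict = "pending";
  private readonly emit: (chunk: string) => void;

  constructor(emit: (chunk: string) => void) {
    this.emit = emit;
  }

  // Called for each token the Counselor generates speculatively.
  push(token: string): void {
    if (this.verdict === "crisis") return; // generation is discarded
    if (this.verdict === "safe") {
      this.emit(token); // gate already open: stream through
      return;
    }
    this.buffer.push(token); // hold until the risk verdict arrives
  }

  // Called once, when the risk classification for this turn is known.
  resolve(risk: number): void {
    if (this.verdict !== "pending") return;
    if (risk >= CRISIS_THRESHOLD) {
      this.verdict = "crisis";
      this.buffer = []; // buffered tokens are never emitted
      this.emit(SAFETY_RESPONSE);
      return;
    }
    this.verdict = "safe";
    for (const token of this.buffer) this.emit(token);
    this.buffer = [];
  }
}
```

Because tokens are only ever emitted after `resolve`, generation can start early without any of it reaching the voice channel before the risk check completes.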
2. Narrative Continuity and Memory Drift
Standard RAG retrieves episodic fragments, leading to temporal confusion and contradictory recall over time.
Marmalade replaces episodic recall with longitudinal state mapping. Each session is synthesized into a structured S.O.A.P. record through the Reflective Pipeline. These records function as high-density anchors representing emotional trajectory across weeks and months, not just recent messages.
Continuity is cumulative, not fragile.
3. WebRTC Orchestration with Stateful Cognition
ElevenLabs optimizes for audio transport, not cognition. Default integrations treat the model as a black box, preventing insertion of safety, memory, and language-planning logic without lag.
Marmalade introduces a custom orchestration layer via an OpenAI-compatible endpoint. ElevenLabs handles WebRTC transport while Marmalade governs turn coordination, memory retrieval, safety gating, and language planning. Voice output always reflects backend-determined intent.
Audio is real-time. Cognition remains authoritative.
4. Deterministic Output Reliability
LLMs are stochastic. In therapeutic preparation, malformed structures or hallucinated output can trigger anxiety or corrupt memory.
All outputs are schema-validated before emission. If validation fails, Marmalade emits a deterministic Emergency Packet: conservative, neutral, and structurally sound. No unvalidated output is persisted or spoken.
Reliability is enforced at the system boundary.
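A minimal sketch of the validate-or-fallback boundary, assuming a reply is a JSON object with a response class and text. The `ReplyPacket` shape and the Emergency Packet wording are hypothetical stand-ins; the point is only that the caller always receives a structurally valid packet.

```typescript
interface ReplyPacket {
  responseClass: string;
  text: string;
}

const ALLOWED_CLASSES: string[] = ["understanding", "reflection", "anchoring", "grounding"];

// Hypothetical fallback content: conservative, neutral, structurally sound.
const EMERGENCY_PACKET: ReplyPacket = {
  responseClass: "understanding",
  text: "I'm here with you. Take your time.",
};

function isReplyPacket(value: unknown): value is ReplyPacket {
  if (typeof value !== "object" || value === null) return false;
  const obj = value as Record<string, unknown>;
  const responseClass = obj["responseClass"];
  const text = obj["text"];
  return (
    typeof responseClass === "string" &&
    ALLOWED_CLASSES.indexOf(responseClass) !== -1 &&
    typeof text === "string" &&
    text.trim().length > 0
  );
}

// Returns the model's packet only if it validates; otherwise the fallback.
function validateOrFallback(raw: string): { packet: ReplyPacket; validated: boolean } {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (isReplyPacket(parsed)) return { packet: parsed, validated: true };
  } catch {
    // Malformed JSON falls through to the Emergency Packet.
  }
  return { packet: EMERGENCY_PACKET, validated: false };
}
```

The `validated` flag lets the persistence layer skip storing anything that came from the fallback path.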
The Vision: Ending Narrative Isolation
Marmalade is an agentic support layer for narrative continuity.
It does not replace the therapist. It prepares the patient. By automating the Subjective and Objective recording of lived experience, Marmalade allows human connection to focus on interpretation, meaning, and change.
Marmalade remembers, so you do not have to carry the narrative alone.
What This System Proves
Empathy is an engineering problem.
It requires state management across time, latency discipline, deterministic safety, and contextual restraint.
Marmalade is not designed to fix people.
It is designed to remain present, coherent, and remembering until movement becomes possible again.
Built With
- docker
- elevenlabs
- gcp
- gemini
- hono
- postgresql
- pulumi
- rag
- react
- vertex