
Inspiration
I have a highly integrated personal productivity stack: my Claude is integrated with Notion, Apple and Google Calendars, Gmail, Slack, etc. I can ask Claude things like "what are my action items for my lab?" and it will fetch lab AI meeting notes from Notion, Slack DMs from my PI, and lab emails, then synthesize all of the information. However, the process of crawling across sources to gather info takes 3-5 minutes, and it is repeated regardless of whether the same context was just used in another chat.
This is a huge bottleneck in the age of agents: for enterprises with a high volume of agent queries (think: custom agents that scrape Slack and wikis to consolidate information, assign tasks, schedule meetings, etc.), high latency and token usage will compound.
*(Before/after screenshots: the same query answered without and with ZeroCall.)*
The core problem is that agents are stateless. The agent isn't slow because it's dumb; it's slow because it has amnesia. Tool calls are expensive in both tokens and latency: each tool call is another LLM turn plus a round trip to an external API. For the common case of automated workflows that poll the same sources for the same context over and over, wouldn't it be far faster to cache the relevant information so the response can happen in a single conversation turn?
This is what ZeroCall does. Rather than exposing context through tools that the model must decide to call, ZeroCall intercepts every outgoing request and injects a pre-synced snapshot of your work state (calendar, inbox, tasks, Slack) directly into the system prompt before Claude's first token. No tool calls. No back-and-forth with APIs. The context is already there.
What it does
ZeroCall is an adaptive agent prompt harness that eliminates tool-call overhead for repeated productivity workflows - but more fundamentally, it represents a paradigm shift in how agents are built.
System prompts have always been static. ZeroCall makes them dynamic. Rather than exposing context through tools that the model must decide to call, ZeroCall syncs your calendar, inbox, and tasks in the background and injects a live snapshot of your work state directly into every request, before the model sees a single token. The agent doesn't fetch context. The context arrives with the agent.
The result:
- elimination of tool calls
- ~60% fewer LLM turns
- ~60% faster response time
- ~75% fewer tokens used
It's important to note that ZeroCall is agent middleware: a drop-in utility around the Anthropic SDK that enterprises integrate directly into their productivity agents to dramatically cut time and token costs. End users never interact with our dashboard, which exists purely for demoing and visualizing ZeroCall's impact.
ZeroCall also learns your workflow over time. It tracks which sections of context you actually ask about, and surfaces suggestions to disable the ones you don't, with projected token savings. One click applies the config. The system prompt adapts to you. Same answers, leaner prompt.
How we built it
ZeroCall is built as a TypeScript monorepo using npm workspaces, divided into three distinct components:
The Harness
- Zero-friction integration: Implementing the harness is a one-line swap: `new Anthropic()` → `new ZeroCallAnthropic()`.
- Automatic context injection: Subclasses the Anthropic SDK to override the `prepareOptions()` lifecycle hook (see the sketch after this list).
- Seamless context delivery: Populates `options.body.system` with a `WorkStateSnapshot` (calendar, email, Notion tasks) before the API request fires, providing the model with live context without changing existing agent logic.
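A minimal sketch of the harness, assuming the simplified option and snapshot shapes below; `WorkStateSnapshot`'s fields and the cache accessor are illustrative stand-ins for the server described in the next section:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Hypothetical snapshot shape and cache accessor; in ZeroCall these live
// in the server package (ensureFreshSnapshot() is named in the writeup,
// the rest is assumed for the example).
interface WorkStateSnapshot {
  syncedAt: string;
  toMarkdown(): string; // calendar, email, tasks rendered as markdown
}
declare function ensureFreshSnapshot(): Promise<WorkStateSnapshot>;

export class ZeroCallAnthropic extends Anthropic {
  // Override the SDK's prepareOptions() lifecycle hook so every outgoing
  // request carries the current work state before it fires.
  protected override async prepareOptions(options: any): Promise<void> {
    await super.prepareOptions(options);
    const snapshot = await ensureFreshSnapshot();
    if (options.body && typeof options.body === "object") {
      // Prepend the snapshot, keeping any system prompt the caller set.
      options.body.system = [
        `# Work state (synced ${snapshot.syncedAt})`,
        snapshot.toMarkdown(),
        options.body.system ?? "",
      ].join("\n\n");
    }
  }
}

// Drop-in usage: the only change in agent code is the constructor.
// const client = new ZeroCallAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
```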
The Server
- Resilient data syncing: Runs a 15-minute `node-cron` polling loop for Gmail, Google Calendar, and Notion using `Promise.allSettled`, so individual provider outages don't block the system (see the sketch after this list).
- Optimized caching: Distills fetched data into a typed snapshot saved to SQLite, using `ensureFreshSnapshot()` for lazy in-memory caching to eliminate cold starts.
- Adaptive token management: Classifies query types, tracks context relevance over the last 100 queries, and dynamically toggles context sections to reduce token usage.
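A sketch of that sync loop, with a hypothetical `Provider` interface and `saveSection` helper standing in for the real Gmail/Calendar/Notion clients and the SQLite write:

```typescript
import cron from "node-cron";

// Assumed provider contract; the real clients wrap the Gmail, Google
// Calendar, and Notion APIs.
interface Provider {
  name: string;
  fetch(): Promise<unknown>;
}

declare const providers: Provider[];                              // gmail, calendar, notion
declare function saveSection(name: string, data: unknown): void;  // SQLite write

// Every 15 minutes, poll all providers concurrently.
cron.schedule("*/15 * * * *", async () => {
  // allSettled instead of all: one provider outage must not block the rest.
  const results = await Promise.allSettled(providers.map((p) => p.fetch()));
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      saveSection(providers[i].name, result.value);
    } else {
      // Keep the previous snapshot section; stale context beats missing context.
      console.warn(`${providers[i].name} sync failed:`, result.reason);
    }
  });
});
```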
The Dashboard
- Architecture: A Vite + React frontend served by an Express backend, designed for visibility rather than end-user interaction.
- Configuration: Manages Google OAuth and initial credential setup.
- Live observability: Features side-by-side live tracing (ZeroCall vs. vanilla Claude API) to visualize real-time improvements in latency and token consumption.
Evolution & Challenges
- The initial MCP approach: Started as a single tool that dumped the entire state to Claude. While it reduced latency by ~25%, it sometimes increased token usage due to the overhead of the model calling the tool and parsing the dump.
- The harness injection pivot: Decided against building another MCP tool. Instead, intercepted the request to inject context directly into the system prompt, breaking the tool-call loop entirely and dramatically reducing latency and token consumption.
- The over-injection problem: Pre-injecting everything proved wasteful for specialized workflows. Users who only needed specific data (like tasks) were still paying tokens for irrelevant context (like emails).
- The adaptive solution: Built a learning layer that observes user query patterns over time and calculates per-section relevance. The system automatically suggests disabling unused sections, dynamically optimizing the context payload based on actual user behavior (a sketch of the scoring follows).
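A sketch of how that relevance scoring might look; the section names, window size, and threshold here are illustrative assumptions, not the exact implementation. Note the exclusion of broad queries, which we discuss further under "What we learned":

```typescript
// Illustrative relevance scoring over the last 100 classified queries.
type Section = "calendar" | "email" | "tasks";
const ALL_SECTIONS: Section[] = ["calendar", "email", "tasks"];

interface QueryRecord {
  sections: Section[]; // sections the answer actually drew on
}

const WINDOW = 100;         // consider the last 100 queries
const MIN_RELEVANCE = 0.05; // suggest disabling sections used in <5% of queries

function suggestDisabling(history: QueryRecord[]): Section[] {
  const recent = history.slice(-WINDOW);
  // Exclude broad queries that touch every section ("what should I focus
  // on?") so they don't inflate every section's score and suppress all
  // disable suggestions.
  const specific = recent.filter((q) => q.sections.length < ALL_SECTIONS.length);
  if (specific.length === 0) return [];
  const hits = new Map<Section, number>();
  for (const q of specific)
    for (const s of q.sections) hits.set(s, (hits.get(s) ?? 0) + 1);
  return ALL_SECTIONS.filter(
    (s) => (hits.get(s) ?? 0) / specific.length < MIN_RELEVANCE
  );
}
```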
(shoutout to the Cognition guys for invaluable advice throughout our process!)
Impact
The metrics on common productivity agent prompts speak for themselves: on queries like "do I have any urgent emails to respond to?", ZeroCall delivers roughly a 75% token reduction and a 60% latency reduction.
This might seem like a per-prompt curiosity, but consider the scale at which productivity agents operate in an enterprise.
$$\underbrace{5{,}000}_{\text{tokens saved/query}} \times \underbrace{20}_{\text{queries/day}} \times \underbrace{250}_{\text{workdays/yr}} = 25{,}000{,}000 \text{ tokens saved per employee per year}$$
$$2.5 \times 10^7 \text{ tokens} \times \frac{\$3}{10^6 \text{ tokens}} = \$75 \text{ saved per employee per year}$$
$$2{,}000 \text{ employees} \times \$75 = \textbf{\$150{,}000/year}$$
And that's a conservative estimate. Agentic pipelines that run automated workflows - hourly Slack summarizers, task assignment bots, standup generators - can fire hundreds of queries per day with no human in the loop. At that volume, the savings scale faster than headcount. (Not even mentioning the halved latency!)
What we learned
While building ZeroCall, we learned a lot about the outsized effect of context-rich system prompting in agentic workflows, as well as the mechanics of agentic tool calls and MCP. We also picked up new agentic development workflows, in particular working with Cognition's Devin SWE agent.
The most surprising insight was how much the position of context injection matters. Exposing the same data as a tool versus injecting it into the system prompt produced dramatically different results - not just in latency, but in how the model reasoned. Pre-injected context is treated as ground truth; tool-retrieved context is treated as something it had to go look up.
Designing the adaptive layer also taught us that "relevance scoring" is deceptively hard. Broad queries like "what should I focus on?" touch every section; naively including them in the denominator would suppress all suggestions. Small design decisions like that had outsized effects on whether the system was actually useful.
Most importantly, ZeroCall clarified that the next frontier in agent performance isn't smarter models, it's smarter infrastructure around them. The model was never the bottleneck. The scaffolding was.
What's next for ZeroCall
Expanding Data Sources
- Broader integrations: The injection pattern naturally extends to polling Slack (unread DMs/summaries), Jira/Linear (sprint states), GitHub (PRs), and Confluence.
- Drop-in architecture: The existing `TaskProvider` interface is designed to make new sources seamless additions (a sketch of its shape follows this list).
- MCP registry vision: Aiming to build a registry of snapshotable MCPs where agents simply declare their pre-injection needs and ZeroCall handles the rest.
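A hypothetical sketch of what that contract could look like; the real `TaskProvider` definition may differ, but the idea is that a new source only needs to implement one small interface:

```typescript
// Assumed shape of a snapshot section and the provider contract.
interface SnapshotSection {
  name: string;      // e.g. "slack-dms", "jira-sprint"
  syncedAt: string;  // ISO timestamp of the last successful fetch
  items: unknown[];  // provider-specific records
}

interface TaskProvider {
  name: string;
  fetch(): Promise<SnapshotSection>;        // pull fresh data from the source
  render(section: SnapshotSection): string; // serialize for the system prompt
}
```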
Smarter Adaptive Context
- Advanced classification: Replacing basic keyword heuristics with a lightweight embedding-based classifier for highly accurate query categorization.
- Time-of-day awareness: Dynamically weighting context sections based on time (e.g., favoring calendars in the morning and tasks in the afternoon).
Standard Middleware Positioning
- NPM distribution: Publishing `@zerocall/harness` to npm so any team can cut tool-call overhead with a single line of code.
- Cross-provider support: Expanding the `prepareOptions()` lifecycle hook pattern to the OpenAI and Gemini SDKs for provider-agnostic injection.
- The ultimate goal: Making dynamic, pre-injected system prompts the industry default rather than the exception.
Built With
- anthropic-api
- express.js
- gmail-api
- google-calendar-api
- google-oauth-2
- html/css
- node.js
- notion-api
- react
- sqlite
- typescript

