Inspiration

We kept seeing tiny prompts generate surprisingly large bills. Digging in, we realized the real cost was dominated by invisible agent scaffolding, tool schemas, and environment state re-sent on every call. At the same time, we were seeing the real-time impacts of scaling AI in local communities, so we wanted to make those emissions visible and give everyday users a way to stay conscious of their usage and footprint without rewriting their tools.

What it does

TokenTrace is a local proxy and dashboard that makes AI cost and emissions visible, then reduces them. It tracks token usage and converts it into energy and CO₂ estimates, and the Impact tab turns those into concrete sustainability metrics: total footprint, real‑world equivalents, usage patterns, emissions by model, switch‑and‑save suggestions, compression impact, and practical tips to reduce waste. It runs quietly in the background and auto‑detects usage across Claude Code, Codex, and Gemini, plus ChatGPT and claude.ai in the browser, so everything flows into one pipeline with no per‑tool setup.
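
The token-to-CO₂ conversion can be sketched roughly as below. The per-token energy figure and grid carbon intensity here are illustrative assumptions, not TokenTrace's actual calibrated values:

```python
# Illustrative sketch of a token -> energy -> CO2 conversion.
# Both constants are assumptions for demonstration, not measured values.

WH_PER_1K_TOKENS = 0.3        # assumed inference energy per 1k tokens (Wh)
GRID_G_CO2_PER_KWH = 400.0    # assumed grid carbon intensity (gCO2/kWh)

def emissions_grams(tokens: int) -> float:
    """Convert a token count into estimated grams of CO2."""
    kwh = tokens / 1000 * WH_PER_1K_TOKENS / 1000  # Wh -> kWh
    return kwh * GRID_G_CO2_PER_KWH

print(round(emissions_grams(250_000), 3))  # → 30.0
```

From an estimate like this, the dashboard can derive real-world equivalents (e.g. phone charges or kilometers driven) by dividing by the per-unit footprint of the comparison item.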

How we built it

We built a provider‑agnostic middle layer that sits between AI tools and their APIs. It works with many models, and today we ship integrations for Gemini CLI, Claude Code, and Codex, plus a browser extension for ChatGPT and claude.ai. The proxy intercepts traffic, measures token throughput, calculates emissions, and forwards everything transparently. When a prompt is long enough, it is summarized before forwarding to cut tokens. The key sustainability win is that this also applies when the AI spins up helper agents, so each handoff is compressed too, leading to compounding emissions savings across the whole workflow.

Challenges we ran into

Providers behave differently, and features that work with one API break with another. We also had to keep the proxy fast and accurate while still capturing usage data and applying compression on every request.

Accomplishments that we're proud of

We exposed the hidden token overhead in agentic stacks and quantified how big it really is. We also observed recursive compression: when an agent's instructions to its subagents were automatically rewritten into tighter formats, the savings compounded through the entire chain of handoffs, multiplying the reduction in energy usage.

What we learned

Stateless APIs create compounding overhead in long sessions. Compression only helps when the net benefit is positive, so you must account for extra calls, latency, and any quality risk. Making savings automatic is more important than having a clever compressor.
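
The net-benefit check can be made concrete: tokens saved downstream must outweigh the tokens spent on the extra summarization call. A minimal sketch, with illustrative numbers:

```python
# Sketch of the "compression only helps when the net benefit is positive"
# rule: downstream savings vs. the cost of the summarization call itself.

def net_token_savings(original_tokens: int,
                      compressed_tokens: int,
                      summarizer_cost_tokens: int,
                      reuse_count: int) -> int:
    """Tokens saved across reuse_count forwards, minus compression cost."""
    saved_per_call = original_tokens - compressed_tokens
    return saved_per_call * reuse_count - summarizer_cost_tokens

# Compressing 8k tokens to 2k, re-sent over 5 calls, costing 3k to summarize:
print(net_token_savings(8000, 2000, 3000, 5))  # → 27000
```

A real policy would also weigh latency and quality risk, but even this token-only view shows why compression pays off most in long, stateless sessions where the same context is re-sent many times.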

What's next for TokenTrace

Run rigorous benchmarks across tasks and models, build a cost-aware policy that decides when to compress, and ship integrations that make savings automatic for teams while preserving reliability.
