Inspiration

Every LLM API call costs money and emits carbon, but most of those calls don't need the largest available model. A one-sentence email summary doesn't need a frontier-grade model — whether that's GPT-5.5, Claude Opus 4.7, Gemini, or anything else at the top of the leaderboard. Yet developers rarely think about this trade-off per call, because the routing logic is annoying to write — and the savings only land if the routing is automatic and provider-agnostic.

We wanted the lowest-friction way for any existing OpenAI client to start getting smart routing — and to see the carbon savings as they accumulate.

Joule's slogan — "Charge AI for its own electricity bill." — captures the core idea: the model that runs should be billed (in CO₂ and dollars) for what it actually consumed, and the user should see that bill in human terms.

What it does

Joule is a Carbon-aware AI Gateway. It exposes an OpenAI-compatible /v1/chat/completions endpoint on localhost:3001. Any existing OpenAI client integrates by changing one line — the base_url.
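
As an illustration of the one-line change, here is a minimal sketch that prepares an OpenAI-style chat request with plain fetch-compatible options. The helper name `prepareChatRequest`, the `"auto"` model string, and the placeholder API key are assumptions made for this example, not part of Joule's documented interface; only the port and the /v1/chat/completions path come from the description above.

```typescript
// Hypothetical helper: builds an OpenAI-style chat completion request.
// Only the base URL decides whether the call goes to OpenAI or to Joule.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function prepareChatRequest(
  baseUrl: string,
  apiKey: string,
  messages: ChatMessage[],
  model = "auto", // assumed placeholder value, not a documented Joule model name
): PreparedRequest {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// The one-line change: swap the base URL.
// const BASE_URL = "https://api.openai.com/v1";
const BASE_URL = "http://localhost:3001/v1";

const prepared = prepareChatRequest(BASE_URL, "sk-placeholder", [
  { role: "user", content: "Summarize this email in one sentence." },
]);
// To send it: const res = await fetch(prepared.url, prepared.init);
```

An SDK-based client would make the same change through its base URL setting; the request body stays untouched.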

For every request, Joule:

  1. Classifies the user's intent via Nemotron Nano (~10ms): summarize, code, reasoning, etc.
  2. Routes via the DecisionLayer to the smallest sufficient model — summarize → Nemotron Nano, everything else → Nemotron Super.
  3. Calls Crusoe Managed Inference.
  4. Measures carbon and cost Defensively: if Crusoe sends an X-Carbon-grams response header, that value is used; otherwise Joule falls back to a static per-model lookup table. The source label is always preserved.
  5. Writes the call log to SQLite (WAL mode) and returns an OpenAI-compatible response.
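
Steps 2 and 4 above can be sketched as two small functions. This is a sketch under stated assumptions, not Joule's actual source: the function names, the `"other"` intent, and especially the numbers in the static table (and their unit, grams of CO₂ per 1,000 tokens) are placeholders chosen for illustration.

```typescript
type Intent = "summarize" | "code" | "reasoning" | "other"; // "other" is an assumed catch-all
type ModelId = "nano-30b-a3b" | "super-120b-a12b";
type CarbonSource = "header" | "static";

// Step 2: summarize goes to Nano, everything else goes to Super.
function route(intent: Intent): ModelId {
  return intent === "summarize" ? "nano-30b-a3b" : "super-120b-a12b";
}

// Assumed placeholder figures (grams CO2 per 1,000 tokens), NOT measured values.
const STATIC_GRAMS_PER_1K_TOKENS: Record<ModelId, number> = {
  "nano-30b-a3b": 0.1,
  "super-120b-a12b": 0.4,
};

interface CarbonReading {
  grams: number;
  source: CarbonSource; // always kept alongside the number
}

// Minimal structural type so any fetch-style Headers object can be passed in.
interface HeaderReader {
  get(name: string): string | null;
}

// Step 4: prefer the provider's header; otherwise fall back to the static table.
function measureCarbon(
  headers: HeaderReader,
  model: ModelId,
  totalTokens: number,
): CarbonReading {
  const raw = headers.get("X-Carbon-grams");
  if (raw !== null && raw.trim() !== "") {
    const grams = Number(raw);
    if (Number.isFinite(grams) && grams >= 0) {
      return { grams, source: "header" };
    }
  }
  // Missing or unparseable header: labeled best-effort estimate.
  const grams = (totalTokens / 1000) * STATIC_GRAMS_PER_1K_TOKENS[model];
  return { grams, source: "static" };
}
```

Returning the source next to the number means a malformed header degrades to a labeled estimate instead of a silent guess.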

The Dashboard (Next.js, localhost:3000) shows cumulative carbon, cost, and the Super/Nano mix updated in real time.

The Hermes Agent is a natural-language interface over the same call log. The user types a question such as "How much did we save this week?" or "What are the top 3 most expensive calls?", and a 3-step agent loop runs: Planner (Super) decides which of 5 tools to call → Executor runs it (SQLite read) → Responder (Super) summarizes the result in English.

Crucially, Hermes's own LLM calls go through Joule's gateway, so the agent measures its own carbon footprint as it works — a small self-reference loop.
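
The Planner → Executor → Responder loop can be sketched as follows. This is an illustrative outline, not the Hermes source: the prompt wording, the JSON plan shape, and any tool names are hypothetical. The LLM call is injected as a function, which is what would let the same loop send its requests through Joule's gateway.

```typescript
// An LLM call, injected so the loop can be pointed at any endpoint.
type Llm = (prompt: string) => Promise<string>;
// A tool reads from the call log and returns JSON-serializable data.
type Tool = (args: Record<string, unknown>) => unknown;

interface Plan {
  tool: string;
  args?: Record<string, unknown>;
}

async function runAgent(
  question: string,
  llm: Llm,
  tools: Record<string, Tool>,
): Promise<string> {
  // 1. Planner: ask the model which tool to call; a JSON plan is expected back.
  const planText = await llm(
    `Pick one tool from [${Object.keys(tools).join(", ")}] for: "${question}". ` +
      `Reply as JSON: {"tool": string, "args": object}.`,
  );
  const plan = JSON.parse(planText) as Plan; // throws if the reply is not JSON

  // 2. Executor: run the chosen tool (in Hermes, a SQLite read).
  if (!Object.prototype.hasOwnProperty.call(tools, plan.tool)) {
    throw new Error(`Unknown tool: ${plan.tool}`);
  }
  const result = tools[plan.tool](plan.args ?? {});

  // 3. Responder: turn the raw tool result into an English answer.
  return llm(
    `Question: ${question}\nTool result: ${JSON.stringify(result)}\n` +
      `Answer in plain English.`,
  );
}
```

Each question costs two LLM calls in this sketch, one for planning and one for responding, and both would appear in the call log the agent is querying.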

How we built it

The project is a TypeScript monorepo with three independent processes sharing a SQLite database:

  • Joule core (Hono on Node 20) — Gateway, Routing, Inference adapter, Carbon meter, Storage.
  • Dashboard (Next.js 14 app router + recharts) — reads joule.db directly via better-sqlite3.
  • Hermes Agent — CLI binary and dashboard chat UI, both backed by the same agent loop.

We followed test-driven development throughout: 48 unit tests across 9 files, plus 6 live verify-shot bash scripts that exercise each demo cut end-to-end against real Crusoe Nemotron calls.

Specific technical decisions:

  • Model IDs (nano-30b-a3b, super-120b-a12b) are Joule-internal; the Real adapter translates them to Crusoe's catalog IDs only at the HTTP boundary.
  • The intent classifier combines a fast keyword pre-filter with an LLM fallback. The pre-filter catches obvious cases for demo reliability; the LLM handles ambiguous ones.
  • The Hermes Responder runs on Super (not Nano), because Nano occasionally returned empty content on larger tool JSON inputs during testing. The trade-off (an extra 3 to 5 s per chat) is worth it for demo correctness.
  • Carbon measurement is Defensive: we don't pretend to know the carbon if Crusoe doesn't tell us, but we always have a labeled best-effort estimate. The label (source: "static" vs "header") is visible in the dashboard.
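
The keyword pre-filter with LLM fallback from the list above could look roughly like this. The keyword patterns, the `"other"` catch-all, and the function names are assumptions for illustration; the actual rules Joule uses are not shown in this write-up.

```typescript
const INTENTS = ["summarize", "code", "reasoning", "other"] as const;
type Intent = (typeof INTENTS)[number];

function isIntent(label: string): label is Intent {
  return (INTENTS as readonly string[]).indexOf(label) !== -1;
}

// Hypothetical keyword rules; first match wins.
const KEYWORD_RULES: Array<[RegExp, Intent]> = [
  [/\b(summari[sz]e|tl;?dr)\b/i, "summarize"],
  [/\b(function|refactor|bug|stack trace)\b/i, "code"],
];

// Fast path: returns an intent for obvious cases, or null when unsure.
function prefilter(text: string): Intent | null {
  for (const [pattern, intent] of KEYWORD_RULES) {
    if (pattern.test(text)) return intent;
  }
  return null;
}

// The LLM classifier is injected; it is only called when the pre-filter is unsure.
type LlmClassifier = (text: string) => Promise<string>;

async function classifyIntent(
  text: string,
  llmClassify: LlmClassifier,
): Promise<Intent> {
  const quick = prefilter(text);
  if (quick !== null) return quick; // obvious case, no LLM call needed

  const label = (await llmClassify(text)).trim().toLowerCase();
  // Anything the model returns outside the known labels is treated as "other".
  return isIntent(label) ? label : "other";
}
```

Since summarize is the only intent routed to Nano, an unrecognized label defaulting to `"other"` ends up on Super, the larger model, so a classification miss costs carbon rather than answer quality.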

Challenges we ran into

  • Crusoe model ID mismatch. Our internal IDs didn't match Crusoe's catalog. Initial live calls returned 404. Fixed at the HTTP boundary with a translation map in the Real adapter.
  • Hermes Responder instability on Nano. Larger JSON tool results sometimes produced empty content from the Nano model. Switched to Super for the Responder step.
  • Next.js bundling better-sqlite3. The native module and Next.js's webpack bundling didn't get along: __filename was rewritten to undefined, which crashed the bindings package. Resolved via webpack externals in next.config.mjs, plus inlining the schema SQL.
  • Windows process management. pkill doesn't exist on Windows; replaced with taskkill /F /IM node.exe. In PowerShell, bash resolved to WSL's bash, which wasn't installed; added a profile alias pointing to Git Bash.
  • Demo video text legibility. The first version of the verify-shot scripts only printed JSON. We added an English narrative block (cut title + step-level OK lines + PASS summary) so anyone watching the recorded terminal can follow what each shot proves.
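
The translation map from the first challenge can be sketched as below. The internal IDs come from this write-up; the catalog IDs are deliberately left as placeholders because the real Crusoe identifiers are not given here, and the function name is hypothetical.

```typescript
type JouleModelId = "nano-30b-a3b" | "super-120b-a12b";

// Placeholder values: substitute the provider's actual catalog IDs.
const CRUSOE_CATALOG_ID: Record<JouleModelId, string> = {
  "nano-30b-a3b": "<crusoe-catalog-id-for-nano>",
  "super-120b-a12b": "<crusoe-catalog-id-for-super>",
};

interface InternalChatBody {
  model: JouleModelId;
  messages: unknown[];
}

// Applied only when building the outgoing HTTP body, so routing, logging,
// and storage keep using the internal IDs everywhere else.
function toProviderBody(body: InternalChatBody): { model: string; messages: unknown[] } {
  return { ...body, model: CRUSOE_CATALOG_ID[body.model] };
}
```

Typing the map as `Record<JouleModelId, string>` makes the compiler flag any internal model added later without a catalog entry, which would surface this kind of 404 at build time rather than on a live call.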

Accomplishments that we're proud of

  • A real OpenAI-compatible gateway integrated in one line of client code.
  • AutoModelSelection works against real Crusoe Nemotron in production — not a mock.
  • The Defensive carbon measurement label (source: static | header) is a small idea but, we think, worth standardizing across carbon-aware tooling.
  • Hermes routing its own LLM calls through Joule is the agent equivalent of dogfooding — the carbon meter applies to its own decisions.
  • Six days, solo, with TDD. 48 unit tests + 6 live verify-shot scripts, all green.

What we learned

  • Defensive carbon measurement (label the source, never guess silently) is a better contract than trying to estimate everything.
  • A small keyword pre-filter + LLM fallback is a more reliable production pattern than pure LLM intent classification.
  • The biggest demo risk in a 6-day hackathon is "the day-of recording" — pre-built verify scripts with self-explanatory output saved at least an hour.
  • Self-reference (agent measures its own carbon) is a useful framing for any carbon-aware infrastructure layer.

What's next for Joule

  • Time-of-day routing — queue non-urgent calls into low-carbon hours of the grid.
  • Multi-region carbon — pick the lowest-carbon Crusoe region per call.
  • Streaming responses — we currently proxy non-streaming completions only.
  • Hermes autonomous cron — weekly report sent Sunday 9am via Gmail SMTP (today: manual trigger).
  • Standardize X-Carbon-grams — publish a small RFC for the response header so other gateways can adopt it.
  • Open-source the conversion table — build a community-maintained per-model carbon estimate dataset.
