Inspiration
Every LLM API call costs money and emits carbon, but most of those calls don't need the largest available model. A one-sentence email summary doesn't need a frontier-grade model — whether that's GPT-5.5, Claude Opus 4.7, Gemini, or anything else at the top of the leaderboard. Yet developers rarely think about this trade-off per call, because the routing logic is annoying to write — and the savings only land if the routing is automatic and provider-agnostic.
We wanted the lowest-friction way for any existing OpenAI client to start getting smart routing — and to see the carbon savings as they accumulate.
Joule's slogan — "Charge AI for its own electricity bill." — captures the core idea: the model that runs should be billed (in CO₂ and dollars) for what it actually consumed, and the user should see that bill in human terms.
What it does
Joule is a Carbon-aware AI Gateway. It exposes an OpenAI-compatible /v1/chat/completions endpoint on localhost:3001. Any existing OpenAI client integrates by changing one line — the base_url.
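As a rough illustration of the one-line integration, the sketch below builds a chat completion request where the base URL is the only Joule-specific value. The helper `buildChatRequest`, the constant `JOULE_BASE_URL`, and the `placeholder-model` value are assumptions made for this example, not names from the Joule codebase.

```typescript
// Hypothetical helper: only the base URL distinguishes a Joule request
// from a request sent directly to an OpenAI-style API.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: the gateway is mounted under /v1, as the endpoint path suggests.
const JOULE_BASE_URL = "http://localhost:3001/v1";

function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    // Strip trailing slashes so "/v1" and "/v1/" produce the same URL.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages }),
  };
}

// "placeholder-model" is illustrative; the write-up does not say which
// model name clients are expected to send.
const exampleRequest = buildChatRequest(JOULE_BASE_URL, "placeholder-model", [
  { role: "user", content: "Summarize this email in one sentence." },
]);
```

With the official OpenAI Node SDK, the equivalent change would be passing `baseURL` to the client constructor; the rest of the client code stays as it was.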
For every request, Joule:
- Classifies the user's intent via Nemotron Nano (~10ms): summarize, code, reasoning, etc.
- Routes via the DecisionLayer to the smallest sufficient model: summarize → Nemotron Nano, everything else → Nemotron Super.
- Calls Crusoe Managed Inference.
- Measures carbon and cost Defensively: if Crusoe sends an X-Carbon-grams response header, that value is used; otherwise a static per-model lookup table. The source label is always preserved.
- Writes the call log to SQLite (WAL mode) and returns an OpenAI-compatible response.
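The routing and measurement steps above can be sketched as two small functions. This is a minimal sketch, not Joule's actual DecisionLayer or carbon meter: the function names, the `other` intent, and the numbers in the static table are invented for illustration, and the table is assumed to hold a flat per-call estimate.

```typescript
type Intent = "summarize" | "code" | "reasoning" | "other";
type ModelId = "nano-30b-a3b" | "super-120b-a12b";

// Routing rule from the list above: summarize goes to Nano,
// everything else goes to Super.
function routeModel(intent: Intent): ModelId {
  return intent === "summarize" ? "nano-30b-a3b" : "super-120b-a12b";
}

interface CarbonReading {
  grams: number;
  source: "header" | "static";
}

// Placeholder estimates in grams of CO2 per call. These values are made up
// for the sketch and are not Joule's real lookup table.
const STATIC_GRAMS_PER_CALL: Record<ModelId, number> = {
  "nano-30b-a3b": 0.1,
  "super-120b-a12b": 0.5,
};

function measureCarbon(
  model: ModelId,
  responseHeaders: Record<string, string>,
): CarbonReading {
  // HTTP header names are case-insensitive, so compare in lower case.
  const entry = Object.entries(responseHeaders).find(
    ([name]) => name.toLowerCase() === "x-carbon-grams",
  );
  const parsed = entry === undefined ? NaN : Number.parseFloat(entry[1]);

  // Trust the header only when it parses to a sane, non-negative number.
  if (Number.isFinite(parsed) && parsed >= 0) {
    return { grams: parsed, source: "header" };
  }
  // Otherwise fall back to the table, and say so in the label.
  return { grams: STATIC_GRAMS_PER_CALL[model], source: "static" };
}
```

Returning the `source` alongside the number means the storage layer and dashboard never have to guess where a value came from.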
The Dashboard (Next.js, localhost:3000) shows cumulative carbon, cost, and the Super/Nano mix updated in real time.
The Hermes Agent is a natural-language interface over the same call log. The user types "How much did we save this week?" or "What are the top 3 most expensive calls?". A 3-step agent loop runs: Planner (Super) decides which of 5 tools to call → Executor runs it (SQLite read) → Responder (Super) summarizes the result in English.
Crucially, Hermes's own LLM calls go through Joule's gateway, so the agent measures its own carbon footprint as it works — a small self-reference loop.
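The three-step loop can be sketched roughly as follows. The prompt wording, the JSON plan shape, and the `runAgent`, `Llm`, and `Tool` names are assumptions for illustration; in Joule the `llm` function would be a call through the gateway, which is how the agent's own usage ends up in the call log.

```typescript
// An LLM is modeled as "prompt in, text out"; a tool as "args in, data out".
type Llm = (prompt: string) => Promise<string>;
type Tool = (args: Record<string, unknown>) => unknown;

interface Plan {
  tool: string;
  args?: Record<string, unknown>;
}

async function runAgent(
  question: string,
  llm: Llm,
  tools: Map<string, Tool>,
): Promise<string> {
  // Step 1, Planner: ask the model which tool to call, as JSON.
  const toolNames = Array.from(tools.keys()).join(", ");
  const planText = await llm(
    `Choose one tool from [${toolNames}] to answer: ${question}\n` +
      `Reply with JSON of the form {"tool": "<name>", "args": {}}.`,
  );
  const plan = JSON.parse(planText) as Plan;

  // Step 2, Executor: run the chosen tool (a SQLite read in Joule).
  const tool = tools.get(plan.tool);
  if (tool === undefined) {
    throw new Error(`Planner chose an unknown tool: ${plan.tool}`);
  }
  const result = tool(plan.args ?? {});

  // Step 3, Responder: turn the raw tool result into an English answer.
  return llm(
    `Question: ${question}\n` +
      `Tool result: ${JSON.stringify(result)}\n` +
      `Answer in plain English.`,
  );
}
```

Injecting `llm` and `tools` keeps the loop testable with stubs, without a live model or database.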
How we built it
The project is a TypeScript monorepo with three independent processes sharing a SQLite database:
- Joule core (Hono on Node 20): Gateway, Routing, Inference adapter, Carbon meter, Storage.
- Dashboard (Next.js 14 app router + recharts): reads joule.db directly via better-sqlite3.
- Hermes Agent: CLI binary and dashboard chat UI, both backed by the same agent loop.
We followed test-driven development throughout: 48 unit tests across 9 files, plus 6 live verify-shot bash scripts that exercise each demo cut end-to-end against real Crusoe Nemotron calls.
Specific technical decisions:
- Model IDs (nano-30b-a3b, super-120b-a12b) are Joule-internal; the Real adapter translates them to Crusoe's catalog IDs only at the HTTP boundary.
- The intent classifier combines a fast keyword pre-filter with an LLM fallback. The pre-filter catches obvious cases for demo reliability; the LLM handles ambiguous ones.
- The Hermes Responder runs on Super (not Nano), because Nano occasionally returned empty content on larger tool JSON inputs during testing. The trade-off (+3-5 s per chat) is worth it for demo correctness.
- Carbon measurement is Defensive: we don't pretend to know the carbon if Crusoe doesn't tell us, but we always have a labeled best-effort estimate. The label (source: "static" vs "header") is visible in the dashboard.
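The keyword pre-filter with LLM fallback described above might look something like this. The keyword patterns, the `via` field, and the choice to default unrecognized labels to `reasoning` are assumptions for the sketch; the write-up does not list the real rules.

```typescript
type Intent = "summarize" | "code" | "reasoning";

// Hypothetical keyword rules: first match wins.
const KEYWORD_RULES: Array<[RegExp, Intent]> = [
  [/\b(summari[sz]e|summary|tl;dr)\b/i, "summarize"],
  [/\b(function|bug|refactor|compile|stack trace)\b/i, "code"],
];

function keywordPrefilter(text: string): Intent | null {
  for (const [pattern, intent] of KEYWORD_RULES) {
    if (pattern.test(text)) return intent;
  }
  return null; // Nothing obvious: let the LLM decide.
}

// The LLM classifier is injected so the sketch runs without a live model.
type LlmClassifier = (text: string) => Promise<string>;

interface Classification {
  intent: Intent;
  via: "keyword" | "llm";
}

async function classifyIntent(
  text: string,
  llmClassify: LlmClassifier,
): Promise<Classification> {
  // Fast path: obvious cases never reach the model.
  const fast = keywordPrefilter(text);
  if (fast !== null) return { intent: fast, via: "keyword" };

  // Slow path: normalize the model's label before trusting it.
  const raw = (await llmClassify(text)).trim().toLowerCase();
  // Assumption: anything unrecognized is treated as "reasoning",
  // which routes to the larger model.
  const intent: Intent =
    raw === "summarize" || raw === "code" ? raw : "reasoning";
  return { intent, via: "llm" };
}
```

Defaulting to the larger model on an unrecognized label trades some carbon for correctness, which matches the conservative tone of the other decisions in this list.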
Challenges we ran into
- Crusoe model ID mismatch. Our internal IDs didn't match Crusoe's catalog. Initial live calls returned 404. Fixed at the HTTP boundary with a translation map in the Real adapter.
- Hermes Responder instability on Nano. Larger JSON tool results sometimes produced empty content from the Nano model. Switched to Super for the Responder step.
- Next.js bundling better-sqlite3. Native module + Next.js webpack didn't get along: __filename was rewritten to undefined and crashed on the bindings package. Resolved via next.config.mjs webpack externals + inlining the schema SQL.
- Windows process management. pkill doesn't exist; replaced with taskkill /F /IM node.exe. PowerShell's bash was being resolved to a non-installed WSL bash. Added a profile alias to Git Bash.
- Demo video text legibility. The first version of the verify-shot scripts only printed JSON. We added an English narrative block (cut title + step-level OK lines + PASS summary) so anyone watching the recorded terminal can follow what each shot proves.
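For the model ID mismatch in the first challenge above, a translation map at the HTTP boundary could be sketched like this. The Crusoe-side strings are deliberately placeholders, since the actual catalog IDs are not given in this write-up, and `toCrusoeBody` is a hypothetical name.

```typescript
// Joule-internal ID -> provider catalog ID.
// The right-hand values are placeholders, NOT real Crusoe catalog IDs.
const CRUSOE_MODEL_IDS = new Map<string, string>([
  ["nano-30b-a3b", "<crusoe-catalog-id-for-nano>"],
  ["super-120b-a12b", "<crusoe-catalog-id-for-super>"],
]);

interface ChatBody {
  model: string;
  messages: unknown[];
}

// Called just before the outbound HTTP request, so the rest of the
// codebase only ever sees Joule-internal IDs.
function toCrusoeBody(body: ChatBody): ChatBody {
  const catalogId = CRUSOE_MODEL_IDS.get(body.model);
  if (catalogId === undefined) {
    // Fail locally with a clear message instead of a remote 404.
    throw new Error(`No Crusoe mapping for model "${body.model}"`);
  }
  return { ...body, model: catalogId };
}
```

Throwing on an unmapped ID surfaces the mismatch in a unit test rather than in a live call.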
Accomplishments that we're proud of
- A real OpenAI-compatible gateway integrated in one line of client code.
- AutoModelSelection works against real Crusoe Nemotron in production — not a mock.
- The Defensive carbon measurement label (source: static | header) is a small idea but, we think, worth standardizing across carbon-aware tooling.
- Hermes routing its own LLM calls through Joule is the agent equivalent of dogfooding: the carbon meter applies to its own decisions.
- Six days, solo, with TDD. 48 unit tests + 6 live verify-shot scripts, all green.
What we learned
- Defensive carbon measurement (label the source, never guess silently) is a better contract than trying to estimate everything.
- A small keyword pre-filter + LLM fallback is a more reliable production pattern than pure LLM intent classification.
- The biggest demo risk in a 6-day hackathon is "the day-of recording" — pre-built verify scripts with self-explanatory output saved at least an hour.
- Self-reference (agent measures its own carbon) is a useful framing for any carbon-aware infrastructure layer.
What's next for Joule
- Time-of-day routing: queue non-urgent calls into low-carbon hours of the grid.
- Multi-region carbon: pick the lowest-carbon Crusoe region per call.
- Streaming responses: we currently proxy non-streaming completions only.
- Hermes autonomous cron: weekly report sent Sunday 9am via Gmail SMTP (today: manual trigger).
- Standardize X-Carbon-grams: publish a small RFC for the response header so other gateways can adopt it.
- Open-source the conversion table: build a community-maintained per-model carbon estimate dataset.
Built With
- ai-agent
- better-sqlite3
- crusoe
- hackathon
- hono
- llm
- nemotron
- next.js
- node.js
- openai
- python
- react
- recharts
- sqlite
- typescript
- undici
- vitest