Oath | Devpost

Inspiration

The Solana Foundation thinks 99.99% of on-chain transactions will be agent-driven in two years. That number kept me up at night. Right now, the AI agents we're handing our money, calendars, and APIs to have basically no accountability. One prompt injection turns your agent into a wire-transfer bot, and the industry's answer to this seems to be... more strings in the system prompt.

I don't think that's going to hold.

Oaths, signatures, slashing. Those are the tools humans built over centuries to keep powerful agents in line. Oath is me trying to port all three to AI.

What it does

Oath lets you bind an AI agent to a cryptographic commitment before it can act.

You give it a task, something like "book dinner for 4 in Austin under $200." Gemini drafts the OathProposal: exact purpose, spend cap, per-tx cap, whitelisted recipients, allowed action types, expiry. The agent deposits SOL as stake. You sign once in Phantom. It's now bound on-chain.

After that, every agent action goes through record_action on the Solana program. Anything out of scope reverts at the program level. The backend catches the revert, signs an oracle attestation, and calls slash. Stake moves to your wallet. Solana Explorer has the whole audit trail.
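To make the scope check concrete, here is a minimal off-chain model of the kind of checks record_action performs. It is a sketch under assumptions, not the Anchor program: the field names, the status variants, every error name except OathNotActive, and the order of the checks are all hypothetical, and amounts are plain numbers rather than u64 lamports.

```typescript
// Hypothetical off-chain model of the scope checks. The real checks live in the Rust program.
type OathStatus = "Active" | "Revoked" | "Slashed" | "Fulfilled" | "Expired";

interface Oath {
  status: OathStatus;
  spendCap: number;      // total lamports the agent may spend (assumed field)
  perTxCap: number;      // max lamports per action (assumed field)
  spent: number;         // running total so far (assumed field)
  recipients: string[];  // whitelisted recipient addresses
  actionTypes: string[]; // allowed action types
  expiresAt: number;     // unix seconds
}

interface Action {
  recipient: string;
  actionType: string;
  amount: number;
  timestamp: number;
}

// Only OathNotActive appears in the write-up; the other names are illustrative.
type ScopeError =
  | "OathNotActive"
  | "OathExpired"
  | "RecipientNotAllowed"
  | "ActionTypeNotAllowed"
  | "PerTxCapExceeded"
  | "SpendCapExceeded";

// Returns null when the action is in scope, otherwise the first violated rule.
function checkAction(oath: Oath, action: Action): ScopeError | null {
  if (oath.status !== "Active") return "OathNotActive";
  if (action.timestamp >= oath.expiresAt) return "OathExpired";
  if (!oath.recipients.includes(action.recipient)) return "RecipientNotAllowed";
  if (!oath.actionTypes.includes(action.actionType)) return "ActionTypeNotAllowed";
  if (action.amount > oath.perTxCap) return "PerTxCapExceeded";
  if (oath.spent + action.amount > oath.spendCap) return "SpendCapExceeded";
  return null;
}
```

On-chain, a non-null result corresponds to the instruction reverting, which is what the backend then reacts to.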

The demo covers three outcomes. Happy path: agent books within scope, stake returns. Attack: someone jailbreaks the agent into trying to send funds to a random wallet, the scope check reverts, stake gets slashed. Revoke: you end the oath early, stake goes back to the agent, any later agent action fails with OathNotActive.

How we built it

Solana program is Anchor / Rust. Six instructions: create_oath, record_action, slash, revoke_oath, fulfill_oath, expire_oath. Fifteen typed errors. Ed25519-precompile oracle attestations, PDA-signed stake vaults. Fourteen tests. Live on devnet at 2Uvqbnt6kiaB7Y3AHhtS2FLWRFrJweRebtErSQE2kPmy.

Backend is Next.js 14 App Router. Gemini 2.5-flash drafts the proposal with zod-validated structured output and one retry when it inevitably returns malformed JSON. ElevenLabs handles the voiceover. Once the oath is bound, a small runtime loop picks the next action, calls record_action, and auto-triggers slash if the program reverts.
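One step of that runtime loop could look roughly like this. The ChainClient interface and its method names are hypothetical stand-ins for the real Solana client calls, and the oracle attestation the backend signs before slashing is left out of the sketch.

```typescript
// Hypothetical stand-ins for the on-chain calls the backend makes.
interface ChainClient {
  recordAction(action: string): Promise<void>; // assumed to reject when the program reverts
  slash(reason: string): Promise<void>;        // the real backend signs an oracle attestation first
}

type StepResult = { kind: "recorded" } | { kind: "slashed"; reason: string };

// Tries to record one action. If the call rejects, it triggers slash with the error message.
async function runStep(client: ChainClient, action: string): Promise<StepResult> {
  try {
    await client.recordAction(action);
    return { kind: "recorded" };
  } catch (err) {
    // Simplification: every failure is treated as a scope revert.
    // A real loop would tell program reverts apart from network errors before slashing.
    const reason = err instanceof Error ? err.message : String(err);
    await client.slash(reason);
    return { kind: "slashed", reason };
  }
}
```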

Frontend is black, white, and one aquamarine accent that I kept trying to use more of and kept deleting. Bodoni Moda for display, Manrope for UI, JetBrains Mono for hashes. The centerpiece is a real WebGL scene: a slab inside a halo. When the oath gets slashed, the slab fractures. Took three attempts to make it not read as abstract art.

The best architectural choice I made was one I didn't plan for. I'd originally built a whole MongoDB layer to mirror oaths and actions. Halfway through, I realized program.account.oath.all() IS the dashboard. I deleted the mirror. Nothing broke. If the backend goes down, the data is still on Solana. I wish I'd started there.
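As a rough illustration of why the mirror was redundant, the dashboard numbers can be derived directly from whatever the account fetch returns. The OathAccount shape below is hypothetical (the real fields come from the program's IDL), and the sketch takes already-decoded accounts as input rather than calling the chain itself.

```typescript
// Hypothetical shape of a decoded oath account. The real layout is defined by the program.
type Status = "active" | "fulfilled" | "slashed" | "revoked" | "expired";

interface OathAccount {
  publicKey: string;
  status: Status;
}

// Builds dashboard counts straight from fetched accounts, with no database in between.
function summarize(oaths: OathAccount[]): Record<Status, number> {
  const counts: Record<Status, number> = {
    active: 0,
    fulfilled: 0,
    slashed: 0,
    revoked: 0,
    expired: 0,
  };
  for (const oath of oaths) counts[oath.status] += 1;
  return counts;
}
```

In the app, the input would be the result of program.account.oath.all() mapped into this shape.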

Deployed to Vercel. Agent and oracle keypairs are base64 env vars, not files. Custom domain through Porkbun.

Challenges we ran into

Toolchain hell. Anchor 0.30.x, which the spec asked for, doesn't compile against stable Rust 1.91 + Agave 3.1. The build fails on proc_macro2::Span::source_file, a method that no longer exists in current toolchains. I burned an hour on pinning combinations before giving up and moving to 0.31.1.

Gemini 2.0-flash-exp got retired mid-hackathon. 404 on the model ID. Moved to 2.5-flash, which is verbose enough that the 2048-token output cap was truncating proposals mid-JSON. Bumped to 4096 and added a retry that feeds the parse error back into the prompt.
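The retry can be sketched like this. It is a minimal version under assumptions: generate is a hypothetical stand-in for the model call, validate is any function that throws on a schema mismatch (a zod schema's parse method would fit), and the wording of the feedback message is made up.

```typescript
// Hypothetical stand-in for the model call: takes a prompt, returns raw text.
type Generate = (prompt: string) => Promise<string>;

// Asks the model for JSON, and on a parse or validation failure retries with
// the error message appended to the original prompt.
async function draftWithRetry<T>(
  generate: Generate,
  prompt: string,
  validate: (value: unknown) => T, // must throw when the value does not match the schema
  maxRetries = 1,
): Promise<T> {
  let currentPrompt = prompt;
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await generate(currentPrompt);
    try {
      return validate(JSON.parse(raw));
    } catch (err) {
      lastError = err;
      const message = err instanceof Error ? err.message : String(err);
      // Feed the failure back so the next attempt can correct it.
      currentPrompt =
        `${prompt}\n\nYour previous reply was rejected: ${message}\n` +
        "Return only complete, valid JSON.";
    }
  }
  throw lastError;
}
```

With maxRetries left at 1, the model gets one corrective attempt and the caller sees the last error if that also fails.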

Vercel almost shipped broken because of a one-byte base64 corruption. Byte 62 of the agent keypair came out as 152 instead of 156. Traced it to a shell paste buffer. Fix was to stop using the base64 command-line tool and encode inside Node instead. That one cost me an embarrassing amount of time.
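Encoding inside Node, plus a round-trip check before the value goes anywhere near an env var, might look like the following. The function names are hypothetical. The sketch assumes the keypair is available as raw bytes and that it runs on Node, since it relies on Buffer.

```typescript
// Encode secret-key bytes to base64 in-process, avoiding the shell entirely.
function encodeKey(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Decode a base64 env var value back into raw bytes.
function decodeKey(encoded: string): Uint8Array {
  return new Uint8Array(Buffer.from(encoded, "base64"));
}

// True when decoding the encoded value reproduces the original bytes exactly.
// Running this against the pasted env var value would catch a one-byte corruption.
function matches(original: Uint8Array, encoded: string): boolean {
  const decoded = decodeKey(encoded);
  return (
    decoded.length === original.length &&
    decoded.every((byte, i) => byte === original[i])
  );
}
```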

The 3D scene was way harder than I expected. First version was a monolith inside a halo, floating. Looked cool, meant nothing. I had to bolt on semantic labels (Ring = scope boundary, Slab = signed oath, Fracture = slash event), state-driven copy that changes as the oath progresses, and real layout discipline so text wasn't overlapping the 3D scene. It reads as an oath now. I'm still not sure it's the right centerpiece.

Accomplishments that we're proud of

14/14 Anchor tests pass, three runs in a row, no flakes. Every violation case I could think of is covered: wrong recipient, wrong action type, per-tx overflow, cumulative overflow, post-expiry, post-revocation, bad oracle proof.

The slash is real. Open Solana Explorer, paste the slash tx from the demo, and you'll see the lamports actually move from the vault to the user. Not a simulation.

npm run demo:smoke runs all three scenes end-to-end against devnet in about ten seconds. I ran it enough times that I noticed the agent was slowly draining SOL per slash, so the script auto-rebalances now. Stupid little fix, took me a while to think of it.
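The rebalance decision itself is tiny. A sketch of the idea, where the function name and both thresholds are hypothetical and the actual transfer is left to whatever wallet funds the script:

```typescript
const LAMPORTS_PER_SOL = 1_000_000_000;

// Returns how many lamports to send to the agent before a run:
// zero if the balance is at or above the floor, otherwise enough to reach the target.
function topUpNeeded(
  agentBalance: number,
  minBalance: number,    // hypothetical floor below which the demo would fail
  targetBalance: number, // hypothetical level to refill to
): number {
  if (targetBalance < minBalance) {
    throw new Error("targetBalance must be >= minBalance");
  }
  return agentBalance < minBalance ? targetBalance - agentBalance : 0;
}
```

For example, with a floor of 0.1 SOL and a target of 0.5 SOL, an agent holding 0.05 SOL would be topped up by 0.45 SOL, and one holding 0.2 SOL would be left alone.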

The aesthetic landed on monochrome with almost no color. I tried three versions (one cinematic, one gallery-style, one editorial), and the plainest one was the best one. I didn't see that coming.

Solo build. Program, backend, agent runtime, frontend, 3D, deploy pipeline. Roughly 24 hours, not counting the parts I slept through.

What we learned

Correctness-by-construction hits different than runtime validation. When OathStatus is an enum and every cap check uses checked math, a lot of bug classes just can't happen. The compiler refuses. I'd read about this forever, but actually writing the program was the first time it clicked.

For a protocol like this, on-chain state really can replace a database. The mirror I built early wasn't wrong. It just wasn't necessary, and it made the product feel more complicated than it is.

Generative models need retry contracts, not hope. Gemini will hand you truncated JSON occasionally. You either plan for it or your live demo dies in front of everyone.

Slashing is a weirdly underrated primitive. Validators slash each other. Fiduciaries get fined. AI agents need the same thing but cryptographic, and the Solana runtime gives you that in roughly 200 lines of Rust.

Restraint is a brand. I kept adding visual flourishes and kept deleting them, and the thing got better every time I deleted something.

What's next for Oath

Mainnet, with USDC and SPL token support inside record_action. Obvious.

Real x402 payment integration instead of the mock. Every paid API call an agent makes should flow through the oath.

Live dashboard updates via program logs WebSocket, so the UI reflects on-chain state the moment record_action lands.

Agent identity that isn't just a raw pubkey. A registry with reputation, operator attestations, slash history. Closer to a credit score for AI agents.

Composable oaths. Oath A can whitelist oath B as a recipient. Delegation chains, all enforced by the same primitive. I think this is the idea I'm most excited about.

Economic stuff I haven't figured out yet: what's the right stake-to-cap ratio? How should reputation decay? Is there a market for insurance on slashed stake? Mostly open questions.

And the unsexy infra: Backpack, Solflare, Phantom deep links on mobile, a real oracle network instead of the one-key setup I have right now.