Inspiration

Every day, people make deals with their voices. A freelancer agrees to build a website for £500 over a phone call. A plumber quotes a price at your front door. A friend says they'll cover your half and you'll pay them back.

These agreements work until they don't. The freelancer delivers the site and the client ghosts. Physical work makes this worse: once the pipes are fixed, you can't un-fix them. The plumber either absorbs the loss or spends more chasing payment than the job was worth.

This isn't a trust problem, as most people intend to pay. It's an infrastructure problem. Verbal agreements have no record, no escrow, no enforcement. Trust has to do all the work, and trust doesn't scale.

I built Handshake to give verbal agreements the same infrastructure that written contracts have.

What it does

Two people open Handshake in their browsers, join a room, and talk. They might negotiate a price, agree on work, or split a bill. Their conversation is transcribed in real time.

Each person has an AI agent configured with their priorities, context, and financial boundaries. When the conversation reaches an agreement, both agents activate. They independently analyse the transcript, extract terms, and negotiate autonomously on behalf of their users for up to five rounds. The result is a structured legal document that reflects what was said, weighted toward each person's stated needs.

Both users review and sign. Stripe executes the payment. If the work hasn't been delivered yet, funds are held in escrow and released on completion, with partial capture for jobs where the final cost differs from the estimate.

The entire interaction happens through voice. No forms, no invoices, no switching apps.

How we built it

Handshake runs on a single Node.js server deployed to Railway. The frontend is a vanilla single-page app with a four-panel dashboard that shows the live transcript, agent activity, document generation, and payment status.

Audio flows through three WebSocket connections per user. The browser captures microphone input and streams binary PCM audio to the server. The server relays it to the other user for live playback and simultaneously feeds it to ElevenLabs Scribe v2 for real-time transcription.
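As a rough illustration of the audio path described above, the sketch below converts browser microphone samples to binary PCM and fans each chunk out to two destinations. The 16-bit little-endian format, the `AudioSink` interface, and both function names are assumptions for illustration; the write-up only says "binary PCM" and does not show Handshake's actual code.

```typescript
// Hypothetical sink: anything that accepts a binary audio chunk,
// for example a WebSocket to the other user or to the transcription service.
interface AudioSink {
  send(chunk: ArrayBuffer): void;
}

// Convert Web Audio float samples (range -1..1) to 16-bit signed little-endian PCM.
// Assumption: 16-bit PCM; the source does not state the sample format.
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative and positive halves scale differently because int16 is asymmetric.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, scaled, true);
  });
  return buffer;
}

// Server-side fan-out: the same chunk goes to the peer for playback
// and to the transcriber for real-time speech-to-text.
function relayChunk(chunk: ArrayBuffer, peer: AudioSink, transcriber: AudioSink): void {
  peer.send(chunk);
  transcriber.send(chunk);
}
```

Sending the same chunk to both sinks keeps playback and transcription in step without re-encoding the audio on the server.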

Agreement detection uses a dual trigger system. Users can opt for a shared activation word that both parties must say within 30 seconds, providing explicit mutual consent. Alternatively, an LLM monitors the transcript and detects financial agreement language automatically.
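The activation-word path can be sketched as a small state machine over transcript utterances. This is a minimal sketch, not Handshake's implementation: the `Utterance` shape, the `createActivationTrigger` name, and the whole-word matching rule are all assumptions. Only the 30-second window and the requirement that both parties say the word come from the description above.

```typescript
// Hypothetical shape of one transcribed utterance.
interface Utterance {
  userId: string;
  text: string;
  timestampMs: number;
}

// Lowercase the text and collapse punctuation so matching is on whole words.
function normalise(text: string): string {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((token) => token.length > 0)
    .join(" ");
}

// Returns an observer that reports true once both users have said the
// activation word (or phrase) within `windowMs` of each other.
function createActivationTrigger(
  word: string,
  userA: string,
  userB: string,
  windowMs: number = 30000,
): (u: Utterance) => boolean {
  const needle = " " + normalise(word) + " ";
  const lastSaidAt = new Map<string, number>();

  return (u: Utterance): boolean => {
    // Ignore anyone who is not one of the two parties.
    if (u.userId !== userA && u.userId !== userB) return false;
    const haystack = " " + normalise(u.text) + " ";
    if (!haystack.includes(needle)) return false;

    // Remember the most recent time each party said the word.
    lastSaidAt.set(u.userId, u.timestampMs);
    const a = lastSaidAt.get(userA);
    const b = lastSaidAt.get(userB);
    if (a === undefined || b === undefined) return false;
    return Math.abs(a - b) <= windowMs;
  };
}
```

Tracking only the latest timestamp per user means a stale activation simply ages out: if one party says the word and the other waits too long, the first party has to say it again.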

Once triggered, each user's AI agent (Claude via OpenRouter) independently analyses the transcript and builds a structured proposal. The two agents then negotiate autonomously, proposing, countering, accepting, or rejecting terms across up to five rounds. Each agent follows its user's configured preferences: negotiation style, maximum amounts, and escrow rules.
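The bounded negotiation described above can be sketched as a turn-taking loop. In this sketch the agents are simple rule-based stubs standing in for the LLM calls, and every name (`Proposal`, `Move`, `Agent`, `negotiate`, `payerAgent`, `payeeAgent`) is hypothetical. It also assumes that one "round" is a single response to the offer on the table, which the write-up does not specify.

```typescript
interface Proposal {
  amountPence: number;
  escrow: boolean;
}

type Move =
  | { kind: "accept" }
  | { kind: "reject" }
  | { kind: "counter"; proposal: Proposal };

interface Agent {
  respond(offer: Proposal, round: number): Move;
}

type Outcome =
  | { status: "agreed"; terms: Proposal; rounds: number }
  | { status: "rejected"; rounds: number }
  | { status: "no_agreement"; rounds: number };

// `opening` is the proposer's first offer. The counterparty responds in odd
// rounds and the proposer in even rounds, until someone accepts or rejects
// or the round limit is reached.
function negotiate(
  opening: Proposal,
  proposer: Agent,
  counterparty: Agent,
  maxRounds: number = 5,
): Outcome {
  let offer = opening;
  for (let round = 1; round <= maxRounds; round++) {
    const responder = round % 2 === 1 ? counterparty : proposer;
    const move = responder.respond(offer, round);
    if (move.kind === "accept") return { status: "agreed", terms: offer, rounds: round };
    if (move.kind === "reject") return { status: "rejected", rounds: round };
    offer = move.proposal;
  }
  return { status: "no_agreement", rounds: maxRounds };
}

// Stub payer: accepts anything at or under its ceiling, otherwise counters at the ceiling.
function payerAgent(maxPence: number): Agent {
  return {
    respond(offer) {
      if (offer.amountPence <= maxPence) return { kind: "accept" };
      return { kind: "counter", proposal: { ...offer, amountPence: maxPence } };
    },
  };
}

// Stub payee: accepts anything at or over its floor, otherwise counters at the floor.
function payeeAgent(minPence: number): Agent {
  return {
    respond(offer) {
      if (offer.amountPence >= minPence) return { kind: "accept" };
      return { kind: "counter", proposal: { ...offer, amountPence: minPence } };
    },
  };
}
```

For example, `negotiate({ amountPence: 60000, escrow: true }, payeeAgent(45000), payerAgent(50000))` has the payer counter at £500 and the payee accept in the second round. The hard round cap guarantees the loop terminates even when the two users' limits never overlap.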

When the agents reach agreement, an LLM generates a legal document with terms, payment schedule, conditions, and dispute resolution. Both users sign in-browser. Stripe Connect executes the payment, using manual-capture PaymentIntents for escrow where funds are held until work is confirmed complete.
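The escrow step maps onto two Stripe calls: creating a PaymentIntent with `capture_method: "manual"` to place a hold, then capturing it later, optionally with `amount_to_capture` for a partial capture. The sketch below only builds the parameter objects for those calls. The helper names, the `gbp` currency, and the use of `transfer_data.destination` (a Connect destination charge) are assumptions, since the write-up does not say which Connect charge type Handshake uses.

```typescript
// Parameters for creating the hold. Field names follow the Stripe
// PaymentIntents API; the interface itself is a local illustration.
interface EscrowHoldParams {
  amount: number;
  currency: string;
  capture_method: "manual";
  transfer_data: { destination: string };
}

function assertPositivePence(value: number, label: string): void {
  if (!Number.isInteger(value) || value <= 0) {
    throw new RangeError(label + " must be a positive integer number of pence");
  }
}

// Authorise the estimated amount without capturing it.
// Assumption: a destination charge routed to the payee's connected account.
function buildEscrowHold(estimatePence: number, payeeAccountId: string): EscrowHoldParams {
  assertPositivePence(estimatePence, "estimate");
  return {
    amount: estimatePence,
    currency: "gbp",
    capture_method: "manual",
    transfer_data: { destination: payeeAccountId },
  };
}

// Capture the final amount once work is confirmed. A capture cannot exceed
// the authorised amount; any uncaptured remainder is released to the payer.
function buildCapture(authorisedPence: number, finalPence: number): { amount_to_capture: number } {
  assertPositivePence(finalPence, "final amount");
  if (finalPence > authorisedPence) {
    throw new RangeError("final amount exceeds the authorised hold");
  }
  return { amount_to_capture: finalPence };
}
```

With the official `stripe` Node library, these objects would be passed to `stripe.paymentIntents.create(...)` and `stripe.paymentIntents.capture(id, ...)` respectively. Card authorisations are only valid for a limited period, so longer jobs would need a different holding mechanism than an uncaptured PaymentIntent.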

What's next for Handshake

Right now, both users need to open the app in a browser. The next step is moving Handshake to work over phone calls using WebRTC, so any two people can activate it during a normal conversation. Combined with identity verification and Stripe ID exchange, this removes the last piece of friction: you wouldn't need to be on the same platform, or even know each other beforehand. Two strangers on a phone call could reach an agreement, have their agents formalise it, and execute payment without either person touching a screen.
