Inspiration
Granola, Fireflies: every meeting AI today is a surveillance product wearing the costume of a productivity tool. Every word of every meeting goes to someone else's server. We wanted to see if you could build the same product without ever giving up the data.
What it does
Jarvis is a peer-to-peer meeting room with a built-in voice AI. Two laptops and a phone join the same room with a QR code, talk to each other live, and get a shared transcript that lives only on the devices that were actually in the call. Say "hey jarvis" and the AI joins the conversation, answers out loud, and remembers context just like a human peer would.
How we built it
Every peer runs the same Pear desktop app. Joining a room means joining an Autobase, a multiwriter encrypted append-only log on top of Hypercore, over a Hyperswarm topic on HyperDHT. There is no server anywhere. We built a tiny framework called Pear Bots on top of this, basically defineBot({ shouldRespond, respond }), and Jarvis itself is the reference implementation. The AI joins the same Autobase as another writer, sees every utterance, decides when to speak, and writes its replies back into the same log so every peer sees the same conversation.
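The bot framework boils down to the two hooks named above. This is a minimal sketch of that shape, not the real Pear Bots code: the hook names come from the text, but everything else (a synchronous `respond` stub instead of the real async LLM call, a plain array standing in for the Autobase log) is an assumption for illustration.

```javascript
// Minimal sketch of the defineBot({ shouldRespond, respond }) shape.
// The in-memory array stands in for the Autobase the bot actually joins.
function defineBot ({ shouldRespond, respond }) {
  return {
    onUtterance (utterance, log) {
      if (!shouldRespond(utterance)) return false
      // The bot is just another writer: its reply is appended to the
      // same shared log every peer reads.
      log.push({ speaker: 'jarvis', text: respond(utterance, log) })
      return true
    }
  }
}

// Hypothetical wake-word bot in the spirit of Jarvis (the real respond
// hook calls the LLM asynchronously; this one is a stub).
const jarvis = defineBot({
  shouldRespond: (u) => /\bhey jarvis\b/i.test(u.text),
  respond: (u) => `You said: ${u.text.replace(/hey jarvis,?\s*/i, '')}`
})

const log = []
jarvis.onUtterance({ speaker: 'alice', text: 'hey jarvis, status?' }, log)
jarvis.onUtterance({ speaker: 'bob', text: 'unrelated chatter' }, log)
```

Because the bot writes into the same log it reads from, every peer converges on the same conversation, replies included.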
For voice we run two parallel Hyperswarms. The Autobase swarm handles structured data like transcripts and presence. A separate audio swarm carries low-latency walkie-talkie streams that do not need to be persisted. Phones come in through a Cloudflare tunnel pointed at whichever desktop is hosting a local web bridge, and the bridge URL gets gossiped over the audio swarm so phones automatically fail over if the host they are tunneled to crashes. STT runs in-browser, the brain is Claude with a Gemma fallback, and TTS is ElevenLabs.
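The failover logic amounts to tracking which bridge hosts are still gossiping and picking the freshest one. A sketch of that idea, with assumed names and an assumed 10-second heartbeat timeout (neither is from the project, and time is passed in explicitly to keep it testable):

```javascript
const STALE_MS = 10_000 // assumed heartbeat timeout, not the real value

class BridgeDirectory {
  constructor () { this.hosts = new Map() } // bridge URL -> last-seen time
  // Called whenever a bridge-URL gossip message arrives on the audio swarm.
  announce (url, now) { this.hosts.set(url, now) }
  // Pick the most recently announced bridge that has not gone stale.
  pick (now) {
    let best = null
    for (const [url, seen] of this.hosts) {
      if (now - seen > STALE_MS) continue // host stopped gossiping, skip it
      if (!best || seen > this.hosts.get(best)) best = url
    }
    return best
  }
}

const dir = new BridgeDirectory()
dir.announce('https://host-a.example', 1_000)
dir.announce('https://host-b.example', 2_000)
const primary = dir.pick(3_000) // host-b: both fresh, b announced last
dir.announce('https://host-a.example', 12_000) // host-b has gone quiet
const fallback = dir.pick(12_500) // host-a takes over
```

Because phones re-evaluate this on every gossip message, a crashed bridge host simply ages out and the next announcement wins.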
Challenges I ran into
The hardest one was getting walkie-talkie audio to actually play on iPhone receivers. We started with MediaSource Extensions for low-latency streaming, which works on desktop and Android, but iOS Safari silently rejects audio/webm;codecs=opus inside MSE. The SourceBuffer accepts the data, then the audio element surfaces a decode error and dies for the rest of the session. We rebuilt the receiver around per-transmission Web Audio decoding instead. Each press is tagged with a custom 8-byte header carrying a monotonic transmission id, the receiver buffers chunks per id, and on an explicit end-of-transmission signal it concatenates them into one blob and decodes through AudioContext.decodeAudioData. iOS routes that through AVFoundation, which actually does decode opus, and the latency cost is around 150ms per press.
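The framing side of this is simple enough to sketch. The exact byte layout is an assumption (here the 8-byte header is a 4-byte magic tag plus a big-endian uint32 transmission id); the decode step through `AudioContext.decodeAudioData` is omitted since it only exists in the browser.

```javascript
const MAGIC = 0x4a525653 // hypothetical "JRVS" tag, not the real layout

// Prefix each audio chunk with the 8-byte header: magic + transmission id.
function frameChunk (txId, opusBytes) {
  const buf = new Uint8Array(8 + opusBytes.length)
  const view = new DataView(buf.buffer)
  view.setUint32(0, MAGIC)
  view.setUint32(4, txId)
  buf.set(opusBytes, 8)
  return buf
}

class Reassembler {
  constructor () { this.byId = new Map() } // transmission id -> chunk list
  push (framed) {
    const view = new DataView(framed.buffer, framed.byteOffset)
    if (view.getUint32(0) !== MAGIC) return // not one of our frames
    const id = view.getUint32(4)
    if (!this.byId.has(id)) this.byId.set(id, [])
    this.byId.get(id).push(framed.subarray(8))
  }
  // On the explicit end-of-transmission signal, concatenate every chunk
  // for that id into one buffer, ready to hand to decodeAudioData.
  end (id) {
    const chunks = this.byId.get(id) || []
    this.byId.delete(id)
    const out = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0))
    let off = 0
    for (const c of chunks) { out.set(c, off); off += c.length }
    return out
  }
}
```

Buffering per transmission id rather than per socket is what makes interleaved presses from different speakers safe to reassemble.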
The other big one was dealing with Autobase being eventually consistent across writers. Multiple peers writing at the same time means ordering can shift retroactively, so anything inside apply has to be deterministic. No Date.now(), no random ids, no external reads, otherwise the view diverges between peers and the whole "shared transcript" claim falls apart.
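The determinism rule can be shown in a few lines. This is a sketch of the principle, not the real apply function: the op fields are invented, and the point is that the id comes from data every peer sees identically, never from local clocks or randomness.

```javascript
// Deterministic view reducer: everything it emits is derived from the
// ops themselves, so every peer that sees the same order builds the
// same view. Date.now() or Math.random() here would make views diverge.
function apply (view, nodes) {
  for (const node of nodes) {
    const op = node.value
    const id = `${op.writer}@${op.seq}` // derived from the op, not the clock
    view.push({ id, speaker: op.speaker, text: op.text })
  }
  return view
}

const nodes = [
  { value: { writer: 'alice', seq: 0, speaker: 'alice', text: 'hello' } },
  { value: { writer: 'bob', seq: 0, speaker: 'bob', text: 'hey jarvis' } }
]
const viewA = apply([], nodes) // peer A
const viewB = apply([], nodes) // peer B, same input, same result
```

When Autobase reorders history retroactively, it re-runs apply from the affected point; a pure reducer like this converges, while one that read the clock would not.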
Accomplishments that I'm proud of
Pulling the ethernet cable mid-call and watching the room keep working, then plugging back in and seeing every utterance reconcile in order. The fact that the AI is genuinely a peer and not a service. And shipping a working framework instead of a one-off, so anyone can add an AI agent to any Pear app in about ten lines of code.
What I learned
A lot about Pear, Autobase, and the Holepunch stack in general. How to think about state when there is no central source of truth and every peer is equal. How fragile mobile audio actually is once you leave the happy path of a single <audio> tag. And that the hardest part of P2P is not the networking, it is keeping the data model deterministic.
What's next for Jarvis
Real voice activity detection so the AI can interject naturally instead of waiting for a wake word. Multiple bots in the same room with different roles: a notetaker, a translator, a researcher. Local Gemma inference on every peer so the cloud is not even in the loop for the LLM call. And making Pear Bots its own published package so other Pear app developers can drop AI peers into their apps.
Built With
- autobase
- claude
- cloudflare-tunnel
- corestore
- css
- electron
- elevenlabs
- gemma
- holepunch
- html
- hypercore
- hyperdht
- hyperswarm
- javascript
- mediarecorder
- node.js
- pear-runtime
- web-audio-api
- web-speech-api
- websockets