VibeCheck

Inspiration

We've all been to that event where the energy just... dies. The DJ keeps playing the wrong vibe, the host looks uncomfortable, and half the crowd is on their phones by 9 PM. The frustrating part is that everyone feels it, but nobody has the bandwidth to fix it in real time. Adjusting the music, cueing a social prompt, changing the visuals, winning the crowd back: that's four jobs happening simultaneously, and no one person can do all of them without dropping something. That's the gap VibeCheck fills. We wanted to build something that felt less like an app and more like a live backstage crew, one that never miscommunicates and never misses a cue.

How We Built It

We started by agreeing on one rule: define the message schema before touching any agent code. That contract is what let us build five agents in parallel without stepping on each other. From there, we kept it small: a two-agent ping-pong on fetch.ai, where one sends a vibe reading and the other replies with a track suggestion, just to prove the communication primitive worked. Once it did, we scaled up. Gemini 2.5 Flash handles the reasoning layer: parsing voice commands into structured intent, then evaluating agent proposals against the current crowd arc. The WebSocket pipeline keeps everything in sync; when MoodAgent finalizes a decision, the music, visuals, social prompt, and TTS audio all update at once. The target end-to-end latency is under 2 seconds:
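As a rough illustration of that schema-first rule, here is a minimal sketch of what the shared contract might look like. The field names are hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass

# Hedged sketch of the agreed-on message contract; field names are
# illustrative guesses, not the real VibeCheck schema.
@dataclass
class VibeReading:
    energy: float      # crowd energy estimate, 0.0 to 1.0
    timestamp: float   # seconds since event start

@dataclass
class TrackSuggestion:
    track_id: str
    reason: str        # DJAgent's rationale, surfaced during negotiation
```

Freezing something this small up front is what lets two agents ping-pong immediately and five agents build in parallel later.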

$$T_{\text{total}} = T_{\text{transcribe}} + T_{\text{Gemini}} + T_{\text{negotiate}} + T_{\text{dispatch}} \leq 2\,\text{s}$$

The last piece, giving each agent a distinct ElevenLabs voice, was the one that changed how the whole thing felt. Suddenly the room could hear DJAgent push back and MoodAgent overrule it. That's when it stopped feeling like a prototype.
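One way to read the 2-second target is as a per-stage budget. The numbers below are hypothetical stand-ins for illustration, not measured values:

```python
# Illustrative breakdown of the 2 s end-to-end budget; per-stage
# figures are hypothetical, not measurements from the project.
BUDGET_S = 2.0
stages = {
    "transcribe": 0.6,  # voice command -> text
    "gemini": 0.7,      # intent parsing + proposal evaluation
    "negotiate": 0.4,   # capped agent feedback round
    "dispatch": 0.2,    # WebSocket fan-out to music/visuals/TTS
}
total = sum(stages.values())
print(f"{total:.1f}s of {BUDGET_S}s budget")  # 1.9s of 2.0s budget
```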

Challenges

Latency compounds. Every agent hop adds time, and an open-ended negotiation loop compounds fast:

$$T_{\text{negotiate}} = \sum_{k=1}^{n} \delta_k$$

We capped negotiations at one feedback round. DJAgent objects once; MoodAgent decides. No appeals. That kept $T_{\text{negotiate}}$ bounded at roughly $0.4\,\text{s}$ in practice.
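The one-round cap can be sketched in a few lines. The function names here are hypothetical; in the real system the agents exchange uAgents messages rather than call each other directly:

```python
# Minimal sketch of the single-feedback-round negotiation cap.
# dj_object and mood_decide are hypothetical stand-ins for agent handlers.
def negotiate(proposal, dj_object, mood_decide):
    objection = dj_object(proposal)           # DJAgent gets exactly one objection
    if objection is None:
        return proposal                       # no pushback: proposal stands
    return mood_decide(proposal, objection)   # MoodAgent rules; no appeals
```

Because the loop body runs at most twice, the negotiation term is bounded by two message round-trips instead of an open-ended sum.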

fetch.ai has opinions. We expected uAgents to behave like lightweight HTTP services. They don't: there's a formal lifecycle, bureau registration, and handler setup. Getting five agents starting correctly took hours we hadn't planned for. Annoying in the moment, but it forced us to make each agent's dependencies explicit.

Hackathon venues are loud. Live mic input in a conference hall is brutal. We ended up building a simulated audio fallback, a pre-generated energy signal that follows a realistic event arc, so the demo wouldn't break from background noise. The agent architecture is identical; only the input source changes.
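A fallback signal like that can be generated with a simple arc-shaped curve. This is a hedged sketch of the idea, not the project's actual generator; the curve shape and constants are assumptions:

```python
import math

def simulated_energy(t, duration=3600.0):
    """Hypothetical crowd-energy curve standing in for the live mic:
    slow warm-up, mid-event peak, gradual wind-down, plus short-term wobble."""
    x = min(max(t / duration, 0.0), 1.0)        # normalize to [0, 1]
    arc = math.sin(math.pi * x) ** 2            # low at doors, peaks mid-event
    wobble = 0.1 * math.sin(16 * math.pi * x)   # short-term fluctuation
    return min(max(0.2 + 0.7 * arc + wobble, 0.0), 1.0)
```

Because the agents only see an energy value, swapping this in for real mic input requires no changes downstream of the input source.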

What We Learned

The biggest surprise wasn't technical; it was behavioral. Early on, all five agents were neutral function-callers. The system worked fine, but it felt hollow. Adding DJAgent's contrarian personality (just a flag and a few lines of prompt) changed everything. Suddenly the negotiation had texture. Agents had opinions. The room felt alive. We went in thinking the hard problem was latency. It was. But the most important thing we learned is that personality design matters as much as architecture in multi-agent systems. A well-orchestrated system with character is a completely different experience from a correct one without it.

Built With

  • elevenlabs
  • elevenlabs-api
  • fastapi
  • fetch.ai-agent-bureau
  • fetch.ai-uagents
  • gemini-2.5-flash
  • google-gemini-api
  • javascript
  • mongodb-atlas
  • next.js
  • python
  • speech
  • typescript
  • vultr-cloud-compute
  • web
  • websocket