Inspiration
Modern browser work is repetitive and fragmented. We constantly switch tabs to compare options, copy information, summarize pages, and complete multi-step workflows. Existing assistants are often text-only and stop at suggestions. We wanted an agentic system that can actually execute browser tasks in parallel while keeping the user in control.
What it does
HiveMind is a swarm-orchestration layer for browser workflows.
- A central Queen model interprets intent and decomposes tasks.
- Multiple worker agents execute subtasks concurrently.
- A live Neural Link feed shows agent activity and Queen commentary.
- Voice interaction supports real-time, interruptible control.
- Memory stores context and preference signals for better future runs.
- HITL (human-in-the-loop) approvals gate sensitive actions.
Example:
- User: “Find the cheapest PHX → SFO flights tomorrow.”
- The Queen splits the search across flight sources.
- Worker agents search those sources in parallel.
- The Queen merges their findings into one consolidated result.
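The fan-out and merge in this example can be sketched with plain asyncio. This is a minimal illustration, not HiveMind's actual code: `Quote`, `search_source`, and `queen_cheapest` are hypothetical names, the worker is a stand-in that sleeps instead of driving a browser, and the source names and prices are made-up placeholder data.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Quote:
    source: str
    price_usd: float


async def search_source(source: str, price_usd: float, delay_s: float) -> Quote:
    """Stand-in for one worker agent; a real worker would drive a browser."""
    await asyncio.sleep(delay_s)
    return Quote(source, price_usd)


async def queen_cheapest(subtasks: list[tuple[str, float, float]]) -> Quote:
    """Run every subtask concurrently, then consolidate to the cheapest quote."""
    results = await asyncio.gather(
        *(search_source(*task) for task in subtasks),
        return_exceptions=True,  # one failing worker should not sink the run
    )
    quotes = [r for r in results if isinstance(r, Quote)]
    if not quotes:
        raise RuntimeError("no worker returned a result")
    return min(quotes, key=lambda q: q.price_usd)


def run_demo() -> Quote:
    # Placeholder data: (source, price in USD, simulated latency in seconds).
    subtasks = [
        ("site-a", 212.0, 0.02),
        ("site-b", 189.0, 0.01),
        ("site-c", 240.0, 0.03),
    ]
    return asyncio.run(queen_cheapest(subtasks))
```

Because the workers run concurrently, total latency is roughly that of the slowest source rather than the sum of all of them.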
How we built it
- Backend: FastAPI + asyncio + WebSockets for orchestration and event streaming.
- Frontend: React + Zustand for real-time state and operator UX.
- LLM layer: Gemini-based orchestration and voice paths with fallback handling.
- Voice: Real-time voice session + fallback transcription/response mode.
- Memory: Supermemory integration for persistent context.
- Deployment assets: Docker, Cloud Build, Cloud Run config.
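To illustrate the event-streaming part of the backend, here is a rough sketch of an in-process event bus that fans agent events out to per-client queues. It assumes each WebSocket connection would own one queue and forward its messages to the browser. `EventBus`, its methods, and the event fields are hypothetical, and the sketch uses only the standard library rather than FastAPI itself.

```python
import asyncio
import json


class EventBus:
    """Fans out agent events to every subscribed client queue."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, agent_id: str, kind: str, detail: str) -> None:
        message = json.dumps({"agent": agent_id, "kind": kind, "detail": detail})
        for queue in self._subscribers:
            try:
                queue.put_nowait(message)
            except asyncio.QueueFull:
                # Assumption: dropping events for a slow client is acceptable,
                # so one stalled feed cannot block the agents.
                pass


async def bus_demo() -> list[dict]:
    bus = EventBus()
    feed = bus.subscribe()
    bus.publish("queen", "plan", "split task into 2 subtasks")
    bus.publish("worker-1", "status", "navigating")
    received = []
    while not feed.empty():
        received.append(json.loads(feed.get_nowait()))
    bus.unsubscribe(feed)
    return received
```

In a WebSocket handler, the loop would instead `await feed.get()` and send each message to the client until the connection closes.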
Challenges we ran into
- Keeping real-time voice stable when live sessions drop.
- Handling audio format consistency across capture, transport, and inference.
- Ensuring final voice transcripts reliably dispatch tasks.
- Maintaining observability while many agents run concurrently.
- Balancing autonomy with safety for irreversible actions.
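The first challenge, surviving dropped live sessions, is commonly handled with a try-live-then-fall-back pattern. The sketch below shows one way that could look. `LiveSessionDropped`, `live_reply`, `fallback_reply`, and `voice_reply` are hypothetical stand-ins, and the `healthy` flag only simulates a drop; none of this is the Gemini Live API.

```python
import asyncio


class LiveSessionDropped(Exception):
    """Raised by the hypothetical live voice path when its session ends early."""


async def live_reply(audio: bytes, healthy: bool) -> str:
    # Stand-in for a real-time voice session; `healthy` simulates a drop.
    if not healthy:
        raise LiveSessionDropped("live session closed")
    return "live:" + str(len(audio))


async def fallback_reply(audio: bytes) -> str:
    # Stand-in for the slower transcribe-then-respond path.
    return "fallback:" + str(len(audio))


async def voice_reply(audio: bytes, healthy: bool, timeout_s: float = 1.0) -> str:
    """Prefer the live path; on a drop or a stall, use the fallback path."""
    try:
        return await asyncio.wait_for(live_reply(audio, healthy), timeout=timeout_s)
    except (LiveSessionDropped, asyncio.TimeoutError):
        return await fallback_reply(audio)
```

The timeout matters as much as the exception: a session that stalls silently is handled the same way as one that closes outright.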
Accomplishments that we're proud of
- Built a working multi-agent orchestration runtime with live telemetry.
- Integrated Queen reasoning into the Neural Link flow.
- Enabled continuous voice interaction while background work runs.
- Added reproducible setup + cloud deployment assets for judges.
- Improved UX with transcript overlays and clear agent/result visibility.
What we learned
- Orchestration quality matters more than model prompts alone.
- Real-time multimodal systems require graceful fallback by default.
- Transparent logs/reasoning dramatically improve trust and debugging.
- Small protocol mismatches can break the entire voice loop.
- HITL is essential for practical, user-safe browser automation.
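As an illustration of the HITL point, an approval gate can be a thin wrapper around action execution. This is a sketch under assumptions: the `SENSITIVE_KINDS` set, the action names, and `gated_action` are hypothetical and do not reflect HiveMind's real policy.

```python
import asyncio
from typing import Awaitable, Callable

# Illustrative list of action kinds that require explicit approval.
SENSITIVE_KINDS = {"purchase", "submit_form", "send_message"}


async def gated_action(
    kind: str,
    execute: Callable[[], Awaitable[str]],
    ask_user: Callable[[str], Awaitable[bool]],
) -> str:
    """Run `execute` directly, unless `kind` is sensitive and the user declines."""
    if kind in SENSITIVE_KINDS:
        approved = await ask_user(kind)
        if not approved:
            return "blocked"
    return await execute()


async def hitl_demo():
    prompts: list[str] = []

    async def execute() -> str:
        return "done"

    def make_reviewer(answer: bool):
        async def ask_user(kind: str) -> bool:
            prompts.append(kind)  # record every approval request
            return answer
        return ask_user

    outcomes = [
        await gated_action("read_page", execute, make_reviewer(False)),
        await gated_action("purchase", execute, make_reviewer(False)),
        await gated_action("purchase", execute, make_reviewer(True)),
    ]
    return outcomes, prompts
```

Read-only actions pass straight through, so the user is only interrupted for steps that are hard to undo.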
What's next for HiveMind
- Cloud-native browser execution infrastructure for fully hosted runs.
- Stronger policy and permission controls per action category.
- Reusable workflow templates and long-horizon task planning.
- Team/shared memory contexts.
- Richer multimodal outputs and better voice persona controls.
Built With
- asyncio
- browser-use
- chrome-devtools-protocol-(cdp)
- docker
- fastapi
- framer-motion
- gemini-live-api
- google-cloud-build
- google-cloud-run
- google-gemini-models
- google-genai-sdk
- javascript
- node.js
- python
- react
- supermemory
- typescript
- uvicorn
- vite
- websockets
- zustand