HIVEMIND - Swarm Intelligence for Your Browser

Inspiration

Modern browser work is repetitive and fragmented. We constantly switch tabs to compare options, copy information, summarize pages, and complete multi-step workflows. Existing assistants are often text-only and stop at suggestions. We wanted an agentic system that can actually execute browser tasks in parallel while keeping the user in control.

What it does

HiveMind is a swarm-orchestration layer for browser workflows.

A central Queen model interprets the user's intent and decomposes it into subtasks.
Multiple worker agents execute subtasks concurrently.
A live Neural Link feed shows agent activity and Queen commentary.
Voice interaction supports real-time, interruptible control.
Memory stores context and preference signals to improve future runs.
HITL (human-in-the-loop) approvals gate sensitive actions.
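
The Queen/worker split above maps naturally onto asyncio, which the backend already uses. The following is a minimal sketch, not HiveMind's actual code: `Subtask`, `decompose`, `run_worker`, and `queen` are hypothetical names, the LLM planning step is replaced by a fixed one-subtask-per-source split, and workers only simulate latency instead of driving a browser.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Subtask:
    """One unit of work the Queen hands to a worker (hypothetical shape)."""
    source: str
    instruction: str


def decompose(intent: str, sources: list[str]) -> list[Subtask]:
    # Hypothetical stand-in for the Queen's LLM-based planning step:
    # fan the same intent out as one subtask per source.
    return [Subtask(source=s, instruction=f"{intent} (search {s})") for s in sources]


async def run_worker(task: Subtask) -> str:
    # Placeholder for a browser-driving worker agent; it only simulates latency.
    await asyncio.sleep(0.01)
    return f"done: {task.instruction}"


async def queen(intent: str, sources: list[str], max_parallel: int = 3) -> list[str]:
    subtasks = decompose(intent, sources)
    limit = asyncio.Semaphore(max_parallel)  # cap concurrent worker sessions

    async def bounded(task: Subtask) -> str:
        async with limit:
            return await run_worker(task)

    # gather preserves input order, so results line up with subtasks
    return await asyncio.gather(*(bounded(t) for t in subtasks))


if __name__ == "__main__":
    for line in asyncio.run(queen("cheapest flight", ["site-a", "site-b", "site-c"])):
        print(line)
```

The semaphore is one plausible way to keep a swarm from opening more browser sessions than the machine can handle; the limit of 3 is arbitrary.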

Example:

User: “Find the cheapest PHX → SFO flights tomorrow.”
The Queen splits the request into one search subtask per flight source.
Worker agents run those searches in parallel.
The Queen returns one consolidated result.
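
The consolidation step in this example can be sketched as a gather over per-source searches followed by a merge and sort by price. This is an illustration under stated assumptions: the sources (`site-a`, `site-b`, `site-c`), flights, and prices are made-up fixtures, not real data, and `search_source` is a hypothetical stand-in for a browser-driving worker.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    source: str
    flight: str
    price_usd: float


async def search_source(source: str) -> list[Offer]:
    # Hypothetical fixture data; a real worker would read each site in a browser.
    fake = {
        "site-a": [Offer("site-a", "AA 100", 189.0), Offer("site-a", "UA 200", 214.0)],
        "site-b": [Offer("site-b", "WN 300", 149.0)],
    }
    if source not in fake:
        raise TimeoutError(f"{source} did not respond")
    return fake[source]


async def consolidate(sources: list[str]) -> tuple[list[Offer], list[str]]:
    # return_exceptions=True keeps one failing source from aborting the others
    results = await asyncio.gather(
        *(search_source(s) for s in sources), return_exceptions=True
    )
    offers: list[Offer] = []
    failed: list[str] = []
    for source, result in zip(sources, results):
        if isinstance(result, BaseException):
            failed.append(source)
        else:
            offers.extend(result)
    offers.sort(key=lambda o: o.price_usd)  # cheapest first
    return offers, failed


if __name__ == "__main__":
    offers, failed = asyncio.run(consolidate(["site-a", "site-b", "site-c"]))
    print("cheapest:", offers[0], "failed sources:", failed)
```

Because failures are collected rather than raised, the Queen can still answer with partial results and report which sources were missing.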

How we built it

Backend: FastAPI + asyncio + WebSockets for orchestration and event streaming.
Frontend: React + Zustand for real-time state and operator UX.
LLM layer: Gemini-based orchestration and voice paths with fallback handling.
Voice: Real-time voice sessions, with a fallback transcription/response mode.
Memory: Supermemory integration for persistent context.
Deployment assets: Docker, Cloud Build, Cloud Run config.
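
For the live activity feed, the backend has to fan each agent event out to every connected UI client. Below is a minimal sketch of that pattern with one bounded `asyncio.Queue` per subscriber; `EventBus`, the event fields, and the drop-oldest policy are assumptions for illustration, not the project's actual protocol.

```python
import asyncio
import time
from contextlib import contextmanager
from typing import Iterator


class EventBus:
    """Fan agent events out to every connected UI client (hypothetical design)."""

    def __init__(self, max_backlog: int = 100) -> None:
        self._subscribers: set[asyncio.Queue] = set()
        self._max_backlog = max_backlog

    @contextmanager
    def subscribe(self) -> Iterator[asyncio.Queue]:
        # One bounded queue per client, removed automatically when the client leaves.
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._max_backlog)
        self._subscribers.add(queue)
        try:
            yield queue
        finally:
            self._subscribers.discard(queue)

    def subscriber_count(self) -> int:
        return len(self._subscribers)

    def publish(self, agent: str, kind: str, payload: dict) -> None:
        event = {"ts": time.time(), "agent": agent, "kind": kind, "payload": payload}
        for queue in self._subscribers:
            if queue.full():
                queue.get_nowait()  # drop the oldest event rather than block the publisher
            queue.put_nowait(event)


async def demo() -> tuple[list[dict], int]:
    bus = EventBus(max_backlog=2)
    with bus.subscribe() as queue:
        bus.publish("worker-1", "status", {"step": 1})
        bus.publish("worker-1", "status", {"step": 2})
        bus.publish("queen", "comment", {"text": "comparing fares"})
        received = [await queue.get(), await queue.get()]
    return received, bus.subscriber_count()


if __name__ == "__main__":
    events, remaining = asyncio.run(demo())
    print([e["payload"] for e in events], remaining)
```

In a FastAPI WebSocket endpoint, the handler could enter `bus.subscribe()` and loop on `await websocket.send_json(await queue.get())`. Dropping the oldest event when a client falls behind keeps a slow browser tab from stalling the agents that publish.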

Challenges we ran into

Keeping real-time voice stable when live sessions drop.
Keeping audio formats consistent across capture, transport, and inference.
Ensuring final voice transcripts reliably dispatch tasks.
Maintaining observability while many agents run concurrently.
Balancing autonomy with safety for irreversible actions.
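
One way to keep irreversible steps behind a human approval is to classify each browser action by risk and await an approval callback before running anything in a gated category. The sketch below assumes a three-level `Risk` taxonomy and a `REQUIRES_APPROVAL` policy set, both hypothetical; in a real UI the `approve` callback would presumably surface a prompt and resolve when the user responds.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable


class Risk(Enum):
    READ_ONLY = "read_only"          # navigate, read, summarize
    REVERSIBLE = "reversible"        # fill a field, add to cart
    IRREVERSIBLE = "irreversible"    # pay, send, delete


# Hypothetical policy: which risk categories need an explicit human "yes".
REQUIRES_APPROVAL = {Risk.IRREVERSIBLE}

ApproveFn = Callable[[str], Awaitable[bool]]


async def execute_action(
    description: str, risk: Risk, approve: ApproveFn, timeout_s: float = 30.0
) -> str:
    if risk in REQUIRES_APPROVAL:
        try:
            approved = await asyncio.wait_for(approve(description), timeout=timeout_s)
        except asyncio.TimeoutError:
            approved = False  # no answer is treated as "no" (fail closed)
        if not approved:
            return f"blocked: {description}"
    # A real agent would perform the browser action here.
    return f"executed: {description}"


# Stand-in approvers for testing; a UI would resolve these from user input.
async def approve_all(_: str) -> bool:
    return True


async def deny_all(_: str) -> bool:
    return False


async def never_answers(_: str) -> bool:
    await asyncio.sleep(3600)
    return True


if __name__ == "__main__":
    print(asyncio.run(execute_action("book WN 300", Risk.IRREVERSIBLE, deny_all)))
```

Treating a timeout as a denial makes the gate fail closed: if nobody answers, the payment or submission simply does not happen.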

Accomplishments that we're proud of

Built a working multi-agent orchestration runtime with live telemetry.
Integrated Queen reasoning into the Neural Link flow.
Enabled continuous voice interaction while background work runs.
Added a reproducible setup and cloud deployment assets for judges.
Improved UX with transcript overlays and clear agent/result visibility.

What we learned

Orchestration quality matters more than model prompts alone.
Real-time multimodal systems require graceful fallback by default.
Transparent logs and reasoning traces make the system much easier to trust and to debug.
Small protocol mismatches can break the entire voice loop.
HITL is essential for practical, user-safe browser automation.
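
Protocol mismatches in the voice loop are cheapest to catch where captured audio is packed for transport. Browser capture through the Web Audio API yields 32-bit float samples, while streaming speech endpoints typically expect signed integer PCM. The sketch below converts float samples to little-endian 16-bit PCM and rejects any stream whose declared format differs from a target. The 16 kHz mono `TARGET` is an assumption to check against the voice API's documentation, and resampling from the capture rate is not shown.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    sample_rate_hz: int
    channels: int
    sample_width_bytes: int


# Assumed target format: 16-bit little-endian mono PCM at 16 kHz.
TARGET = AudioFormat(sample_rate_hz=16_000, channels=1, sample_width_bytes=2)


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip so out-of-range samples stay within int16
        ints.append(int(round(s * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


def check_format(fmt: AudioFormat) -> None:
    # Fail loudly at the boundary instead of streaming audio the model cannot decode.
    if fmt != TARGET:
        raise ValueError(f"expected {TARGET}, got {fmt}")


if __name__ == "__main__":
    check_format(AudioFormat(16_000, 1, 2))
    print(float_to_pcm16([0.0, 0.5, -0.5]).hex())
```

Clipping before scaling matters because a sample slightly above 1.0 would otherwise fall outside the int16 range, and `struct.pack` raises `struct.error` on such values.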

What's next for HiveMind

Cloud-native browser execution infrastructure for fully hosted runs.
Stronger policy and permission controls per action category.
Reusable workflow templates and long-horizon task planning.
Team/shared memory contexts.
Richer multimodal outputs and better voice persona controls.

Built With

asyncio
browser-use
chrome-devtools-protocol-(cdp)
docker
fastapi
framer-motion
gemini-live-api
google-cloud-build
google-cloud-run
google-gemini-models
google-genai-sdk
javascript
node.js
python
react
supermemory
typescript
uvicorn
vite
websockets
zustand

Updates

Kevin Doshi started this project — Mar 16, 2026 07:55 PM EDT
