Inspiration

Modern browser work is repetitive and fragmented. We constantly switch tabs to compare options, copy information, summarize pages, and complete multi-step workflows. Existing assistants are often text-only and stop at suggestions instead of carrying out the task. We wanted an agentic system that can actually execute browser tasks in parallel while keeping the user in control.

What it does

HiveMind is a swarm-orchestration layer for browser workflows.

  • A central Queen model interprets intent and decomposes tasks.
  • Multiple worker agents execute subtasks concurrently.
  • A live Neural Link feed shows agent activity and Queen commentary.
  • Voice interaction supports real-time, interruptible control.
  • Memory stores context and preference signals for better future runs.
  • HITL (human-in-the-loop) approvals gate sensitive actions.

Example:

  • User: “Find the cheapest PHX → SFO flights tomorrow.”
  • Queen decomposes across sources.
  • Agents run in parallel.
  • Queen returns a consolidated result.
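
The decompose, fan-out, and consolidate flow in this example can be sketched with plain asyncio. This is a minimal illustration, not HiveMind's actual code: the names `Quote`, `search_source`, and `queen_run` are hypothetical, the sources and prices are made up, and the worker only simulates the browser step.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Quote:
    source: str
    price_usd: float


async def search_source(source: str, price_usd: float) -> Quote:
    # Hypothetical worker: a real agent would drive a browser tab here.
    await asyncio.sleep(0)  # yield control, standing in for browser/network I/O
    return Quote(source, price_usd)


async def queen_run(subtasks: dict[str, float]) -> Quote:
    # Fan out one worker per source and wait for all of them concurrently.
    quotes = await asyncio.gather(
        *(search_source(src, price) for src, price in subtasks.items())
    )
    # Consolidate: keep the cheapest quote.
    return min(quotes, key=lambda q: q.price_usd)


if __name__ == "__main__":
    # Made-up sources and prices, for illustration only.
    best = asyncio.run(queen_run({"source_a": 129.0, "source_b": 98.0}))
    print(best)
```

Using `asyncio.gather` keeps the consolidation step simple because results come back in the same order the subtasks were submitted.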

How we built it

  • Backend: FastAPI + asyncio + WebSockets for orchestration and event streaming.
  • Frontend: React + Zustand for real-time state and operator UX.
  • LLM layer: Gemini-based orchestration and voice paths with fallback handling.
  • Voice: Real-time voice session + fallback transcription/response mode.
  • Memory: Supermemory integration for persistent context.
  • Deployment assets: Docker, Cloud Build, Cloud Run config.
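
One way to picture the event-streaming part of this stack is a shared asyncio queue that every agent publishes to. The sketch below uses only the standard library and is an assumption about the general shape, not the project's implementation: `agent`, `run_swarm`, and the message fields are hypothetical, and the drain loop stands in for sending each message over a WebSocket.

```python
import asyncio
import json


async def agent(agent_id: int, feed: asyncio.Queue) -> None:
    # Each agent reports its lifecycle to the shared feed.
    for status in ("started", "finished"):
        await feed.put({"agent": agent_id, "status": status})
        await asyncio.sleep(0)  # let other agents interleave


async def run_swarm(n_agents: int) -> list[str]:
    feed: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(*(agent(i, feed) for i in range(n_agents)))
    # Drain the feed; a real server would push each message to the UI instead.
    messages = []
    while not feed.empty():
        messages.append(json.dumps(feed.get_nowait()))
    return messages


if __name__ == "__main__":
    for line in asyncio.run(run_swarm(3)):
        print(line)
```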

Challenges we ran into

  • Keeping real-time voice stable when live sessions drop.
  • Handling audio format consistency across capture, transport, and inference.
  • Ensuring final voice transcripts reliably dispatch tasks.
  • Maintaining observability while many agents run concurrently.
  • Balancing autonomy with safety for irreversible actions.
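
The first challenge, surviving a dropped live session, comes down to a try-then-fall-back pattern. The sketch below shows that pattern under stated assumptions: `LiveSessionDropped`, `live_voice_reply`, `fallback_reply`, and the two-second timeout are all hypothetical stand-ins, and no real voice API is called.

```python
import asyncio


class LiveSessionDropped(Exception):
    """Hypothetical error raised when the real-time voice session disconnects."""


async def live_voice_reply(audio: bytes, session_up: bool) -> str:
    # Stand-in for the real-time path; fails when the session is down.
    if not session_up:
        raise LiveSessionDropped("live session closed")
    return "live:" + str(len(audio))


async def fallback_reply(audio: bytes) -> str:
    # Stand-in for the slower transcription/response path.
    return "fallback:" + str(len(audio))


async def handle_utterance(audio: bytes, session_up: bool) -> str:
    try:
        # Bound the live path so a stalled session cannot hang the voice loop.
        return await asyncio.wait_for(
            live_voice_reply(audio, session_up), timeout=2.0
        )
    except (LiveSessionDropped, asyncio.TimeoutError):
        return await fallback_reply(audio)
```

Treating a timeout the same as a dropped session means the user still gets a response when the live path stalls without raising an error.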

Accomplishments that we're proud of

  • Built a working multi-agent orchestration runtime with live telemetry.
  • Integrated Queen reasoning into the Neural Link flow.
  • Enabled continuous voice interaction while background work runs.
  • Added reproducible setup + cloud deployment assets for judges.
  • Improved UX with transcript overlays and clear agent/result visibility.

What we learned

  • Orchestration quality matters more than model prompts alone.
  • Real-time multimodal systems require graceful fallback by default.
  • Transparent logs/reasoning dramatically improve trust and debugging.
  • Small protocol mismatches can break the entire voice loop.
  • HITL is essential for practical, user-safe browser automation.
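
The HITL point can be illustrated with a small approval gate. This is a sketch of the general idea, not the project's policy engine: the `SENSITIVE` categories, the `Action` type, and the `approve` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of action categories that require human approval.
SENSITIVE = {"purchase", "submit_form", "delete"}


@dataclass
class Action:
    category: str
    description: str


def execute(action: Action, approve: Callable[[Action], bool]) -> str:
    # Sensitive actions run only if the human approver says yes.
    if action.category in SENSITIVE and not approve(action):
        return "blocked"
    # A real agent would perform the browser action here.
    return "executed"
```

For example, `execute(Action("purchase", "buy ticket"), approve=lambda a: False)` returns `"blocked"`, while a read-only action passes through without asking.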

What's next for HiveMind

  • Cloud-native browser execution infrastructure for fully hosted runs.
  • Stronger policy and permission controls per action category.
  • Reusable workflow templates and long-horizon task planning.
  • Team/shared memory contexts.
  • Richer multimodal outputs and better voice persona controls.
