Inspiration

Most AI tools today rely on typing prompts and reading long responses. While powerful, this interaction model can feel limiting when working with complex tasks that require multiple steps, tools, and reasoning.

With the release of the Google Gemini Live API, real-time voice interaction with AI became possible. That inspired us to explore a broader idea: an AI system that not only responds conversationally but can also create and run autonomous agent workflows in real time.

Halia was built to explore that idea — a platform where users can describe a goal naturally and the system generates a team of agents to accomplish it. Voice interaction is supported through Gemini Live, and the system works equally well through a text interface.

What it does

Halia is a multi-agent AI platform that generates and executes workflows from natural language instructions.

A user can describe a task, and the system automatically builds a graph of agents to complete it.

For example, a user might ask:

“Create agents that research the latest Gemini API integrations and summarize the findings.”

Halia interprets the request and creates agents responsible for tasks like searching the web, analyzing documentation, extracting examples, and summarizing results. The execution process is streamed live so users can see what each agent is doing.
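The writeup doesn't show Halia's internal representation, but a request like the one above could be modeled as a small directed graph of agent tasks. A minimal sketch, assuming illustrative names (`AgentNode`, `workflow` are not from Halia's codebase):

```python
from dataclasses import dataclass, field

@dataclass
class AgentNode:
    """One agent in a workflow graph (names here are illustrative)."""
    name: str
    task: str
    depends_on: list[str] = field(default_factory=list)

# A possible graph for the example request: research first, then summarize.
workflow = [
    AgentNode("searcher", "search the web for Gemini API integrations"),
    AgentNode("analyzer", "analyze documentation", depends_on=["searcher"]),
    AgentNode("extractor", "extract code examples", depends_on=["searcher"]),
    AgentNode("summarizer", "summarize findings",
              depends_on=["analyzer", "extractor"]),
]

# Agents with no unmet dependencies can start immediately.
ready = [n.name for n in workflow if not n.depends_on]
```

Downstream agents (`analyzer`, `extractor`) only become runnable once `searcher` finishes, which is what lets the runtime stream per-agent progress to the user.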

How we built it

Halia combines real-time AI interaction with a flexible agent runtime.

The backend is built with Python 3.11 and includes a custom asynchronous graph executor that runs agent workflows. A Queen agent orchestrates worker agents, while a Judge agent evaluates results and handles failures.
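A dependency-driven executor like the one described can be sketched in a few lines of `asyncio`. This is a simplified illustration of the pattern, not Halia's actual executor; `run_graph` and `demo_worker` are assumed names:

```python
import asyncio

async def run_graph(nodes: dict[str, list[str]], worker) -> dict[str, str]:
    """Run a dependency graph of agents, executing ready nodes concurrently.

    `nodes` maps an agent name to the names it depends on; `worker` is an
    async callable taking (name, dependency_results) and returning a result.
    """
    results: dict[str, str] = {}
    done: set[str] = set()
    while len(done) < len(nodes):
        # Agents whose dependencies are all complete can run in parallel.
        ready = [n for n, deps in nodes.items()
                 if n not in done and all(d in done for d in deps)]
        if not ready:
            raise RuntimeError("dependency cycle detected")
        outputs = await asyncio.gather(
            *(worker(n, {d: results[d] for d in nodes[n]}) for n in ready))
        for name, out in zip(ready, outputs):
            results[name] = out
            done.add(name)
    return results

async def demo_worker(name, deps):
    # Stand-in for a real agent; a Judge could inspect these results.
    return f"{name} done (used: {sorted(deps)})"

graph = {"search": [], "analyze": ["search"], "summarize": ["analyze"]}
results = asyncio.run(run_graph(graph, demo_worker))
```

In this shape, the Queen corresponds to whatever builds `nodes` and dispatches `worker`, while a Judge would wrap `worker` to score outputs and retry failures.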

The frontend uses React, TypeScript, and Tailwind CSS to provide a chat interface and live observability.

Voice interaction is powered by the Gemini Live API, enabling real-time audio streaming and transcripts that integrate directly into the agent workflow.

Challenges we ran into

One of the biggest challenges was coordinating real-time interaction with asynchronous agent execution.

Voice responses, text messages, and agent workflows all operate at different speeds. Designing a system where users could interact naturally while agents were executing tasks in parallel required careful coordination between the runtime and the frontend event stream.
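One common way to reconcile producers that emit at different speeds is to funnel everything through a single queue that the frontend stream drains in arrival order. A hedged sketch of that idea (not Halia's actual code; `produce`, `main`, and the event payloads are illustrative):

```python
import asyncio

async def produce(queue: asyncio.Queue, source: str,
                  events: list[str], delay: float) -> None:
    """Simulate a producer (voice, chat, or agent runtime) with its own pace."""
    for event in events:
        await asyncio.sleep(delay)
        await queue.put((source, event))

async def main() -> list[tuple[str, str]]:
    queue: asyncio.Queue = asyncio.Queue()
    # Producers run at different speeds but share one ordered event stream.
    producers = [
        produce(queue, "agent", ["started", "finished"], 0.02),
        produce(queue, "voice", ["transcript"], 0.01),
    ]
    consumed: list[tuple[str, str]] = []

    async def consume() -> None:
        for _ in range(3):  # total events expected in this demo
            consumed.append(await queue.get())

    await asyncio.gather(*producers, consume())
    return consumed

events = asyncio.run(main())
```

Because every source writes to the same queue, the consumer sees one coherent timeline regardless of how fast each producer runs, which is roughly the property a live observability feed needs.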

Accomplishments that we're proud of

We successfully built a working multi-agent platform that can:

- Generate agent workflows automatically
- Execute them in a graph-based runtime
- Stream execution events in real time
- Integrate voice interaction using Gemini Live

Being able to watch agents collaborate on a task while interacting with the system was one of the most exciting results.

What we learned

Building Halia showed how powerful agent-based systems can be when combined with natural interaction methods.

We also learned how important observability and transparency are when working with autonomous agents.

What's next for Halia – Voice-First AI Agent Platform

Future work includes visual workflow editing, persistent agent memory, additional tool integrations, and improved debugging capabilities.

Our long-term goal is to build a platform where users can simply describe complex goals and intelligent agent systems design and execute the workflow automatically.
