Inspiration

We wanted to make AI accessible to everyone, especially those who prefer talking over typing. We envisioned a voice-first AI platform where interacting with technology feels as natural as speaking to a person. Tvara empowers anyone to use powerful AI agents simply by speaking.

What it does

Tvara is an AI voice chatbot with a friendly UI. It lets you create personalized AI agents, called "pods," for different needs like general chat, research, shopping, or any other domain you can think of.

You just talk to your chosen pod, and Tvara gives you both a voice response and a real-time text display. It's perfect for anyone who finds typing difficult or inconvenient. Under the hood, it's powered by multi-agent AI frameworks for specialized, accurate responses. This vision is a first step toward making the tool accessible to people with disabilities who are eager to learn.

How we built it

  • Frontend: React.js
  • Backend: FastAPI (Python)
  • Speech-to-Text: OpenAI Whisper
  • LLM: Gemini 2.5 Flash
  • Text-to-Speech: Tacotron2 (via CoquiTTS)
  • Database: MongoDB Atlas
  • Agentic Framework: LangChain + LangGraph

One standout is our Research Pod, built on a Supervisor Architecture:

  • A Supervisor delegates your query to a Researcher.
  • The Researcher gathers insights and proposes ideas.
  • A Judge evaluates feasibility and gives feedback.
  • The Supervisor compiles and delivers a refined response to the user.
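The delegation loop above can be sketched in plain Python. This is a simplified, framework-free illustration of the control flow only: Tvara builds the real graph with LangChain + LangGraph over Gemini 2.5 Flash, and the stub `researcher` and `judge` functions here are hypothetical stand-ins for those LLM-backed nodes.

```python
# Simplified sketch of the Research Pod's Supervisor Architecture.
# The stub "agents" below are hypothetical stand-ins for LLM calls.

def researcher(query: str) -> str:
    # In the real pod, an LLM gathers insights and proposes ideas.
    return f"Proposed ideas for: {query}"

def judge(proposal: str) -> tuple[bool, str]:
    # In the real pod, an LLM evaluates feasibility and gives feedback.
    return True, f"Feasible. Feedback on: {proposal}"

def supervisor(query: str, max_rounds: int = 3) -> str:
    """Delegate to the Researcher, loop through the Judge, compile an answer."""
    proposal = researcher(query)
    for _ in range(max_rounds):
        ok, feedback = judge(proposal)
        if ok:
            break
        # Revise the proposal using the Judge's feedback and try again.
        proposal = researcher(f"{query}\nRevise using feedback: {feedback}")
    return f"Final answer: {proposal}"

print(supervisor("battery recycling startups"))
```

In LangGraph terms, each function would be a graph node and the Judge's verdict a conditional edge back to the Researcher or on to the final response.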

Challenges we ran into

Our biggest challenges included achieving real-time responsiveness for both voice and text, managing context across multiple user-created AI pods, ensuring accurate voice recognition for diverse accents, and balancing broad AI capabilities with specialized knowledge for each pod.

Accomplishments that we're proud of

We're proud of building an end-to-end system in such a short time. The real-time TTS pipeline isn't perfect yet, and the speech can be unclear for now, but we're happy with the approach and aim to improve it in the coming weeks. The agentic AI framework also works great: we're proud to have implemented LangGraph, especially for the Research Pod's multi-agent setup under the Supervisor Architecture.

From a security standpoint, we implemented OTP verification of new users' email addresses to prevent abuse.
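A minimal sketch of how such email OTP verification can work, using only the Python standard library. The in-memory store, the 6-digit format, and the 5-minute expiry window are assumptions for illustration; Tvara's actual implementation may differ.

```python
# Hypothetical email OTP flow: issue a one-time code, verify it once,
# reject expired or unknown codes. In-memory store for illustration only.
import hmac
import secrets
import time

OTP_TTL_SECONDS = 300  # assumed 5-minute validity window
_pending: dict[str, tuple[str, float]] = {}

def issue_otp(email: str) -> str:
    """Generate a 6-digit code for the address and record when it was issued."""
    code = f"{secrets.randbelow(1_000_000):06d}"
    _pending[email] = (code, time.time())
    return code  # in production this is emailed, never returned to the client

def verify_otp(email: str, code: str) -> bool:
    """Constant-time compare; reject expired, unknown, or reused codes."""
    entry = _pending.get(email)
    if entry is None:
        return False
    expected, issued_at = entry
    if time.time() - issued_at > OTP_TTL_SECONDS:
        del _pending[email]
        return False
    ok = hmac.compare_digest(expected, code)
    if ok:
        del _pending[email]  # one-time use
    return ok
```

`hmac.compare_digest` avoids timing side channels, and deleting the entry on success enforces single use.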

We also got hands-on experience with CI/CD by hosting our repo on GitLab and adding a YAML config so that a pipeline runs whenever a contributor pushes code to our dev branch. The backend is deployed on GCP (without a GPU, so expect some latency) and the frontend is on Vercel.
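A hypothetical minimal `.gitlab-ci.yml` (GitLab's standard pipeline config name) along the lines described above; the stage names, image, and commands here are assumptions, not our actual pipeline.

```yaml
# Hypothetical minimal GitLab CI config: run backend tests on pushes to dev.
stages:
  - test

backend-tests:
  stage: test
  image: python:3.11
  rules:
    - if: '$CI_COMMIT_BRANCH == "dev"'   # only run on the dev branch
  script:
    - pip install -r requirements.txt
    - pytest
```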

What we learned

We gained deep insights into advanced voice recognition and NLP, designing multi-agent architectures, optimizing for real-time AI response generation, and crafting user-centric UI/UX for voice interfaces. We also learned the nuances of model specialization for different use cases.

What's next for Tvara: Your voice, your AI, effortless results.

  • More pod categories & deeper personalization
  • Third-party API integrations for actionable responses
  • Voice Agent Builder — a drag-and-drop system for devs
  • Mobile + desktop apps for broader accessibility
  • GPU-backed inference for ultra-low-latency conversations
