Inspiration

We were inspired by the hackathon's challenge to create a truly conversational AI experience. The goal was to move beyond simple text-based chatbots and build an assistant that users could interact with naturally using their voice. We wanted to combine the intelligence of large language models with the emotional resonance of human-like speech to make technology feel more personal and accessible.

What it does

ElevenLabs-google-ai is a voice-first conversational assistant. Users open the web app and start a conversation either by typing or by speaking directly into their microphone. The assistant sends the query to Google's Gemini model and responds with both text and a natural, expressive voice generated by ElevenLabs. It offers multiple voice personalities, remembers the context of the conversation, and provides a clean, modern interface.

How we built it

The project was built as a full-stack web application:

  • Backend & AI Logic: A Python server using the Flask framework. It integrates the Google Generative AI SDK (for the Gemini 2.5 Flash model) and the ElevenLabs Python SDK for text-to-speech conversion.
  • Frontend & Interaction: A responsive interface built with HTML, CSS, and vanilla JavaScript. We implemented the Web Speech API to capture voice input directly in the user's browser.
  • Infrastructure & Deployment: The entire application was developed and tested on Replit, where it is also hosted live. Code is managed with Git and shared publicly on GitHub.
  • Key Feature - Session Memory: We implemented a server-side session system that allows the AI to maintain context throughout a conversation, making interactions feel continuous and coherent.
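
The session memory described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the SessionStore class, its method names, and the MAX_TURNS cap are all hypothetical, and the real server may store history differently (for example in Flask's session object). The idea is that each conversation keeps a bounded list of prior turns, which is prepended to the new message before it is sent to the model.

```python
import uuid
from collections import deque
from typing import Deque, Dict, Tuple

# Hypothetical cap on remembered turns; the project does not state one.
MAX_TURNS = 20


class SessionStore:
    """In-memory conversation history keyed by session ID (illustrative only)."""

    def __init__(self, max_turns: int = MAX_TURNS) -> None:
        self._max_turns = max_turns
        self._sessions: Dict[str, Deque[Tuple[str, str]]] = {}

    def new_session(self) -> str:
        """Create an empty history and return its ID."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = deque(maxlen=self._max_turns)
        return session_id

    def add_turn(self, session_id: str, role: str, text: str) -> None:
        """Append one message; the deque drops the oldest turn when full."""
        self._sessions[session_id].append((role, text))

    def build_prompt(self, session_id: str, user_message: str) -> str:
        """Join prior turns and the new message into one prompt string."""
        lines = [f"{role}: {text}" for role, text in self._sessions[session_id]]
        lines.append(f"user: {user_message}")
        return "\n".join(lines)
```

Bounding the history keeps prompts from growing without limit over a long conversation, at the cost of forgetting the earliest turns.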

Challenges we ran into

  • Mobile-First Development: A significant portion of the project was coded and debugged using an Android phone with Termux and Replit, which required adapting to a mobile workflow and solving environment-specific issues.
  • API Integration Hurdles: Getting the two core APIs (Google Gemini and ElevenLabs) to work seamlessly together involved troubleshooting authentication, response formatting, and managing the audio data flow between the server and the browser.
  • Real-Time Audio Handling: Designing a system to generate, save, and serve unique audio files for every AI response without causing delays or memory leaks on the server was a technical challenge.
  • Environment Configuration: Setting up a stable and reproducible environment on Replit, especially after replit.nix configuration conflicts, required careful debugging to restore functionality.
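
One way to approach the audio handling challenge above is to give every response a unique file name and periodically delete files that have outlived their usefulness. The sketch below uses only the Python standard library; the function names, the .mp3 extension, and the age-based cleanup policy are assumptions for illustration, not the project's confirmed implementation.

```python
import os
import time
import uuid


def save_audio(audio_bytes: bytes, audio_dir: str) -> str:
    """Write one response's audio to a uniquely named file and return its path."""
    os.makedirs(audio_dir, exist_ok=True)
    # A random UUID avoids name collisions between concurrent responses.
    path = os.path.join(audio_dir, f"{uuid.uuid4().hex}.mp3")
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return path


def cleanup_old_audio(audio_dir: str, max_age_seconds: float) -> int:
    """Delete .mp3 files older than max_age_seconds; return how many were removed."""
    removed = 0
    now = time.time()
    for name in os.listdir(audio_dir):
        path = os.path.join(audio_dir, name)
        if name.endswith(".mp3") and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed += 1
    return removed
```

Writing audio to disk and cleaning up by age keeps server memory flat, since no audio bytes are held after the file is written. The cleanup could be triggered on each request or on a timer.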

Accomplishments that we're proud of

  1. Delivering a Complete, Voice-First Experience: We successfully built a working application that fulfills the core hackathon promise: an AI you can talk to.
  2. Overcoming Development Constraints: We're proud of building a complex, cloud-integrated project using primarily a mobile phone, demonstrating great adaptability.
  3. Clean & Functional Design: We created an intuitive user interface that makes advanced voice AI technology easy and engaging to use.
  4. Effective Integration: We combined two powerful but independent cloud services (Google AI and ElevenLabs) into a single, cohesive product.

What we learned

  • Practical API Integration: Gained deep hands-on experience with the Google Cloud Gemini API and the ElevenLabs API, including SDK usage, authentication, and best practices.
  • Full-Stack Development on Replit: Learned to structure and deploy a complete Flask application with static files and real-time features on the Replit platform.
  • The Power of Voice UI: Explored the Web Speech API and learned how voice input/output can fundamentally change user interaction with software.
  • Problem-Solving in Constrained Environments: Enhanced our ability to debug and develop software with limited local resources, relying heavily on cloud IDEs and tools.

What's next for ElevenLabs-google-ai

The project has a clear roadmap for enhancement:

  • Advanced ElevenLabs Features: Integrate ElevenLabs' Speech-to-Speech or real-time Conversational AI features to make dialogues even more fluid and dynamic.
  • Expanded Voice Control: Allow users to fine-tune voice parameters like stability, similarity, and emotion for truly customized interactions.
  • Multimodal Input: Incorporate image analysis using Gemini's vision capabilities, allowing users to show the AI pictures and discuss them.
  • Application Scenarios: Develop specific versions tailored for use cases like language learning practice, interactive storytelling, or a voice companion for accessibility.
