EchoMind: Voice‑Driven AI Assistant
Inspiration
The inspiration for EchoMind came from the desire to make AI conversations feel more natural and human‑like. While text chatbots are powerful, they often lack the warmth of spoken interaction. By combining Gemini's reasoning capabilities with ElevenLabs' lifelike voice synthesis, we wanted to create an assistant that not only responds intelligently but also speaks back, bridging the gap between human conversation and AI.
What it does
EchoMind is a pastel‑themed, voice‑enabled AI assistant that:
- Accepts user input via text or speech recognition 🎤
- Generates intelligent replies using Google’s Gemini model
- Converts those replies into natural speech using ElevenLabs TTS
- Displays messages in a clean, scroll‑free React UI with gradient backgrounds
- Provides a friendly, conversational experience that feels both professional and approachable
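The speech‑input side of a flow like this is usually wired through the browser's Web Speech API. As a rough sketch (the helper names and wiring here are illustrative, not EchoMind's actual code), a recognition result event can be reduced to a transcript string and handed to the chat input:

```javascript
// Sketch of browser speech input via the Web Speech API.
// extractTranscript and startRecognition are illustrative names,
// not EchoMind's actual implementation.

// Pure helper: concatenate the final transcripts from a
// SpeechRecognition result event.
function extractTranscript(event) {
  let text = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    if (event.results[i].isFinal) {
      text += event.results[i][0].transcript;
    }
  }
  return text.trim();
}

// Browser-only wiring, guarded so the module also loads outside a browser.
function startRecognition(onText) {
  const SR = globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
  if (!SR) return null; // unsupported browser
  const recognition = new SR();
  recognition.lang = "en-US";
  recognition.interimResults = false; // only deliver finalized phrases
  recognition.onresult = (event) => onText(extractTranscript(event));
  recognition.start();
  return recognition;
}
```

In the UI, `onText` would simply set the React input state, so dictated and typed messages go through the same send path.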
How we built it
- Frontend: Built with React + Tailwind CSS for responsive design and pastel gradients. State management ensures smooth transitions (e.g., input bar moving after the first message).
- Backend: Node.js + Express server that integrates:
- Google Cloud Vertex AI (Gemini model) for text generation
- ElevenLabs API for text‑to‑speech conversion
- Integration: Axios handles communication between the frontend and backend. Audio blobs are streamed back and played in the browser via the HTML Audio API.
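On the backend, the ElevenLabs step boils down to one authenticated POST that returns MPEG audio. A minimal sketch of how that request could be assembled, assuming the standard ElevenLabs REST endpoint (the helper name, env variable names, and model ID are placeholders, not EchoMind's exact code):

```javascript
// Sketch of the ElevenLabs text-to-speech request the backend sends.
// buildTtsRequest is an illustrative helper; the env var names and
// model_id are placeholders that may differ per account.
function buildTtsRequest(text, voiceId, apiKey) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    options: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey, // keep this server-side in an env var, never in the client
        "Content-Type": "application/json",
        Accept: "audio/mpeg", // request MPEG so the browser can play it directly
      },
      body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
    },
  };
}

// Inside the Express route, the Gemini reply text would be fed through
// this request and the bytes returned with res.set("Content-Type", "audio/mpeg").
async function synthesize(text) {
  const { url, options } = buildTtsRequest(
    text,
    process.env.ELEVENLABS_VOICE_ID,
    process.env.ELEVENLABS_API_KEY
  );
  const resp = await fetch(url, options);
  if (!resp.ok) throw new Error(`TTS request failed: ${resp.status}`);
  return Buffer.from(await resp.arrayBuffer());
}
```

Keeping the API key on the server is what makes the Axios round trip necessary: the browser only ever sees the finished audio blob.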
Challenges we ran into
- Audio playback issues: Browsers block autoplay unless playback is triggered by a user gesture. Fixing this required awaiting audio.play() and ensuring proper MIME types (audio/mpeg).
- Credential management: Handling the Google Cloud service account JSON securely and ensuring environment variables were correctly set.
- UI polish: Designing a responsive, scroll‑free layout with pastel gradients while keeping it professional.
- Debugging async flows: Ensuring that text replies and audio playback were synchronized without race conditions.
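The autoplay fix above amounts to two things: only calling play() from a user‑initiated call stack, and handling the rejected promise it returns when the browser still blocks it. A minimal sketch (safePlay is an illustrative name, not EchoMind's exact code):

```javascript
// Browsers reject play() with NotAllowedError when no user gesture is
// active. Awaiting the promise lets us detect that and degrade gracefully.
// safePlay is an illustrative helper, not EchoMind's exact code.
async function safePlay(audio) {
  try {
    await audio.play(); // play() returns a promise in modern browsers
    return true;
  } catch (err) {
    // Autoplay blocked: show a "tap to hear the reply" button instead.
    console.warn("Playback blocked:", err.name);
    return false;
  }
}
```

Calling this from inside a click or submit handler (e.g. `sendButton.onclick = () => safePlay(new Audio(blobUrl))`) keeps the gesture context alive, which is what satisfies the autoplay policy.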
Accomplishments that we're proud of
- Successfully integrated two advanced APIs (Gemini + ElevenLabs) into a seamless workflow.
- Built a polished, pastel‑themed UI that feels welcoming and professional.
- Overcame tricky browser audio restrictions to deliver consistent voice playback.
- Learned to debug and refactor backend audio responses for cross‑browser compatibility.
What we learned
- The importance of MIME types and headers in audio streaming.
- How to manage environment variables securely in full‑stack projects.
- Practical experience with speech recognition and text‑to‑speech APIs.
- Reinforced knowledge of async programming in JavaScript and React state management.
- That even small UI details (like input bar transitions) greatly affect user experience.
What's next for EchoMind
- Queued audio playback: Ensuring multiple replies play sequentially without overlap.
- Multilingual support: Expanding beyond English to support Telugu and other languages.
- Personalization: Allowing users to choose different voices, tones, or speaking speeds.
- Deployment: Hosting EchoMind on Vercel/Render with secure backend integration for wider accessibility.
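The queued‑playback idea can be sketched as a promise chain: each clip's play function is appended to the tail of the chain, so clips start strictly in arrival order and never overlap. (AudioQueue is an illustrative name for a feature that is still planned, not shipped.)

```javascript
// Sketch of sequential audio playback for the planned queue feature.
// Each enqueued function starts only after the previous one settles,
// so overlapping replies are impossible. Illustrative, not shipped code.
class AudioQueue {
  constructor() {
    this.tail = Promise.resolve();
  }

  // playFn: () => Promise<void>, e.g. () => someAudioElement.play()
  enqueue(playFn) {
    // Chain onto the tail; swallow errors so one failed clip
    // doesn't block everything queued after it.
    this.tail = this.tail.then(playFn).catch(() => {});
    return this.tail;
  }
}
```

Each incoming reply would call `queue.enqueue(() => audio.play())`; because the chain serializes the promises, no locking or manual state tracking is needed.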
Built With
- axios
- elevenlabs-api
- express.js
- google-auth-library
- google-cloud-vertex-ai
- javascript
- jsx
- node.js
- react
- tailwindcss
- web-speech-api