Xtension – Hands-Free Browser Extension with a Cloned Voice!
Inspiration
I was inspired by the idea of making the web completely hands-free. I wanted to create a browser extension that not only automates websites with voice commands but also responds intelligently in a cloned voice that feels personal. Most assistants sound generic, but by combining Gemini with ElevenLabs, I could build one that’s both smart and uniquely mine.
What it does
Xtension is a Chrome extension that:
- Automates browsing with voice commands (scroll, click, navigate).
- Listens for the keyword “Gemini” to activate AI responses.
- Clones and uses a personal voice for replies via ElevenLabs TTS.
- Lets me clone a new voice or paste an existing voice_id directly from a simple webpage.
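The split between automation commands and the "Gemini" keyword could be sketched roughly as below. It is written in Python to match the assistant script; the extension would do the equivalent in JavaScript. The function name route_transcript, the command table, and the returned labels are hypothetical and for illustration only, not Xtension's actual code.

```python
import re
from typing import Tuple

# Hypothetical command table: spoken phrase -> automation action name.
AUTOMATION_COMMANDS = {
    "scroll down": "scroll_down",
    "scroll up": "scroll_up",
    "go back": "navigate_back",
}
WAKE_WORD = "gemini"


def route_transcript(transcript: str) -> Tuple[str, str]:
    """Classify a transcript as ('ai', prompt), ('automation', action) or ('ignore', '')."""
    text = transcript.strip().lower()
    # The wake word takes priority: whatever follows "gemini" becomes the prompt.
    match = re.search(rf"\b{WAKE_WORD}\b[\s,.:!?]*(.*)", text)
    if match:
        return ("ai", match.group(1).strip())
    # Otherwise look for a known automation phrase anywhere in the utterance.
    for phrase, action in AUTOMATION_COMMANDS.items():
        if phrase in text:
            return ("automation", action)
    # Anything else is treated as background speech.
    return ("ignore", "")
```

Checking the wake word first means a sentence such as "Gemini, how do I scroll down?" goes to the AI instead of triggering a scroll.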
How I built it
- Extension (HTML/JS): Captures voice, handles browser automation, and manages the cloned voice_id.
- Flask server (Python): Provides backend support for uploading and cloning voices.
- Assistant script (Python):
  - Records microphone input.
  - Transcribes with ElevenLabs STT (scribe_v1).
  - Responds through Gemini when triggered.
  - Speaks back using the active cloned voice.
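Because the extension manages the voice_id and the assistant script speaks with it, the two sides need some shared store. One simple option is a small JSON file that the backend writes and the assistant reads. The sketch below assumes that approach; the file name active_voice.json and the helpers save_voice_id and load_voice_id are hypothetical, and the project may share the value differently.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical location; the real project may store this elsewhere.
VOICE_FILE = Path("active_voice.json")


def save_voice_id(voice_id: str, path: Path = VOICE_FILE) -> None:
    """Persist the active voice_id so another process can read it."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"voice_id": voice_id}), encoding="utf-8")
    # Rename over the target so a reader never sees a half-written file.
    tmp.replace(path)


def load_voice_id(path: Path = VOICE_FILE) -> Optional[str]:
    """Return the stored voice_id, or None if the file is missing or not valid JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8")).get("voice_id")
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

The assistant can call load_voice_id() before each reply, so a voice pasted or cloned in the web page takes effect without restarting the Python process.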
Challenges I ran into
- Linking the Chrome extension with the Python assistant to share the voice_id.
- Converting browser audio recordings (webm) into formats usable by ElevenLabs APIs.
- Debugging API rate limits and errors in real time during voice cloning.
- Making the assistant respond only when “Gemini” is invoked while still supporting web automation commands.
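For the webm conversion step, one common approach is to shell out to ffmpeg. The sketch below assumes ffmpeg is installed and on the PATH, and that a mono 16 kHz file is an acceptable target; both are assumptions, and the helper names build_ffmpeg_command and convert_recording are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(src: Path, dst: Path, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg argument list that converts src to mono audio at sample_rate."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file, e.g. a browser .webm recording
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        str(dst),                 # output container is inferred from the extension
    ]


def convert_recording(src: Path, dst: Path) -> Path:
    """Run ffmpeg on src; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst
```

Keeping the command builder separate from the subprocess call makes the arguments easy to inspect or log when a conversion fails.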
Accomplishments that I’m proud of
- Built a working hands-free extension that combines voice automation, AI answers, and cloned speech.
- Successfully integrated browser-level JavaScript with backend Python services.
- Made the assistant feel more human by giving it a customizable cloned voice.
What I learned
I learned how to:
- Build and debug a full Chrome extension with persistent settings.
- Work with real-time audio capture and APIs like ElevenLabs and Gemini.
- Bridge a web interface and a Python process so they work together smoothly.
- Think about accessibility: hands-free browsing and personalized voices make the web more usable.
What’s next for Xtension
- Add support for continuous streaming transcription instead of fixed chunks.
- Allow multiple voice profiles for different moods or contexts.
- Expand automation features for more complex multi-step browsing tasks.
- Package the project for release on the Chrome Web Store so anyone can use it.
Built With
- api
- elevenlab
- extension
- gemini
- html
- javascript
- json
- python
