π About the Project β SonicGPT: Your Multilingual Voice Assistant
π₯ Inspiration
The inspiration behind SonicGPT came from a simple question:
What if ChatGPT could talk back to you in your own language β instantly, beautifully, and with personality?
Voice interfaces are the future of accessibility, productivity, and human-computer interaction. However, existing tools often lack multilingual support, have clunky UI, or depend on paid APIs (like ElevenLabs). We wanted to solve that by building a beautiful, multilingual, AI-powered voice assistant β fully open-source.
π What I Learned
- How to stream speech-to-text and TTS in real time
- Building React + Vite frontend with animated UI elements like audio waves
- Using Flask as a lightweight backend to glue everything together
- Integrating Grok, OpenRouter, and browser-native TTS fallbacks
- Managing multi-language AI interaction and TTS synthesis
- Handling edge cases in audio playback, voice switching, and async responses
π οΈ How I Built It
# Frontend
React + TypeScript + Vite
Web Speech API for STT
CSS animations for audio wave
# Backend
Flask (Python)
Endpoints for /generate and /tts
Integration with Grok, OpenRouter, ElevenLabs
# Other Tools
.env for API keys
Browser TTS fallback system
- Multilingual logic was built into both backend and frontend
- Users select a language β query is translated if needed β AI responds in that language β response is spoken in chosen voice
- If ElevenLabs fails, fallback to browserβs built-in TTS
π§± Challenges Faced
- ElevenLabs API issues: Wrong voice ID errors (
404 voice_not_found) - Browser compatibility: Some TTS voices donβt work on Linux/Firefox
- Microphone access: User permissions often block STT
- Dynamic language handling: AI sometimes responded in English even if input was Urdu/French/etc.
- API rate limits: Free tiers of ElevenLabs and Grok imposed restrictions, requiring intelligent fallback routing
π‘ Future Improvements
- Add Whisper or faster local STT model
- Integrate real-time translation layer for any input language
- Add voice cloning and emotion-based TTS
- Package it as a PWA mobile app

Log in or sign up for Devpost to join the conversation.