🚀 About the Project – SonicGPT: Your Multilingual Voice Assistant

🔥 Inspiration

The inspiration behind SonicGPT came from a simple question:

What if ChatGPT could talk back to you in your own language: instantly, beautifully, and with personality?

Voice interfaces are the future of accessibility, productivity, and human-computer interaction. However, existing tools often lack multilingual support, have clunky UIs, or depend on paid APIs such as ElevenLabs. We wanted to solve that by building a beautiful, multilingual, AI-powered voice assistant that is fully open source.


📚 What I Learned

  • Streaming speech-to-text (STT) and text-to-speech (TTS) in real time
  • Building a React + Vite frontend with animated UI elements such as audio waves
  • Using Flask as a lightweight backend to glue everything together
  • Integrating Grok, OpenRouter, and browser-native TTS fallbacks
  • Managing multilingual AI interactions and TTS synthesis
  • Handling edge cases in audio playback, voice switching, and async responses
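For the real-time TTS point, one common approach is to buffer the model's streamed text and hand each completed sentence to TTS, so speech can start before the full reply has arrived. This is a sketch of that general technique, not necessarily how SonicGPT does it; the function name `sentence_chunks` and the sentence-boundary rule are assumptions for illustration.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: a sentence ends at ., !, ?, or the Arabic-script full stop
# (U+06D4, used in Urdu) or question mark (U+061F), followed by whitespace.
# Simplification: abbreviations such as "Dr. Smith" will be split too early.
_BOUNDARY = re.compile(r"(?<=[.!?\u06D4\u061F])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Regroup streamed text fragments into whole sentences for a TTS engine.

    `tokens` is any iterable of text fragments, e.g. chunks of a streaming
    LLM response (hypothetical source; the provider client is not shown).
    """
    buffer = ""
    for token in tokens:
        buffer += token
        # Every piece except the last is followed by a boundary, so it is complete.
        *complete, buffer = _BOUNDARY.split(buffer)
        for sentence in complete:
            if sentence.strip():
                yield sentence.strip()
    # Flush whatever remains when the stream ends, even without final punctuation.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Hel", "lo there. ", "How are", " you? I", " am fine"]
    for sentence in sentence_chunks(stream):
        print(sentence)  # each sentence could be sent to TTS as soon as it is ready
```

Running the file prints "Hello there.", "How are you?", and "I am fine" on separate lines. Chunking by sentence trades a little latency for more natural prosody; sending every fragment to TTS individually would start sooner but would likely sound choppy.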

🛠️ How I Built It

# Frontend
  • React + TypeScript + Vite
  • Web Speech API for STT
  • CSS animations for the audio wave

# Backend
  • Flask (Python)
  • /generate and /tts endpoints
  • Integration with Grok, OpenRouter, and ElevenLabs

# Other Tools
  • .env file for API keys
  • Browser TTS fallback system

  • Multilingual logic was built into both the backend and the frontend
  • Users select a language → the query is translated if needed → the AI responds in that language → the response is spoken in the chosen voice
  • If ElevenLabs fails, the app falls back to the browser’s built-in TTS
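The multilingual flow above can be sketched on the backend as a single function. This is a minimal sketch under stated assumptions: `detect`, `translate`, `generate`, and `tts` are hypothetical callables standing in for a language detector, a translation step, the Grok/OpenRouter call, and the ElevenLabs call, and the language table is illustrative. Naming the reply language explicitly in the system prompt is one way to counter the "AI answers in English" problem; when TTS raises, the result tells the frontend to speak the text with the browser's own voice.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative language table (assumption): ISO 639-1 code -> name used in the prompt.
LANGUAGES = {"en": "English", "ur": "Urdu", "fr": "French"}


class TTSError(Exception):
    """Raised by the hypothetical ElevenLabs wrapper, e.g. on 404 voice_not_found."""


@dataclass
class SpokenReply:
    text: str
    lang: str
    engine: str             # "elevenlabs" or "browser"
    audio: Optional[bytes]  # None means the frontend must synthesize `text` itself


def build_system_prompt(lang: str) -> str:
    # Stating the target language explicitly makes drifting into English less likely.
    name = LANGUAGES[lang]
    return (
        f"You are SonicGPT, a voice assistant. Always reply in {name}, "
        "even if the user writes in another language. Keep replies short enough to speak aloud."
    )


def respond(
    query: str,
    lang: str,
    detect: Callable[[str], str],
    translate: Callable[[str, str], str],
    generate: Callable[[str, str], str],
    tts: Callable[[str, str], bytes],
) -> SpokenReply:
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    # Step 1: translate the query only when it is not already in the selected language.
    if detect(query) != lang:
        query = translate(query, lang)
    # Step 2: ask the model for a reply pinned to the selected language.
    reply = generate(build_system_prompt(lang), query)
    # Step 3: try ElevenLabs; on failure, return the text for browser TTS instead.
    try:
        return SpokenReply(reply, lang, "elevenlabs", tts(reply, lang))
    except TTSError:
        return SpokenReply(reply, lang, "browser", None)
```

On the frontend, a reply with `engine == "browser"` could be spoken through the Web Speech API, for example by passing a `SpeechSynthesisUtterance` whose `lang` is set to the selected language to `speechSynthesis.speak()`.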

🧱 Challenges Faced

  • ElevenLabs API issues: Requests with a wrong voice ID failed with 404 voice_not_found errors
  • Browser compatibility: Some TTS voices don’t work on Linux/Firefox
  • Microphone access: Browser permission settings often block STT until the user grants microphone access
  • Dynamic language handling: The AI sometimes responded in English even when the input was in Urdu, French, or another language
  • API rate limits: The free tiers of ElevenLabs and Grok imposed rate limits, requiring fallback routing between providers
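The fallback routing for rate limits and bad voice IDs can be sketched as an ordered list of providers tried in turn. This is a sketch, not SonicGPT's actual code: `ProviderError`, the set of failover statuses, and the 60-second cooldown are assumptions, and each provider is a hypothetical wrapper around Grok, OpenRouter, or ElevenLabs that raises `ProviderError` with the HTTP status it received. A provider that returns 429 is skipped until its cooldown expires, so a free-tier limit does not cost a failed request on every turn.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised by a provider wrapper, carrying the HTTP status."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status} {message}".strip())
        self.status = status


# Statuses worth failing over on (assumption): 404 covers ElevenLabs voice_not_found,
# 429 is a rate limit, 5xx is a provider outage. Anything else (e.g. 401) is re-raised.
FAILOVER_STATUSES = {404, 429, 500, 502, 503}

Provider = Tuple[str, Callable[[str], str]]


def route(
    providers: Sequence[Provider],
    prompt: str,
    cooldown_until: Dict[str, float],
    cooldown_s: float = 60.0,
    now: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Return (provider_name, result) from the first provider that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        # Skip providers that hit a rate limit recently.
        until = cooldown_until.get(name)
        if until is not None and until > now():
            errors.append(f"{name}: cooling down")
            continue
        try:
            return name, call(prompt)
        except ProviderError as exc:
            if exc.status == 429:
                cooldown_until[name] = now() + cooldown_s
            if exc.status not in FAILOVER_STATUSES:
                raise  # configuration errors such as a bad API key should surface
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

A call might look like `route([("grok", call_grok), ("openrouter", call_openrouter)], prompt, cooldowns)`, where both callables are hypothetical wrappers. Passing `now` in as a parameter keeps the cooldown logic testable without sleeping. A single-process Flask app could keep `cooldowns` as a module-level dict, but with multiple worker processes each worker would hold its own copy.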

💡 Future Improvements

  • Add Whisper or another faster, local STT model
  • Integrate a real-time translation layer for any input language
  • Add voice cloning and emotion-based TTS
  • Package it as a PWA mobile app
