SonicGPT

Frontend UI

🚀 About the Project – SonicGPT: Your Multilingual Voice Assistant

🔥 Inspiration

The inspiration behind SonicGPT came from a simple question:

What if ChatGPT could talk back to you in your own language — instantly, beautifully, and with personality?

Voice interfaces are the future of accessibility, productivity, and human-computer interaction. However, existing tools often lack multilingual support, have clunky UI, or depend on paid APIs (like ElevenLabs). We wanted to solve that by building a beautiful, multilingual, AI-powered voice assistant — fully open-source.

📚 What I Learned

How to stream speech-to-text and TTS in real time
Building React + Vite frontend with animated UI elements like audio waves
Using Flask as a lightweight backend to glue everything together
Integrating Grok, OpenRouter, and browser-native TTS fallbacks
Managing multi-language AI interaction and TTS synthesis
Handling edge cases in audio playback, voice switching, and async responses

🛠️ How I Built It

# Frontend
React + TypeScript + Vite
Web Speech API for STT
CSS animations for audio wave

# Backend
Flask (Python)
Endpoints for /generate and /tts
Integration with Grok, OpenRouter, ElevenLabs

# Other Tools
.env for API keys
Browser TTS fallback system

Multilingual logic was built into both backend and frontend
Users select a language → query is translated if needed → AI responds in that language → response is spoken in chosen voice
If ElevenLabs fails, fallback to browser’s built-in TTS

🧱 Challenges Faced

ElevenLabs API issues: Wrong voice ID errors (404 voice_not_found)
Browser compatibility: Some TTS voices don’t work on Linux/Firefox
Microphone access: User permissions often block STT
Dynamic language handling: AI sometimes responded in English even if input was Urdu/French/etc.
API rate limits: Free tiers of ElevenLabs and Grok imposed restrictions, requiring intelligent fallback routing

💡 Future Improvements

Add Whisper or faster local STT model
Integrate real-time translation layer for any input language
Add voice cloning and emotion-based TTS
Package it as a PWA mobile app

Built With

browser-tts
css3
elevenlabs-api
flask
grok-api
html5
javascript-(es6)
node.js
npm
openrouter-api
python
react
typescript
vite
web-speech-api

Updates

Ali Jafar started this project — Jul 22, 2025 01:33 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.