Inspiration
Our team was inspired by the power of human connection through voice. We realized that while AI has mastered text, much of the "soul" of communication is lost in flat, robotic translations. We built Voice Tutor AI to create an experience that feels personal, emotional, and truly human.
What it does
Voice Tutor AI is a high-precision transcription tool that converts audio into text in seconds. Powered by the ElevenLabs Scribe API, it provides near-human accuracy, identifies different speakers, and automatically detects over 99 languages.
How we built it
Backend setup (FastAPI): We created a FastAPI server with a proper project structure. We created and activated a Python virtual environment and installed the required dependencies (fastapi, uvicorn, gtts, etc.). The backend runs successfully and the server starts cleanly inside the venv.
Challenges we ran into
Our main challenge was handling large audio files and keeping the user experience smooth during processing. A current issue is that the backend API call to Gemini is failing.
What we learned
POST /ask → accepts user text input. GET /audio → serves the generated audio file. Swagger UI is available at /docs for testing.
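As a rough illustration of how a client might exercise these two endpoints, here is a minimal sketch using only the Python standard library. Several details are assumptions rather than facts about our backend: the base URL (uvicorn's default host and port), the JSON body shape with a "text" field, the MP3 output file name, and the helper names build_ask_request and ask_and_download are all hypothetical. The Swagger UI at /docs shows the real request schema.

```python
import json
import urllib.request

# Assumption: the server runs locally on uvicorn's default host and port.
BASE_URL = "http://127.0.0.1:8000"


def build_ask_request(text, base_url=BASE_URL):
    """Build (but do not send) a POST /ask request.

    Assumption: the endpoint accepts a JSON body like {"text": "..."}.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_and_download(text, out_path="reply.mp3", base_url=BASE_URL):
    """Send the text to POST /ask, then save the audio from GET /audio.

    Assumptions: /ask responds with JSON, and /audio returns the raw
    bytes of the generated audio file.
    """
    with urllib.request.urlopen(build_ask_request(text, base_url)) as resp:
        answer = json.load(resp)
    with urllib.request.urlopen(f"{base_url}/audio") as resp:
        audio_bytes = resp.read()
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return answer
```

With the server running, calling ask_and_download("Explain photosynthesis") would send the question and write the spoken reply to reply.mp3. Building the request separately from sending it keeps the payload easy to inspect without a live server.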
What's next for Voice Tutor AI
Personalized Voice Cloning: Allow students to learn from a tutor that speaks in their own voice or a voice they find most comfortable. AI Study Notes: Use an LLM to automatically turn transcripts into summaries, quizzes, and flashcards.
