Inspiration
Our inspiration came from the challenges faced in multilingual educational settings, where language barriers hinder understanding. We wanted to create a tool that allows students to follow lectures seamlessly, regardless of the language spoken.
What it does
Translect.ai is a real-time speech-to-speech translator that synchronizes translated audio with the speaker's lip movements. It enables users to comprehend lectures in their preferred language while maintaining a natural and immersive experience.
How we built it
We used a tech stack that includes the ElevenLabs API for speech translation, Azure Cosmos DB for data storage, and Azure OpenAI's Whisper for live transcription. The frontend is built with React.js, and we used Wav2Lip to achieve accurate lip-syncing.
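The flow through that stack can be sketched as a chain of stages. The code below is an illustrative sketch, not the project's actual implementation: every stage body is a stub marking where a real call to Azure OpenAI Whisper, a translation service, the ElevenLabs API, or Wav2Lip would go, and all names are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One chunk of lecture audio moving through the pipeline."""
    source_audio: bytes
    transcript: str = ""
    translation: str = ""
    dubbed_audio: bytes = b""

def transcribe(seg: Segment) -> Segment:
    # Stub: a real system would call Whisper for live transcription.
    seg.transcript = "hello class"
    return seg

def translate(seg: Segment, target_lang: str) -> Segment:
    # Stub: a real system would call a translation service here.
    seg.translation = f"[{target_lang}] {seg.transcript}"
    return seg

def synthesize(seg: Segment) -> Segment:
    # Stub: a real system would call ElevenLabs text-to-speech here.
    seg.dubbed_audio = seg.translation.encode()
    return seg

def lip_sync(seg: Segment, video_frame: bytes) -> bytes:
    # Stub: Wav2Lip would align mouth movements with the dubbed audio.
    return video_frame + seg.dubbed_audio

def run_pipeline(audio: bytes, frame: bytes, target_lang: str = "es") -> bytes:
    """Run one segment through transcribe -> translate -> synthesize -> lip-sync."""
    seg = Segment(source_audio=audio)
    seg = transcribe(seg)
    seg = translate(seg, target_lang)
    seg = synthesize(seg)
    return lip_sync(seg, frame)
```

The value of structuring the system this way is that each stage can be swapped (for example, replacing the translation backend) without touching the others.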
Challenges we ran into
One of the primary challenges was ensuring accurate lip synchronization while maintaining the quality of translated speech. Another was finding the right set of services that would integrate well together and meet our project's needs efficiently.
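Part of why lip synchronization is hard is that translated speech rarely has the same duration as the original. One common mitigation, shown here as an illustration rather than as the team's actual approach, is to time-stretch the dubbed audio so it fits the source segment, clamping the stretch factor so the voice still sounds natural. The clamp bounds below are assumed values.

```python
def stretch_factor(source_dur: float, dubbed_dur: float,
                   min_rate: float = 0.8, max_rate: float = 1.25) -> float:
    """Playback-rate multiplier that makes dubbed audio fit the source
    segment's duration, clamped to a natural-sounding range.

    A rate > 1 speeds the dubbed audio up (it ran too long);
    a rate < 1 slows it down. The 0.8-1.25 bounds are illustrative.
    """
    if source_dur <= 0 or dubbed_dur <= 0:
        raise ValueError("durations must be positive")
    rate = dubbed_dur / source_dur
    return max(min_rate, min(max_rate, rate))
```

The resulting rate could then be applied with a time-stretching tool such as ffmpeg's `atempo` filter, which changes tempo without shifting pitch.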
Accomplishments that we're proud of
We successfully integrated multiple technologies to create a functional prototype that demonstrates effective language translation and lip-syncing. The project showcases the potential for improving accessibility in education.
What we learned
This project taught us the intricacies of combining various APIs and technologies to solve complex problems. We gained valuable insights into optimizing system performance and managing real-time data processing.
What's next for Translect.ai
We plan to enhance Translect.ai by integrating Azure OpenAI's GPT-4 for real-time speech analysis, utilizing the Azure Video Translate API for direct translation, and developing personalized voice models with Azure Custom Voice.
Built With
- api
- azureblobstorage
- azurecosmosdb
- css
- elevenlabs
- express.js
- html
- react
- wav2lip