Trying to open the world of videos and other visual media to the visually impaired.
Today’s competitive world is driven by resources available online, most of which are in video format.
Most of this content is published in a single language based on the content creator’s preference, so we also aim to break the language barrier.
What it does
Translates a video from any language to any language.
Service available on all types of devices.
Automatically describes video content when the scene changes.
Users can request an automatic scene description whenever required.
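Since no single API offers direct speech-to-speech translation (one of the challenges below), the translation feature can be understood as three chained stages: speech-to-text, text translation, and text-to-speech. Here is a minimal sketch of that chaining; the stage functions are hypothetical stand-ins, not the actual Google Cloud client calls used in the project.

```python
# Sketch of a speech-to-speech pipeline built by composing three stages.
# The stage callables are placeholders for real services such as
# Cloud Speech-to-Text, Cloud Translation, and Cloud Text-to-Speech.
from typing import Callable


def build_s2s_pipeline(
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Compose the three stages into one speech-to-speech function."""
    def pipeline(audio_in: bytes) -> bytes:
        text = transcribe(audio_in)    # source audio -> source text
        translated = translate(text)   # source text -> target text
        return synthesize(translated)  # target text -> target audio
    return pipeline


# Toy stand-ins, just to show the data flow end to end.
demo = build_s2s_pipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    translate=lambda text: text.upper(),  # pretend "translation"
    synthesize=lambda text: text.encode("utf-8"),
)
print(demo(b"hello world"))  # b'HELLO WORLD'
```

In the real pipeline each stage is an API call, so the same composition applies with network clients swapped in for the lambdas.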
How we built it
Challenges we ran into
Voice activity detection for synchronization and optimization
No API that provides direct speech-to-speech translation
Optimize time complexity!
No API for sentence formation
Appropriate analysis of scenes in the media
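The voice activity detection challenge above boils down to deciding which audio frames contain speech so the translated audio stays in sync. A common baseline is energy thresholding; this is a sketch of that idea under assumed parameters (16-bit PCM samples, a hand-picked threshold), not the detector used in the project, and production VADs also smooth decisions across frames.

```python
# Minimal energy-based voice activity detection: frames whose RMS
# energy exceeds a threshold are flagged as speech.
import math


def frame_rms(samples):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speech(samples, frame_size=160, threshold=500.0):
    """Return one boolean per frame: True where speech is likely."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) > threshold)
    return flags


# Near-silence (low amplitude) followed by a loud burst.
quiet = [10] * 160
loud = [2000] * 160
print(detect_speech(quiet + loud))  # [False, True]
```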
Accomplishments that we're proud of
Built in 24 hours.
Built an end-to-end pipeline for translating videos from one language to another
Automated the process of scene explanation
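Automating scene explanation requires first detecting when the scene actually changes. A simple way to sketch this is comparing intensity histograms of consecutive frames; a large distance suggests a cut, which is when a new description would be triggered. The frame representation and threshold here are illustrative assumptions (a real implementation would work on decoded video frames, e.g. via OpenCV).

```python
# Sketch of scene-change detection via histogram comparison.
# A "frame" here is just a list of grayscale pixel values (0-255).

def histogram(frame, bins=16):
    """Bucket pixel intensities into a normalized histogram."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def is_scene_change(prev_frame, next_frame, threshold=0.5):
    """Flag a cut when the L1 distance between histograms is large."""
    h1, h2 = histogram(prev_frame), histogram(next_frame)
    return sum(abs(a - b) for a, b in zip(h1, h2)) > threshold


# A dark frame followed by a bright frame reads as a scene change.
dark = [20] * 100
bright = [230] * 100
print(is_scene_change(dark, bright))  # True
print(is_scene_change(dark, dark))    # False
```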
What we learned
Deploying on Google Cloud and consuming its APIs
Building end-to-end pipelines
What's next for Eyes and Ears
Improve sentence formation
Get better details from the scene
Remove the language barrier in video calls/conferences (real-time translation)
Enable visually impaired individuals to take part in video calls/conferences (real-time video summarization)