We are trying to open the world of videos and other visual media to the visually impaired.
Today's competitive world is driven by resources available online, most of which are in video format.
Most of this content is published in a single language based on the content creator's preference, so we also aim to break the language barrier.

What it does

Translates a video from any language to any other language.
Works on all types of devices.
Automatically describes video content when the scene changes.
Users can request an automatic scene description whenever required.
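The automatic scene-description trigger needs a way to decide when the scene has changed. A minimal sketch, assuming frames are compared as normalized color histograms with a fixed distance threshold (the histogram representation and the threshold value are illustrative assumptions, not necessarily the exact method we used):

```python
def scene_changed(hist_a, hist_b, threshold=0.3):
    """Compare two normalized color histograms of consecutive frames.

    A large L1 distance (scaled to 0..1) suggests a hard scene cut,
    which would trigger a fresh scene description.
    """
    diff = sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / 2
    return diff > threshold
```

In practice the histograms would come from decoded video frames; here they are plain lists so the idea is easy to test in isolation.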

How we built it
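
Since no API offers direct speech-to-speech translation, the pipeline chains three stages: transcribe the source audio, translate the text, and synthesize speech in the target language. A minimal sketch of that chaining, using placeholder stubs (in the real build these stages call Google Cloud APIs; the function bodies below are illustrative stand-ins, not actual API calls):

```python
def transcribe(audio_chunk: bytes, src_lang: str) -> str:
    # Stand-in for a speech-to-text call on one audio segment.
    return audio_chunk.decode("utf-8")

def translate(text: str, src_lang: str, dst_lang: str) -> str:
    # Stand-in for a text-translation call.
    return f"[{dst_lang}] {text}"

def synthesize(text: str, dst_lang: str) -> bytes:
    # Stand-in for a text-to-speech call.
    return text.encode("utf-8")

def translate_segment(audio_chunk: bytes, src_lang: str, dst_lang: str) -> bytes:
    """Chain the three stages for one speech segment of the video."""
    text = transcribe(audio_chunk, src_lang)
    translated = translate(text, src_lang, dst_lang)
    return synthesize(translated, dst_lang)
```

Each speech segment detected in the video passes through this chain, and the synthesized audio is stitched back onto the video timeline.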


Challenges we ran into

Voice activity detection for synchronization and optimization
No available API provides direct speech-to-speech translation
Optimizing the pipeline's running time
No API for sentence formation
Appropriate analysis of scenes in the media
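The voice-activity-detection challenge can be illustrated with a simple short-term-energy detector: split the audio into fixed-size frames and flag the ones whose mean energy exceeds a threshold. This is an assumption-laden sketch (frame size and threshold are arbitrary here), not necessarily the detector we used:

```python
def detect_speech(samples, frame_size=160, threshold=0.01):
    """Return one True/False flag per frame: True if the frame's
    mean energy exceeds the threshold (i.e., likely contains speech)."""
    flags = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        flags.append(energy > threshold)
    return flags
```

Knowing which frames contain speech lets the translated audio be placed back at the right timestamps, which is what makes synchronization possible.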

Accomplishments that we're proud of

Built in 24 hours.
Built an end-to-end pipeline for translating videos from one language to another
Automated the process of scene explanation

What we learned

Deploying on Google Cloud and consuming its APIs
Building end to end pipelines

What's next for Eyes and Ears

Improve sentence formation
Improve synchronization
Get better details from the scene
Remove language barrier in video calls/conferences (real time translation)
Enable visually impaired individuals to be part of video calls/conferences (real time video summarization)
