Speechify

Inspiration

The inspiration behind this project comes from a close friend of mine who cannot hear or speak. I've seen firsthand how challenging it can be for them to communicate with others, especially in situations where sign language isn't understood. Their struggles inspired me to create a tool that could help bridge this communication gap and make their life easier. I wanted to use technology to empower them and others like them, ensuring they can interact with the world more confidently and inclusively.

What it does

This app is designed to empower individuals who cannot hear or speak by providing tools that enable seamless communication and accessibility. It offers three main features:

Chat: Facilitates real-time conversations between the user and others. The user can type a message, which the app converts to speech for the other person. In return, spoken responses are transcribed into text, allowing the user to understand and participate in conversations effortlessly. The best part is that you don't have to use English: you can use any language, and the Gemini API automatically translates it into English.

Video Assistance: Many videos online lack subtitles, making them inaccessible to individuals with hearing impairments. This feature allows users to input a video link, and the app provides a descriptive summary of the content. This ensures they can engage with videos effectively, breaking barriers in consuming media content.

Real-Time Transcription: This tool transcribes conversations or audio in real time, ideal for watching movies, attending meetings, or engaging in any scenario where understanding spoken words is crucial. Additionally, the user can summarize lengthy transcripts into concise key points with just one click, saving time and effort.

With these features, the app strives to create an inclusive environment, enhancing the everyday experiences of those with hearing and speech disabilities.

How we built it

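We used the Gemini API to summarize transcripts, translate text, and describe the video behind a link. Android's speech APIs handle the conversion between text and speech: text-to-speech reads the user's typed messages aloud, and speech recognition turns spoken replies into text.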
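As a rough illustration of the three Gemini tasks above, each request can be reduced to a prompt template. The sketch below is written in Python for brevity and is not the app's actual code: the function names and the prompt wording are assumptions, and the step that sends the prompt to the Gemini API is left out.

```python
# Hypothetical prompt templates for the three Gemini tasks described above.
# The wording is illustrative only; the app's real prompts are not shown here.

def summarize_prompt(transcript: str) -> str:
    """Ask for a short key-point summary of a transcript."""
    return (
        "Summarize the following transcript as concise key points, "
        "one per line:\n\n" + transcript.strip()
    )


def translate_prompt(message: str) -> str:
    """Ask for an English translation and nothing else."""
    return (
        "Translate the following message into English. "
        "Reply with the translation only:\n\n" + message.strip()
    )


def describe_video_prompt(video_url: str) -> str:
    """Ask for a descriptive summary of the video behind a link."""
    return (
        "Describe the content of the video at this link for someone "
        "who cannot hear it:\n\n" + video_url.strip()
    )
```

Keeping each task in its own small function makes it easier to adjust the wording of one prompt without affecting the others.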

Challenges we ran into

Gemini is a powerful LLM, but writing good prompts for it was really challenging for us. We kept getting irrelevant replies, so we had to experiment with different prompts.
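One way to make this kind of experimentation more systematic is to check each reply against a simple relevance rule and fall back to a stricter prompt when the check fails. The Python sketch below assumes a hypothetical `generate` callable standing in for whatever sends a prompt to Gemini; the helper name, the relevance check, and the example prompts are all illustrative and not necessarily how the app works.

```python
from typing import Callable, Optional, Sequence


def ask_with_retry(
    generate: Callable[[str], str],      # hypothetical: sends a prompt, returns the reply
    prompts: Sequence[str],              # candidate prompts, loosest first
    is_relevant: Callable[[str], bool],  # caller-defined relevance check
) -> Optional[str]:
    """Try each prompt in order and return the first reply that passes the check."""
    for prompt in prompts:
        reply = generate(prompt).strip()
        if is_relevant(reply):
            return reply
    return None  # no prompt produced a usable reply
```

For a translation request, for example, the candidate list might start with a plain "Translate: ..." prompt and end with one that adds "reply with the translation only", while `is_relevant` rejects replies that begin with conversational filler.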