Inspiration

The team was inspired to create Signify AI after recognizing the significant communication barriers that people with speech or hearing impairments face in their communities. The project was specifically designed to support non-standard Akan speech, particularly for users with cerebral palsy and speech delays. A key motivator was user feedback, with one person stating, "I don't want it to just write what I said, but to speak it for me". Another user said they wanted deaf friends to be able to understand them, which led to the expansion of the speech-to-sign feature. This direct input from users shaped the project's direction and features.

What it does

Signify AI is an AI-based communication tool designed to help deaf, hard-of-hearing, visually impaired, and autistic individuals. It bridges communication gaps by providing real-time transcription and translation services. The system can transcribe non-standard Akan speech, play it back audibly using a Text-to-Speech (TTS) API that supports Akan Twi, and translate the text into sign language animations.

How we built it

The project was built as a live, user-validated pipeline on a cloud server using Render and FastAPI to ensure speed and scalability. The team began with the Whisper Tiny model as the foundation for their ASR (Automatic Speech Recognition) pipeline and then customized it.

The system's high-level architecture works as follows: the app records audio and sends it to the ASR API endpoint, where the server processes it. On the server, several key enhancements were implemented based on user feedback, including a repetition filter for stutters, noise normalization, and short-phrase chunking to prevent semantic drift. The team also integrated a Text-to-Speech (TTS) API that supports Akan Twi to enable voice playback.

After transcription, the text is used in one of three ways: played back as voice output, passed to the sign language renderer, or translated into a corresponding sign language animation. The speech-to-sign feature relies on a machine learning model, specifically an LSTM, trained to detect and translate sign language. To animate the character, the team used an open-source model that takes the MediaPipe points for each transcribed word and uses them to drive the avatar's movement.
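
The repetition filter and short-phrase chunking described above can be illustrated with a minimal sketch. This is not the team's actual code: the function names `collapse_repetitions` and `chunk_words`, the `max_repeat` and `max_words` parameters, and the choice to operate on transcript text rather than on audio are all assumptions made for illustration.

```python
from typing import List


def collapse_repetitions(text: str, max_repeat: int = 1) -> str:
    """Collapse runs of the same word (e.g. stutters) to at most max_repeat copies.

    Hypothetical helper: words are compared case-insensitively and with
    trailing/leading punctuation ignored.
    """
    kept: List[str] = []
    run = 0
    for word in text.split():
        key = word.lower().strip(".,!?")
        if kept and key == kept[-1].lower().strip(".,!?"):
            run += 1  # same word as the previous kept word
        else:
            run = 1   # a new word starts a new run
        if run <= max_repeat:
            kept.append(word)
    return " ".join(kept)


def chunk_words(text: str, max_words: int = 4) -> List[str]:
    """Split a transcript into short phrases of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    # Illustrative input only, not taken from the project's data.
    cleaned = collapse_repetitions("me me me pɛ nsuo nsuo")
    print(cleaned)
    print(chunk_words(cleaned, max_words=2))
```

One caveat of a filter like this is that it also collapses words that are repeated on purpose, so a real pipeline would likely need a language-aware exception list or a higher `max_repeat`.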

Challenges we ran into

The team faced challenges with the initial ASR model's accuracy, particularly on longer or slurred phrases. The model had a Word Error Rate (WER) of 0.723 and a Character Error Rate (CER) of 0.493, meaning that word-level errors amounted to roughly 72% of the words in the reference transcripts. This high error rate led the team to develop new strategies, such as filtering and chunking, to improve performance.
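
For readers unfamiliar with these metrics, the sketch below shows how WER and CER are conventionally computed: the edit distance (substitutions, deletions, and insertions) between the reference and the model output, divided by the length of the reference. The function names are hypothetical, and the write-up does not say which tool the team used to obtain its figures.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Toy strings for illustration, not the project's evaluation data.
    print(wer("a b c d", "a x c"))
    print(cer("abcd", "abxd"))
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference, which is one reason a repetition filter can lower the score.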

Accomplishments that we're proud of

The team is proud of several accomplishments, including:

  • Developing a live, user-validated ASR pipeline.
  • Achieving real-time transcription for non-standard Akan speech.
  • Successfully translating speech into sign language animations, and also sign language to text.
  • Creating a voice playback feature for transcribed non-standard Akan speech.
  • Ensuring user-tested reliability, with one user expressing their appreciation for the tool's ability to help them communicate with their deaf friends.

What we learned

The team's main takeaway was the critical importance of user-centered development. They learned that direct feedback from real users was essential in shaping the project and identifying the most needed features, such as the repetition filter and the buffering spinner. This approach ensured that the project was not based on hypothetical ideas but on genuine user needs.

What's next for Signify AI

Hardware Integration (B2B/B2G): Selling or licensing integration-ready versions of the technology for devices such as assistive smart glasses, public service kiosks, and wearables.

Grants & CSR Partnerships: Securing grants and forming partnerships with corporate social responsibility initiatives to expand the project's reach.

Built With

  • api
  • fastapi
  • flutter
  • lstm
  • ocr
  • public-service-kiosks
  • render
  • whisper