Inspiration

According to the Fifth Rwanda Population and Housing Census (2022), more than 158,000 people in Rwanda live with visual disabilities (blindness or low vision).

Their main challenge is accessing everyday visual information, for example:

  • Reading signs and books
  • Navigating unfamiliar environments

What it does

VisionTTS is a mobile app and smart glasses designed to help people with visual disabilities access everyday visual information.

Scene Describing: Describes indoor and outdoor environments through audio feedback in Kinyarwanda, helping people with visual disabilities understand their surroundings and navigate.

Text Reading: Reads printed text, such as documents and books, aloud in Kinyarwanda.

How we built it

Capture the Scene: The phone camera takes a picture of what is in front of the user, such as a sign, a book, or the surrounding environment.

Understand What’s Seen: The backend runs an AI model (qwen3-vl:2b-instruct-q4_K_M) that analyzes the image and works out what is happening in the scene.

Translate into Kinyarwanda: Because the AI model describes images in English, the system automatically translates the description into Kinyarwanda using a translation model (Quantized_Nllb_Finetuned_Health_En_Kin_8bit_v2) and Claude AI.

Convert to Speech: The translated description is turned into audio using a Kinyarwanda speech model (KinyarwandaTTS_female_voice). The user hears the description through the phone speakers.
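The four steps above can be sketched as a small pipeline. This is only an illustration and not the project's actual backend code: the class name VisionPipeline and the three stage functions (describe_image, translate, synthesize) are hypothetical placeholders standing in for the vision model, the translation model, and the Kinyarwanda speech model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VisionPipeline:
    """Chains the stages: image -> English text -> Kinyarwanda text -> audio.

    All three callables are hypothetical stand-ins for the real models.
    """
    describe_image: Callable[[bytes], str]  # image bytes -> English description
    translate: Callable[[str], str]         # English -> Kinyarwanda
    synthesize: Callable[[str], bytes]      # Kinyarwanda text -> audio bytes

    def run(self, image: bytes) -> bytes:
        if not image:
            raise ValueError("empty image")
        english = self.describe_image(image).strip()
        if not english:
            # Fail early instead of translating and speaking an empty string.
            raise RuntimeError("vision model returned no description")
        kinyarwanda = self.translate(english)
        return self.synthesize(kinyarwanda)


if __name__ == "__main__":
    # Stub stages for a dry run without any model loaded.
    demo = VisionPipeline(
        describe_image=lambda img: "A door with an exit sign.",
        translate=lambda text: f"[rw] {text}",
        synthesize=lambda text: text.encode("utf-8"),
    )
    audio = demo.run(b"fake-image-bytes")
    print(len(audio), "bytes of placeholder audio")
```

Keeping each stage behind a plain function makes it possible to swap a model, or to time one stage on its own, without touching the rest of the flow.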

Challenges we ran into

The vision-language model (VLM) describes images only in English and has no Kinyarwanda support, so a separate translation step was needed.

Accomplishments that we're proud of

App performance

Feature | Average response time | Quality | Comments
Scene Describing | 31.5 seconds | ⭐⭐⭐⭐ | -
Text Reading | 43 seconds | ⭐⭐⭐ | Works only on printed documents
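An average response time like the ones above can be measured with a short helper. This is a minimal sketch and not the method the team used: the function name average_response_time and its parameters are hypothetical, and request stands for any callable that performs one full round trip (capture to audio).

```python
import time
from statistics import mean
from typing import Callable


def average_response_time(
    request: Callable[[], object],
    runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Call `request` `runs` times and return the mean duration in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = clock()
        request()
        durations.append(clock() - start)
    return mean(durations)
```

The clock parameter exists so the helper can be checked with a fake clock; in normal use the default time.perf_counter is enough.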

What we learned

We saw first-hand how large the gap in Kinyarwanda speech and translation technology is.

What's next for VisionTTS

The software development and design phase is finished. The next step is to build the smart glasses hardware and connect it to the software.
