Inspiration

Did you know that 4.5% of the World's population is partially or fully blind? That is approximately 400 million people, who struggle to perform tasks that comes naturally for the rest. This inspired me to go on this journey to develop devices that can see for them, and this will hopefully be the first of my efforts in this direction.

What it does

Since this is a basic app, it takes image inputs and allows users to converse with a chatbot to gain insights about the images. This, in turn, can also be used in other fields such as forensic analysis to highlight the little details that maybe missed by humans.

How we built it

This has a network of priliminary pretrained models such as OCR, Image reader, object detection to first extract all text based information about the image. This is then inputed to the gpt-oss model, which uses this context to converse with the user.

Challenges we ran into

Since I am building this on my laptop which is a mac M1 chip with 8GB ram, many of the better models were not usable. In addition, structuring the image-to-text pipeline was a struggle as I had to ensure a wide factor of attributes were included.

What's next for Image chatbot for the blind

This can now be upgraded to recognise and analyse videos, which can then be finetuned to translate sign language and eventaully build a device that can see for the blind or hear for the deaf (Quite similar to the meta/google glasses but enhanced and designed to cater to the differently abled)

Built With

Share this project:

Updates