VisuAI


Problem we are trying to solve

Singapore's visually impaired population is growing, driven by an aging demographic and pervasive youth addiction to digital technology. For those affected, everyday tasks that rely heavily on peripheral vision become a mounting challenge. As these two trends intersect, addressing the issue becomes imperative for ensuring equitable accessibility and improving quality of life for all. Existing solutions for the visually impaired are also expensive, so "purchasing" independence and mobility is not an option for everyone. For example, training a guide dog costs 45-50k.

Inspiration

We came across an app called MySmartEye, built 10 years ago, that aids the visually impaired through volunteers who describe images for them. We see great potential in using AI to bridge the gap for the visually impaired and change how they interact with the world.

How the solution is unique

What sets our project apart is its use of widely accessible AI. As AI technologies become increasingly advanced, we are harnessing them to build a more inclusive and compassionate society for the visually impaired community. Our solution converts images into clear and coherent speech, translating the visual world into a format users can easily comprehend. Because the underlying technology is scalable, we can offer the solution at a low cost, lowering the barrier to emerging technologies for the visually impaired.

How it will be used

The application is designed around its users, with a simple interface that is easy for the visually impaired to navigate. It currently has three features: (1) description of surroundings, (2) OCR of objects, and (3) an interactive voice-activated assistant. Users communicate with the application through touch and voice.
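As a rough illustration of how a spoken command could be routed to one of the three features, here is a minimal sketch. The feature identifiers, the keyword lists, and the `routeCommand` function are all hypothetical and are not taken from the project's code; anything that matches no keyword falls through to the voice assistant.

```javascript
// Hypothetical identifiers for the three features described above.
const FEATURES = {
  DESCRIBE: "describe-surroundings",
  OCR: "read-text",
  ASSISTANT: "voice-assistant",
};

// Hypothetical trigger words; a real app would tune these with its users.
const KEYWORDS = [
  { feature: FEATURES.DESCRIBE, words: ["describe", "surroundings", "around"] },
  { feature: FEATURES.OCR, words: ["read", "text", "label"] },
];

// Map a speech-recognition transcript to a feature identifier.
function routeCommand(transcript) {
  // Lowercase and split on anything that is not a letter.
  const tokens = transcript.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  for (const { feature, words } of KEYWORDS) {
    if (tokens.some((token) => words.includes(token))) {
      return feature;
    }
  }
  // No keyword matched: hand the request to the conversational assistant.
  return FEATURES.ASSISTANT;
}
```

Defaulting to the assistant keeps the interaction forgiving: a user who phrases a request unexpectedly still gets a spoken response instead of an error.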

Technologies used

Deep AI trained models such as

OCR.Space
Generative AI (ChatGPT)
Image to text (HuggingFace)
Text to speech
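The image-to-text and text-to-speech steps listed above can be chained roughly as follows. This is a sketch under stated assumptions, not the project's implementation: the model name is a placeholder (the project does not say which Hugging Face model it uses), the endpoint follows the hosted Inference API URL format and may have changed, and `fetchFn` and `speakFn` are injected so that the speech engine, which the project does not name, can be swapped freely.

```javascript
// Assumption: placeholder captioning model; the project does not name one.
const CAPTION_MODEL = "Salesforce/blip-image-captioning-base";

// Build the HTTP request for a hosted image-to-text model.
function buildCaptionRequest(imageBytes, apiToken) {
  return {
    url: `https://api-inference.huggingface.co/models/${CAPTION_MODEL}`,
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: imageBytes, // raw image bytes
    },
  };
}

// Pull the caption out of a response shaped like [{ generated_text: "..." }].
function extractCaption(payload) {
  const first = Array.isArray(payload) ? payload[0] : null;
  const text =
    first && typeof first.generated_text === "string"
      ? first.generated_text.trim()
      : "";
  // Always return something speakable, even when the model gives nothing back.
  return text || "Sorry, I could not describe this image.";
}

// Caption an image, speak the caption, and return it.
async function describeImage(imageBytes, apiToken, fetchFn, speakFn) {
  const { url, options } = buildCaptionRequest(imageBytes, apiToken);
  const response = await fetchFn(url, options);
  if (!response.ok) {
    throw new Error(`Caption request failed: ${response.status}`);
  }
  const caption = extractCaption(await response.json());
  speakFn(caption);
  return caption;
}
```

In a browser, one option is to pass the global `fetch` as `fetchFn` and `(text) => window.speechSynthesis.speak(new SpeechSynthesisUtterance(text))` as `speakFn`, which uses the built-in Web Speech API.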

Key Challenges

Sourcing reliable AI solutions

There are many online open-source technologies that can be easily integrated into our application. However, to provide a reliable solution to the visually impaired, we had to test and validate the models over different test sets to ensure the viability of our product.
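One way such validation could look for the OCR and image-to-text outputs is to score each model against reference text using word error rate (WER). The sketch below is illustrative only: the project does not state which metric or threshold it used, and `wordErrorRate`, `evaluateModel`, and the test-set shape are hypothetical.

```javascript
// Word error rate: word-level edit distance divided by reference length.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    return hyp.length === 0 ? 0 : 1;
  }
  // prev[j] holds the edit distance between the reference words seen so far
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // reference word missing from the output
        curr[j - 1] + 1, // extra word in the output
        prev[j - 1] + cost // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Average WER over a test set of { expected, actual } pairs.
// The threshold is a hypothetical acceptance bar chosen by the team.
function evaluateModel(testSet, threshold) {
  if (testSet.length === 0) {
    throw new Error("Test set must not be empty");
  }
  const rates = testSet.map(({ expected, actual }) =>
    wordErrorRate(expected, actual)
  );
  const meanWer = rates.reduce((sum, rate) => sum + rate, 0) / rates.length;
  return { meanWer, passes: meanWer <= threshold };
}
```

Running the same test sets through each candidate service makes the comparison repeatable, so a model can be re-checked whenever a provider updates it.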

Business and scalability

As AI models become increasingly complex and are trained on larger datasets, their accuracy has been shown to improve. We believe the technology will continue to mature, providing not only greater reliability and accuracy but also more exciting use cases.

The growing concern over youths developing myopia, together with the silver tsunami, suggests that we can expect a greater number of users of our application. As such, we have designed the application to be lightweight and easily accessible not just on phones but also on tablets.

Built With

chatgpt
huggingface
javascript
ocrspace
react
redis

Submitted to

What The Hack 2023
- Winner: Inclusivity & Accessibility Tech

Created by

I mainly worked on the front end of the live speech recognition feature and discussed the implementations and features together with the team. I also created the product's landing page. We really hope that this product will help visually impaired people navigate our modern world more safely and easily.

Kristian Hadinata Achwan
Tze Kean Ng
Nanyang Technological University
Andrew Oak
Bryan Tay
Building for the future