Inspiration

During a Diversity, Equity and Inclusion (DEI) conference, my mother had the opportunity to listen to a visually impaired speaker describe his challenges. She shared with me that while technology helps, premium assistive devices cost around 1,000-2,000 USD and are not universally affordable, while cheaper options often prove ineffective or outdated. Inspired by recent advances in multimodal AI and by conversations with my teammates at the National University of Singapore, we decided to explore whether an affordable solution could be built with state-of-the-art technology.

What It Does

LookOut uses a simple yet effective mechanism. Users press a button, ask their question aloud, and the device captures a snapshot of the surrounding environment. The image is then analyzed by a multimodal AI model, and a spoken response is generated, providing detailed feedback drawn from the visual data. LookOut effectively becomes an additional pair of eyes for the user.
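The press-ask-capture-respond loop above can be sketched as a small orchestration function. This is a minimal illustration, not the project's actual code: the helper callables (`capture_image`, `transcribe`, `describe`, `speak`) are hypothetical names, injected here so the flow is easy to follow and to test with stand-ins.

```python
from typing import Callable

def assist_once(
    capture_image: Callable[[], bytes],
    transcribe: Callable[[], str],
    describe: Callable[[bytes, str], str],
    speak: Callable[[str], None],
) -> str:
    """One button press: record the question, snap a photo,
    ask the multimodal model, and speak the answer back."""
    question = transcribe()             # speech-to-text of the spoken question
    image = capture_image()             # snapshot of the surroundings
    answer = describe(image, question)  # multimodal model call on image + question
    speak(answer)                       # text-to-speech back to the user
    return answer

# Example run with stand-in callables instead of real hardware/model calls:
result = assist_once(
    capture_image=lambda: b"<jpeg bytes>",
    transcribe=lambda: "What is in front of me?",
    describe=lambda img, q: f"Answering '{q}' from {len(img)} bytes of image data.",
    speak=lambda text: None,
)
```

Injecting the four steps as callables keeps the loop independent of any particular camera, microphone, or model backend.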

How We Built It

We initially researched and tested various state-of-the-art models for image interpretation, such as LLaVA, Google's Gemini, Alibaba's Qwen-VL and OpenAI's GPT-4 Vision. Eventually, we settled on OpenAI's offering for processing our visual data due to its speed and accuracy. For audio, we used OpenAI's Whisper for speech-to-text, paired with a text-to-speech engine for the spoken responses. To ensure that everything ran smoothly, we used parallelisation and multithreading to increase the efficiency of the system.
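The multithreading idea can be sketched as follows: since transcribing the question and capturing the snapshot are independent, they can run concurrently so that neither delays the vision-model request. This is an illustrative sketch using `concurrent.futures`; the two placeholder functions stand in for the real Whisper and camera calls and are not from the project.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_audio() -> str:
    # Placeholder for a Whisper speech-to-text call
    return "What colour is this shirt?"

def capture_snapshot() -> bytes:
    # Placeholder for a camera capture (JPEG magic bytes shown)
    return b"\xff\xd8jpeg-data"

# Run both I/O-bound steps in parallel instead of sequentially,
# so the total wait is max(t_audio, t_camera) rather than their sum.
with ThreadPoolExecutor(max_workers=2) as pool:
    question_future = pool.submit(transcribe_audio)
    image_future = pool.submit(capture_snapshot)
    question = question_future.result()
    image = image_future.result()
```

Threads suit this case because both steps wait on I/O (microphone, camera, network) rather than on the CPU.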

We managed to reduce the response time by one-third compared to existing solutions.

We used various Python libraries to integrate the camera, microphone, and speaker peripherals on a Raspberry Pi.
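The peripheral glue can look roughly like the sketch below. It assumes `picamera2` for the camera and `sounddevice` for the microphone, which may differ from the libraries the team actually used; the hardware-touching imports are deferred inside the functions so the pure helper can run off-device.

```python
SAMPLE_RATE = 16_000  # Hz; a common sample rate for speech models

def num_samples(seconds: float, rate: int = SAMPLE_RATE) -> int:
    """How many audio frames to record for a given duration."""
    return int(seconds * rate)

def record_question(seconds: float = 5.0):
    """Record from the default microphone (requires the sounddevice package)."""
    import sounddevice as sd
    audio = sd.rec(num_samples(seconds), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()  # block until the recording finishes
    return audio

def capture_frame(path: str = "snapshot.jpg") -> str:
    """Grab one still image (requires picamera2 on a Raspberry Pi)."""
    from picamera2 import Picamera2
    cam = Picamera2()
    cam.start()
    cam.capture_file(path)
    cam.stop()
    return path
```

On-device, `record_question()` and `capture_frame()` would feed the transcription and vision-model steps described above.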

Challenges We Ran Into

Our most significant challenges were hardware-related. What first appeared to be a simple plug-and-play build eventually required some resourceful problem-solving due to hardware shortages. We used a borrowed Rode microphone and an old pair of headphones to work around our audio issues, and leveraged a legacy library for our NoIR camera sensor. We even briefly considered hardware controllers such as the ESP32 and NodeMCU, but were deterred by a lack of requisite driver support.

Accomplishments We Are Proud Of

At the outset, we felt the project was ambitious, but driven by the spirit to "hack and roll," we doggedly pursued our vision. We are incredibly proud of building something that aligns closely with what we first imagined. Despite unexpected obstacles such as Raspbian crashes and loose connections, our dedication never faltered. We learned a great deal about the process and tools, equipping us with invaluable knowledge for future projects.

What's Next for LookOut

We envision a future in which this implementation becomes even more refined and user-friendly. Despite the challenges we faced, we remain enthusiastic about iterating toward a solution that suits the project's computational needs. We aim for LookOut to become an indispensable aid for the visually impaired.
