Inspiration

We chose this goal because of a personal connection: one team member's mother shared a story from a DEI conference at Deloitte that moved us. It was the story of Ajay Minhoja, a visually impaired CFA, who described his experiences navigating the workplace. This, coupled with the emergence of multimodal AI, led us to realize that we could bridge the gap between technology and the needs of visually impaired individuals. By doing so, we aim to reduce inequality and enable their participation in social, economic, and political spheres.

What it does

yaR (formerly LookOut) is an AI-powered wearable device designed to enhance the environmental awareness of visually impaired individuals. It utilizes advanced image and audio processing technologies to analyze the surroundings and provides real-time auditory feedback to the user, enabling them to navigate their environment more confidently and independently.

How we built it

Our solution follows a client-server architecture with the following high-level components:

  1. IoT Wearable Device: This is a Raspberry Pi Zero, worn as a pendant. It captures video/images and audio through connected sensors and performs preliminary computation such as noise removal. The device runs Python and uses OpenCV for image processing. The captured data is then sent to the server for further processing.

  2. Server: The server is responsible for the majority of the computation. It runs on AWS EC2 and utilizes Django for the backend. The server processes the audio data using OpenAI's Whisper for speech-to-text and selects the best frame from the video. For Optical Character Recognition (OCR), we use Google Cloud Vision API to extract text from images before sending it to OpenAI's GPT-4 Vision for analysis.

  3. Response Generation: OpenAI's GPT-4 Vision processes the input data and generates a response to the user's query. This response is converted into an audio file using OpenAI's text-to-speech, which is streamed back to the IoT wearable device.

  4. Audio Playback: The IoT wearable device is equipped with an audio-enabled DAC (Digital-to-Analog Converter) that plays the received audio, providing the user with the requested information in an auditory format.
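To illustrate step 2's "selects the best frame from the video": on the server we score frames for sharpness and keep the best one before OCR. The sketch below is a simplified, dependency-free version of the common variance-of-Laplacian heuristic, operating on grayscale frames as lists of pixel rows; our actual pipeline works on real video via OpenCV, so treat this as an illustration rather than the production code.

```python
def laplacian_variance(frame):
    """Sharpness score: variance of a 4-neighbour Laplacian response
    over a grayscale frame (list of rows of pixel intensities).
    Blurry frames have weak edges, so their response variance is low."""
    responses = []
    h, w = len(frame), len(frame[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (frame[y - 1][x] + frame[y + 1][x]
                   + frame[y][x - 1] + frame[y][x + 1]
                   - 4 * frame[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def best_frame(frames):
    """Return the index of the sharpest frame in a captured clip."""
    return max(range(len(frames)), key=lambda i: laplacian_variance(frames[i]))
```

The winning frame is the one handed to Google Cloud Vision for OCR and then to GPT-4 Vision.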

Additional Features:

  • Follow-up Functionality: We store all conversations in Firebase and images in Firestore. This allows for context-aware responses and follow-up questions.
  • Semantic Search: We use Pinecone for semantic search with a vector database to find the most similar past queries, enhancing the relevance of responses.
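Conceptually, the semantic search boils down to nearest-neighbour lookup by cosine similarity over embedded queries. The sketch below is a minimal in-memory stand-in for the Pinecone index (which handles this at scale); the stored pairs and toy 2-D embeddings are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query_vec, stored):
    """stored: list of (past_query_text, embedding) pairs, as the
    vector database would hold them. Returns the past query whose
    embedding is closest to query_vec."""
    return max(stored, key=lambda item: cosine(query_vec, item[1]))[0]
```

The retrieved past query (and its stored conversation context) is then folded into the prompt, which is what makes follow-up questions context-aware.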

This architecture ensures efficient processing and delivery of information to the user, enhancing their environmental awareness through auditory feedback while leveraging the latest advancements in AI.

Challenges we ran into

One significant challenge concerned the choice of hardware for our IoT wearable device. We initially opted for a Raspberry Pi Zero but later decided to switch to a lighter-weight board, the Realtek AMB82-Mini. That decision led us into a two-week spiral of challenges. The AMB82-Mini, while compact and efficient, lacked community support and essential functionality, such as playing MP3 files. Despite our efforts to piece together different libraries and workarounds, we were unable to get certain features working on the board's AmebaPro2 SoC.

Faced with these obstacles, we made the strategic decision to prioritize quick iteration and user feedback integration. We reverted to using the Raspberry Pi, which allowed us to leverage Python and the extensive community support available. This decision enabled us to focus on the critical aspects of our project without being hindered by hardware limitations.

Accomplishments that we're proud of

The primary motivation behind the development of yaR was to enhance the independence of visually impaired individuals, and the creation of our prototype marked a significant step towards that goal. We are particularly proud of designing and producing a 3D-printed case that neatly houses all the necessary electronics. This not only makes the device comfortable and user-friendly to wear but also demonstrates our ability to integrate hardware and software seamlessly. The assembled device is a tangible representation of our commitment to improving the quality of life of those with visual impairments, and we are excited about the potential impact yaR can have on their everyday independence.

What's next for yaR

Our scalability strategy for yaR involves expanding our customer base and leveraging academic collaborations. In Singapore, we aim to partner with organizations that assist visually impaired individuals, ensuring that yaR reaches those who can benefit from it the most. We also plan to reach out to professors at NTU and NUS to explore collaborations with research labs such as the AI IoT lab, engaging with faculty specializing in computer organization and multimodal AI research, as well as the Smart Systems Institute at NUS. By collaborating with these key stakeholders and making use of university resources, we aim to further develop yaR into a real-world product distributed through partnerships with these organizations, ultimately deepening its impact on the visually impaired community.
