Inspiration

Navigating the world is a difficult task, especially for the visually impaired. Amid remarkable AI advancements benefiting tech and other industries, these gains have yet to lift the baseline quality of life for everyone. We decided to focus our efforts on the blind and visually impaired, who stand to benefit enormously from artificial intelligence. Leveraging cutting-edge computer vision, we aim to replicate key facets of eyesight in an accessible and comprehensible way. Enter Spectra.

What it does

Our app is designed to guide the visually impaired through difficult-to-navigate situations. We employ a two-part solution: a user-activated visual scene analyzer and live hazard detection, both delivered through a natural-sounding AI narrator that highlights the most important things to focus on. The moment a user utters their custom activation phrase, our app captures the surrounding scene, analyzes it for salient and important information, and instantly relays a description back to the user, ensuring they are fully aware of their environment. When the user is out in public, we also run a live video feed of their surroundings through our hazard detector to flag dangers along their path: things that could be critical to their safety but can't be easily identified without sight. Whenever the app detects a potential danger, we notify the user through easy-to-understand narration.
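As a rough sketch of the activation flow, a speech-to-text transcript can be matched against the user's custom phrase with simple normalization. The function name and matching strategy here are illustrative, not our exact implementation:

```python
import re

def matches_activation_phrase(transcript: str, phrase: str) -> bool:
    """Return True when the user's custom activation phrase appears in a
    speech-to-text transcript (case- and punctuation-insensitive)."""
    def normalize(s: str) -> str:
        return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return normalize(phrase) in normalize(transcript)
```

For example, `matches_activation_phrase("Hey Spectra, what's around me?", "hey spectra")` triggers a scene capture, while unrelated speech does not.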

How we built it

Architecture

Our cross-platform mobile app is built with React Native and JavaScript. On the backend, we combine our own detection logic and vision algorithms with off-the-shelf models (such as Google's Vertex AI offerings). We expose this functionality via endpoints on a performant Litestar API and implement caching to keep response times near-instant.
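The caching idea can be illustrated with a minimal time-to-live cache decorator. This is a stdlib-only stand-in for the caching layer on our Litestar API, not the exact mechanism we ship:

```python
import time
from functools import wraps

def ttl_cache(seconds: float):
    """Cache a function's results for `seconds`, so repeated requests for
    the same input are answered without re-running the expensive model."""
    def decorator(fn):
        store = {}  # maps args -> (value, expiry time)
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            if args in store:
                value, expires = store[args]
                if now < expires:
                    return value  # cache hit: skip the model call
            value = fn(*args)
            store[args] = (value, now + seconds)
            return value
        return wrapper
    return decorator
```

Decorating an endpoint's analysis function with `@ttl_cache(30)` means identical requests within 30 seconds hit the cache instead of the model.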

Scene Analysis

Upon user invocation, our CV model extracts the important visual features from the snapshot. We then leverage Google's PaLM language model to derive key insights about the scene and programmatically compose a conversational yet informative response through LLM orchestration frameworks. This response is passed to our narration service, which immediately relays the information back to the user.
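The orchestration described above amounts to a three-stage pipeline. The sketch below shows the shape of it with each stage injected as a callable, so the underlying models can be swapped; the function names are illustrative, not our real API:

```python
from typing import Callable

def analyze_scene(
    image_bytes: bytes,
    extract_features: Callable[[bytes], list],   # CV model stage
    summarize: Callable[[list], str],            # LLM insight stage
    narrate: Callable[[str], str],               # narration stage
) -> str:
    """Run a snapshot through feature extraction, LLM summarization,
    and narration, returning the text handed to text-to-speech."""
    features = extract_features(image_bytes)
    summary = summarize(features)
    return narrate(summary)
```

Keeping the stages decoupled this way made it straightforward to test each model in isolation and to replace one without touching the others.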

Hazard Detection

We sample frames from the live video feed in real time, marking potential obstacles and dangerous objects. Our program further analyzes these scenes with Google's Vertex AI ImageGen API to disambiguate between hazards and to recommend immediate actions to the user.
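Framewise sampling simply means the detector does not run on every frame of the feed. A minimal sketch of the rate-limiting step, with the parameter names as assumptions rather than our exact code:

```python
from typing import Iterable, Iterator

def sample_frames(frames: Iterable, fps: int, samples_per_second: int) -> Iterator:
    """Yield a subset of a live feed's frames so the hazard detector
    runs at a manageable rate (e.g. 2 of every 30 frames)."""
    step = max(1, fps // samples_per_second)
    for i, frame in enumerate(frames):
        if i % step == 0:
            yield frame
```

At 30 fps with `samples_per_second=2`, only every 15th frame reaches the detector, which keeps latency low without missing slow-moving obstacles.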

Narration

We employ Google Cloud's Text-to-Speech service to provide a seamless narration experience, acting as the user's all-in-one auditory guide to the outside world. We also offer customizable voice profiles so the user is as comfortable as possible with their assistant's responses.
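A voice profile boils down to a small bundle of synthesis parameters that map onto Cloud Text-to-Speech's voice and audio settings. The profile names, voice names, and rates below are illustrative placeholders, not our shipped configuration:

```python
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    """Parameters fed to the text-to-speech request for one narrator voice."""
    language_code: str
    voice_name: str       # illustrative voice identifiers
    speaking_rate: float  # 1.0 = normal speed

PROFILES = {
    "calm":  VoiceProfile("en-US", "en-US-Neural2-C", 0.9),
    "brisk": VoiceProfile("en-US", "en-US-Neural2-D", 1.2),
}

def select_profile(name: str) -> VoiceProfile:
    """Look up the user's chosen profile, falling back to a default."""
    return PROFILES.get(name, PROFILES["calm"])
```

The selected profile's fields are then passed along with the narration text when synthesizing audio.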

Challenges we ran into

We encountered several challenges while developing Spectra. Fine-tuning pretrained computer vision models for object detection was complex, especially with our own custom image data. Optimizing lightweight machine learning models and moving them to the frontend to minimize latency was another significant hurdle that took several steps to overcome. Our team lacked mobile app development experience, so building platform-specific features in React Native was a challenging endeavor. Finally, condensing the full project lifecycle (ideation, design, prototyping, iteration, testing, polishing, and consideration of accessibility and other edge cases) into fewer than two days forced us to make hard prioritization decisions.

Accomplishments that we're proud of

We’re proud to have developed our own harm detection model, implemented diverse object detection models, and used near-realistic text-to-speech to identify relevant obstacles and hazards for the visually impaired. The models also describe scenes in a concise yet detailed manner. Our commitment to inclusivity drove our choice of React Native for the mobile app: we wanted cross-platform compatibility in every design decision and to be as accessible as possible. We also adhered to WCAG accessibility standards, including compatibility with iOS’s VoiceOver and Android’s TalkBack.

What we learned

Throughout this project, our team delved into the intricacies of cross-platform development, using OS-native functionality to ensure a seamless experience across operating systems. Implementing hazard detection and object recognition models deepened our understanding of computer vision applications, enabling us to identify, interpret, and present the environmental elements crucial to the safety and navigation of the visually impaired. Our exploration of multiprocessing techniques significantly increased the app’s throughput, improving its performance and responsiveness. Finally, working with Google’s Vertex AI gave us valuable insight into leveraging advanced AI capabilities for object detection and model training, further enriching our development journey.

What's next for Spectra

As Spectra prepares for its app-store debut, several pivotal steps are on our agenda. Foremost is further tuning our model to improve hazard and obstacle detection; our aspiration is for Spectra to fill the role of eyesight in all sorts of situations, so a comprehensive model is a key next step. Conducting rigorous user testing and iterating on feedback from members of the visually impaired community is equally important, since deeper insight into their challenges is pivotal to refining Spectra’s user experience.
