Optica

🌟 Inspiration

There are over 7.2 million people in the U.S. who are legally blind, many of whom rely on others to help them navigate and understand their environment. While technology holds the promise of increased independence, current solutions for the visually impaired often fall short—either lacking accessibility features like text-to-speech or offering overly complex interfaces.

Optica was born out of a desire to bridge this gap. Our app empowers visually impaired individuals by giving them a simple, intuitive tool to perceive the world independently. Through clear, human-like descriptions of their surroundings, Optica provides not just information, but confidence, autonomy, and a deeper connection to their environment.

🛠️ What it does

Optica transforms a smartphone into a tool of empowerment for the visually impaired, enabling users to independently understand their surroundings. With the press of a button, users receive clear, succinct, vivid audio descriptions of what the phone’s camera captures. Optica doesn’t just list objects; it paints a picture—communicating the relationships between objects and creating a true sense of place. Optica enables its users to engage with their environment without outside assistance.

🧱 How we built it

We developed Optica using the ML Kit Object Detection API, which enabled us to identify and classify objects in real time. These object classifications were then fed into a custom Large Language Model (LLM) powered by TuneStudio and Cerebras, which we trained to generate coherent, natural-language descriptions. The output from this LLM was passed to Google Cloud’s Text-to-Speech API to provide users with real-time audio feedback. Throughout development, we maintained a user-first mindset, ensuring that the interface was intuitive and fully accessible.
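To make the flow from detector to LLM to speech concrete, here is a minimal Python sketch of how detections might be turned into a prompt and handed on. It is not our actual Android code: the `Detection` fields, the confidence threshold, the left/center/right bucketing, the prompt wording, and the `generate` and `speak` callables (stand-ins for the LLM and text-to-speech calls) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    """One detected object. All fields are hypothetical stand-ins for detector output."""
    label: str
    confidence: float  # 0.0 to 1.0
    left: float        # bounding-box edges, normalized to 0.0-1.0 of frame width
    right: float


def horizontal_position(det: Detection) -> str:
    """Bucket the center of the box into a coarse left/center/right position."""
    center = (det.left + det.right) / 2
    if center < 1 / 3:
        return "left"
    if center > 2 / 3:
        return "right"
    return "center"


def build_prompt(detections: List[Detection], min_confidence: float = 0.5) -> str:
    """Serialize confident detections, ordered left to right, into an LLM prompt."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    if not kept:
        return "No objects were detected. Say so briefly."
    kept.sort(key=lambda d: (d.left + d.right) / 2)
    lines = [f"- {d.label} ({horizontal_position(d)} of frame)" for d in kept]
    return (
        "Describe this scene in one or two short sentences for a blind listener.\n"
        "Detected objects:\n" + "\n".join(lines)
    )


def describe_scene(
    detections: List[Detection],
    generate: Callable[[str], str],  # assumed wrapper around the LLM request
    speak: Callable[[str], None],    # assumed wrapper around text-to-speech playback
) -> str:
    """Run the pipeline: detections -> prompt -> description -> audio."""
    description = generate(build_prompt(detections))
    speak(description)
    return description
```

Passing `generate` and `speak` in as plain functions keeps the prompt-building logic testable without network access; in the app they would wrap the real LLM and text-to-speech requests.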

⚔️ Challenges we ran into

Developing Optica presented numerous technical and logistical challenges, particularly when it came to integrating various cutting-edge technologies. Deploying our object detection model in Android Studio took longer than anticipated, which limited the time we had to refine other components.

Passing data between our computer vision model and TuneStudio’s LLM proved complex, requiring us to overcome issues with API integration and SDK compatibility. Additionally, managing the project across GitHub repositories introduced Git-related challenges, particularly when merging contributions from different team members.

However, these difficulties only strengthened our resolve and pushed us to learn new skills—especially in debugging, collaboration, and working across frameworks. Mentors played a crucial role in helping us push through these roadblocks, and the experience has made us better engineers and problem solvers!

🎖️ Our Accomplishments

We are incredibly proud of our integration of computer vision and natural language processing, a combination that allows Optica to go beyond standard object recognition! Starting from a basic CV-based idea, we pushed the boundaries by incorporating an LLM to enhance the descriptions and truly serve the visually impaired community. None of us had prior experience with these APIs, and we learned so much on this journey!

Our ability to bring together these powerful technologies to create a tool that can have a tangible, positive impact on people’s lives is an accomplishment we hold in high regard. Successfully deploying this onto a user-friendly platform was a milestone we are excited about.

📖 What we learned

Although we learned new languages, APIs, and Git commands on a technical level, the most important lessons we've learned go beyond the code:

Setbacks are an inevitable part of the creative process, and staying adaptable allows you to turn challenges into opportunities!
Starting without all the answers taught us that taking the first step is crucial for personal and project development. We learned to not get ahead of ourselves and take it slow!
Reaching out for help from our mentors showed us the power of collaboration and shared knowledge. We would like to specifically mention Nifaseth and Harsh Deep for their help!

⏭️ What's next for Optica

We plan to continually enhance the app by improving the accuracy and breadth of the image classification model, training it on more diverse datasets that include unconventional settings and real-world complexity. Additionally, we aim to incorporate depth sensing with Google’s ARCore Depth API to provide even more nuanced scene descriptions. On the accessibility front, we will refine the voice activation and gesture-based navigation to make the app even more intuitive. We also look forward to partnering with organizations and sponsors, like Cerebras and TuneStudio, to ensure that Optica continues to push the boundaries of AI for social good, helping us realize our vision of full independence for the visually impaired.