Inspiration

Growing up, I had an uncle who was blind and a nature lover. He would often take long walks with family members and ask them to describe, in detail, what they saw around him. He could not stop smiling as the details got richer and more nuanced. The joy on his face during these interactions inspired me to ask: What if AI could provide this same experience to visually impaired individuals worldwide, anytime, anywhere?

PERCV is my attempt to bring this joy to visually impaired individuals across the globe!

What it does

PERCV is an AI-powered platform that helps visually impaired individuals understand their surroundings. Users are connected with Persy, a conversational AI assistant who can see, describe, and answer questions about a user's surroundings. Unlike most existing tools, which provide static descriptions, Persy engages in dynamic, back-and-forth conversations about surroundings, creating a more human-like assistance experience.

PERCV helps users:

  • Get instant, detailed descriptions of their surroundings through natural conversation
  • Ask follow-up questions about specific objects, people, or environmental details
  • Navigate spaces confidently with contextual information about their environment
  • Experience seamless multilingual support that adapts to user preferences
  • Access everything through voice with full screen reader compatibility

PERCV will give the visually impaired more independence by transforming daily activities like grocery shopping, exploring new places, or walking into an unfamiliar environment into empowering experiences!

How we built it

PERCV was built with a focus on empowerment and independence. PERCV's design philosophy prioritizes versatility and ease of use for the visually impaired. Through every stage of the design process, I focused on keeping the app simple and accessible so that it can serve as a seamless value-add to any visually impaired individual's day-to-day life. As a result, you will see that PERCV has a single input interface which uses either double taps (to start and stop the service) or single taps (to continue the conversation).
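The double-tap/single-tap interface described above could be sketched roughly as follows. This is an illustrative reconstruction, not PERCV's actual code: the function name, the 300 ms threshold, and the action labels are all assumptions.

```typescript
// Illustrative sketch of a single-input tap interface: two taps within
// DOUBLE_TAP_MS toggle the service on/off; an isolated tap continues
// the conversation. Names and threshold are assumptions, not PERCV's code.

type TapAction = "toggle-service" | "continue-conversation";

const DOUBLE_TAP_MS = 300; // assumed double-tap window

function classifyTaps(tapTimesMs: number[]): TapAction[] {
  const actions: TapAction[] = [];
  let i = 0;
  while (i < tapTimesMs.length) {
    const next = tapTimesMs[i + 1];
    if (next !== undefined && next - tapTimesMs[i] <= DOUBLE_TAP_MS) {
      actions.push("toggle-service"); // double tap: start/stop the service
      i += 2;
    } else {
      actions.push("continue-conversation"); // lone tap: keep talking
      i += 1;
    }
  }
  return actions;
}
```

A real handler would classify taps as they arrive (debouncing on the same threshold) rather than over a recorded list, but the decision logic is the same.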

I built PERCV using cutting-edge AI technologies: OpenAI's Whisper for automatic speech recognition (ASR) and transcription, OpenAI's GPT-4 for translation and reverse geocoding, and ElevenLabs' Turbo v2.5 text-to-speech (TTS) for speech synthesis and multi-language support. I also made use of Supabase Edge Functions for managing the AI workflow, Netlify for deployments, and IONOS Entri for the domain (https://percv.org).
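The AI workflow above can be sketched as a three-stage pipeline of the kind a Supabase Edge Function might orchestrate. The stage interfaces, names, and call order here are assumptions for illustration; PERCV's actual prompts and endpoints are not part of this write-up.

```typescript
// Hedged sketch of the ASR -> LLM -> TTS pipeline described above.
// Each stage is injected so the orchestration can be tested with stubs;
// in production these would call Whisper, GPT-4, and ElevenLabs (assumed).

interface Stages {
  transcribe: (audio: Uint8Array) => Promise<string>;                 // Whisper ASR
  respond: (transcript: string, imageUrl: string) => Promise<string>; // GPT-4
  synthesize: (text: string, lang: string) => Promise<Uint8Array>;    // ElevenLabs TTS
}

// One conversational turn: hear the user, look at the scene, speak the reply.
async function handleTurn(
  stages: Stages,
  audio: Uint8Array,
  imageUrl: string,
  lang = "en",
): Promise<Uint8Array> {
  const transcript = await stages.transcribe(audio);
  const reply = await stages.respond(transcript, imageUrl);
  return stages.synthesize(reply, lang);
}
```

Injecting the stages keeps the orchestration free of vendor SDKs, which also makes it easy to swap models (e.g. different ElevenLabs voices) without touching the flow.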

Challenges we ran into

PERCV was an ambitious project from the start, unifying many different AI technologies and workflows. Structuring prompts so that the conversational flow felt natural was an ongoing trial-and-error challenge. Tweaking the level of detail captured from the camera was another area that took multiple iterations to get right. Balancing latency, performance, and AI efficiency required playing with different models (eleven_turbo_v2, eleven_turbo_v3, etc.) and testing them in a variety of situations.

Due to the high cost of API calls to ElevenLabs and OpenAI Whisper, I put the main application behind a beta-code wall. However, one of the judging criteria now requires that the URL provided for judging automatically populate any beta codes. I was able to accommodate this request, but as a result judges will completely bypass the landing page experience unless they visit 'percv.org' first and then input a beta code. To mitigate this, I've added screenshots of the landing page to the image gallery for my submission.

Accomplishments that we're proud of

  • PERCV is fully WCAG 2.2 compliant, screen-reader friendly, and receives a 100/100 (!!) accessibility score in Chrome Lighthouse, further cementing its commitment to universal accessibility
  • The conversational AI can switch languages if the user requests it. For example, after describing a scene in English, if a user asks "¿Puedes hablar en español?" ("Can you speak Spanish?"), the AI detects this and re-describes the scene in Spanish. Additionally, using the user's browser settings (navigator.languages and navigator.language), the model can infer the best language to use.
  • The entire application, start to finish, was planned and built within Bolt!
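The browser-based language inference mentioned above could work along these lines. This is a minimal sketch assuming a simple "first supported match" policy; the supported-language list and fallback are illustrative, not PERCV's actual configuration.

```typescript
// Sketch of inferring a default speaking language from the browser's
// language preferences (navigator.languages). The SUPPORTED list and the
// English fallback are assumptions for illustration.

const SUPPORTED = ["en", "es", "fr", "de", "hi"]; // illustrative subset

function inferLanguage(browserLanguages: readonly string[]): string {
  for (const tag of browserLanguages) {
    const base = tag.toLowerCase().split("-")[0]; // "es-MX" -> "es"
    if (SUPPORTED.includes(base)) return base;
  }
  return "en"; // fall back to English when nothing matches
}
```

In the browser this would be called as `inferLanguage(navigator.languages)`, with the result passed along as the TTS language until the user asks to switch.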

What we learned

  • AI really can revolutionize the world of accessibility. It is REMARKABLE how this technology scales.
  • The art of prompting cannot be understated, both within Bolt and when querying LLMs or other AI models. Just like figuring out the most effective way to speak with a friend, it requires patience and persistence.

What's next for PERCV

  • Build a native iOS and Android version of this tool using Expo or something similar.
  • Explore partnerships with non-profits like the National Federation of the Blind and Hadley Institute for the Blind to make PERCV a free-to-use service for visually impaired individuals across the world, especially in areas where other services like BeMyEyes or Google Lookout are currently unavailable.
  • Expand the functionality set to assist hearing-impaired, deaf-blind, and/or mobility-restricted individuals.
  • Allow family members and caregivers to create accounts for loved ones for emergency assistance.
  • Add customization and more voice options from ElevenLabs.
