Inspiration

Around 43 million people in the world are blind, and around 2.2 billion live with some form of visual impairment. Aryav is one of them: without his glasses he can almost be classified as legally blind, and he knows firsthand how scary the unknown is. Because of this, we decided to create Sixth Sense, so you can see without sight.

What it does

Sixth Sense is a real-time AI vision assistant for the visually impaired. It uses your phone's camera to analyze what's in front of you every 22 seconds and describes it aloud: obstacles, people, text, and surroundings, all narrated naturally by an AI voice named Kayla.

How we built it

We built Sixth Sense as a React + Vite PWA deployed on Vercel, with a Node.js/Express backend. The intelligence is powered by Anthropic's Claude vision API: every 22 seconds the app captures a frame from your phone's camera, sends it to Claude for analysis, and speaks the response out loud using the Web Speech API.
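The capture, describe, and speak loop can be sketched roughly like this. The backend route `/api/describe` is a hypothetical name standing in for whatever our Express endpoint that forwards the frame to Claude is actually called:

```javascript
// Minimal sketch of the capture → describe → speak loop.
// Assumes a backend route POST /api/describe (illustrative name) that
// forwards the frame to the Claude vision API and returns { description }.
const CAPTURE_INTERVAL_MS = 22_000; // one scan every 22 seconds

async function startAssistant(videoEl) {
  // Ask for the rear camera so the feed shows what is in front of the user.
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'environment' },
  });
  videoEl.srcObject = stream;
  await videoEl.play();

  const canvas = document.createElement('canvas');

  setInterval(async () => {
    // Grab the current frame as a compressed JPEG data URL.
    canvas.width = videoEl.videoWidth;
    canvas.height = videoEl.videoHeight;
    canvas.getContext('2d').drawImage(videoEl, 0, 0);
    const image = canvas.toDataURL('image/jpeg', 0.7);

    // Send it to the backend, which calls Claude and returns a description.
    const res = await fetch('/api/describe', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ image }),
    });
    const { description } = await res.json();

    // Narrate the description with the Web Speech API.
    speechSynthesis.cancel(); // drop any unfinished utterance first
    speechSynthesis.speak(new SpeechSynthesisUtterance(description));
  }, CAPTURE_INTERVAL_MS);
}
```

Cancelling any in-flight utterance before speaking keeps narrations from piling up when a scene description runs longer than the 22-second interval.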

Challenges we ran into

iOS caching. On iPhones, our Web Speech and HeyGen integrations kept malfunctioning because of iOS privacy restrictions and caching issues, suggesting our idea was more suitable for native app development than a website. We spent significant time building layered workarounds (silent audio heartbeats, touchstart listeners, visibilitychange handlers, and session cache-busting) to make the experience reliable on a real phone.
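The layered workarounds amount to something like the sketch below. The function and variable names are illustrative, not our exact production code, and the specific timings are assumptions:

```javascript
// Sketch of the iOS workarounds: gesture-unlocked speech, a silent
// heartbeat, a visibility handler, and per-session cache-busting.

// Cache-busting: append a per-session token to API URLs so Safari
// does not serve a stale cached response.
const sessionToken = Date.now().toString(36);
const bustCache = (url) =>
  `${url}${url.includes('?') ? '&' : '?'}v=${sessionToken}`;

function installIOSWorkarounds() {
  let unlocked = false;

  // 1. iOS blocks audio that is not triggered by a user gesture, so
  // "warm up" speechSynthesis inside the first touch handler; later
  // programmatic utterances are then allowed.
  document.addEventListener('touchstart', () => {
    if (unlocked) return;
    speechSynthesis.speak(new SpeechSynthesisUtterance(''));
    unlocked = true;
  });

  // 2. Heartbeat: iOS tends to pause speechSynthesis mid-utterance;
  // periodically calling resume() keeps long narrations from stalling.
  setInterval(() => speechSynthesis.resume(), 5_000);

  // 3. Re-prime speech when the user returns to the tab.
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'visible') speechSynthesis.resume();
  });
}
```

Every fetch to the backend then goes through `bustCache(url)` instead of the raw URL.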

Accomplishments that we're proud of

We are really proud of how the app turned out. It's genuinely remarkable and useful: one of us was able to use it to navigate through a room blindfolded. Additionally, the UI is polished enough that it looks and feels like a real product, not a hackathon project.

What we learned

We learned that challenges come from unexpected angles. We thought the hard part would be the image processing, but the real difficulty came from the HeyGen API calls and iOS caching.

What's next for Sixth Sense

We want to incorporate LiDAR instead of the camera, using an iPhone's LiDAR sensor to detect depth far more accurately than a camera can. We want to train our LLM to identify distances relative to object sizes better than Claude can, to increase accuracy. We want wearable integration, pairing with an Apple Watch or Meta glasses for haptic alerts when obstacles are detected. And lastly, an offline mode with on-device AI for users without a data connection. We will definitely try to polish this up, put it on the App Store, and even try making a real business out of it.

Built With

  • avatar
  • React
  • Express
  • Vercel (frontend deployment)
  • HeyGen
  • HTML
  • JavaScript
  • JSON
  • LiveKit (avatar; removed)
  • Node.js
  • Railway/Render (backend deployment)
  • MongoDB Atlas
  • Anthropic Claude Vision API (camera analysis)
  • Vite
  • Web Speech API (voice recognition + text-to-speech)
  • WebRTC (camera access)
  • PWA (Progressive Web App)