Inspiration
Accessibility technology hasn't kept pace with AI. We kept asking ourselves: if large vision models can describe an image in seconds, why are visually impaired people still relying on tools that just beep? We wanted to build something that treated users as intelligent people who deserve rich, contextual information — not just a warning tone.
What it does
ElSEE is a hands-free, voice-driven mobile companion for visually impaired people. It uses your phone's camera to continuously monitor the environment, detect hazards, and answer natural-language questions about the world around you, all spoken back in realistic, low-latency audio. No buttons, no screen, no learning curve.
How we built it
We built the frontend in React Native with Expo so that one codebase runs on both iOS and Android. The vision and reasoning layer runs on Gemini 2.5 Flash, which processes live camera frames and interprets their context. ElevenLabs handles text-to-speech for natural-sounding responses. The backend is Python (FastAPI), with MongoDB storing interaction history and Auth0 handling secure, low-friction authentication.
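To make the flow concrete, here is a minimal sketch of one turn of the voice loop: a camera frame and a question go in, spoken audio comes out, and the exchange is logged. It is not our production code. The names `VoiceLoop`, `Interaction`, `fake_describe`, and `fake_speak` are hypothetical, the two callables stand in for the Gemini and ElevenLabs calls, and an in-memory list stands in for MongoDB.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins for the real services. The actual Gemini and
# ElevenLabs client calls are deliberately not shown here.
DescribeFn = Callable[[bytes, str], str]  # (camera frame, question) -> answer text
SpeakFn = Callable[[str], bytes]          # answer text -> audio bytes


@dataclass
class Interaction:
    question: str
    answer: str


@dataclass
class VoiceLoop:
    describe: DescribeFn
    speak: SpeakFn
    # In-memory list used as a placeholder for the interaction-history store.
    history: List[Interaction] = field(default_factory=list)

    def handle(self, frame: bytes, question: str) -> bytes:
        """Run one turn: frame + question -> spoken answer."""
        answer = self.describe(frame, question)
        self.history.append(Interaction(question, answer))
        return self.speak(answer)


def fake_describe(frame: bytes, question: str) -> str:
    # Placeholder for the vision model: echoes what it received.
    return f"Saw {len(frame)} bytes; you asked: {question}"


def fake_speak(text: str) -> bytes:
    # Placeholder for text-to-speech: returns the text as UTF-8 bytes.
    return text.encode("utf-8")
```

Keeping the model and speech services behind plain callables is an assumption of this sketch; it lets the loop be exercised with fakes, for example `VoiceLoop(fake_describe, fake_speak).handle(frame, "what's in front of me?")`.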
Challenges we ran into
Our original vision was to run ElSEE on a Jetson Nano — a dedicated edge device that would make the whole system self-contained and wearable. Unfortunately, we couldn't get our hands on one. We pivoted to a Raspberry Pi, but ran into another wall: we didn't have the right hardware to properly interface it with our laptop for development and testing. After burning through more time than we'd like to admit on hardware logistics, we made the pragmatic call to go fully software and build it as a mobile app instead. Honestly, the constraints pushed us toward a solution that's arguably more accessible — because almost everyone already has a smartphone in their pocket.
Accomplishments that we're proud of
Getting the end-to-end voice loop working smoothly felt like a genuine breakthrough. You ask "what's in front of me?" and, in under two seconds, hear a calm, natural response describing your environment. It just works. We're also proud of how clean the architecture turned out given the time constraints, and that the app is genuinely usable, not just a demo.
What we learned
Building for accessibility forced us to be better engineers. When you can't rely on visual feedback, every response has to be precise, concise, and timed correctly. We also learned a lot about prompt engineering for vision models — getting Gemini to describe a scene the way a person would describe it to a friend, rather than a list of detected objects, took real iteration.
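As a rough illustration of the kind of prompt shaping this involved, the sketch below assembles instructions that ask for a short, conversational description with hazards first. The function name `build_scene_prompt`, its parameters, and the exact wording of the rules are hypothetical examples, not the prompt we shipped.

```python
from typing import Optional


def build_scene_prompt(question: Optional[str] = None, max_sentences: int = 2) -> str:
    """Assemble an illustrative instruction prompt for a vision model."""
    if max_sentences < 1:
        raise ValueError("max_sentences must be at least 1")

    # Example rules only: conversational tone, hazards first, strict brevity.
    rules = [
        "You are describing a scene to a friend who cannot see it.",
        "Mention any hazard in the walking path first.",
        "Speak in plain, conversational sentences, not a list of detected objects.",
        f"Use at most {max_sentences} short sentences.",
    ]

    # Fall back to a general description when no usable question is given.
    if question and question.strip():
        task = f"The user asked: {question.strip()}"
    else:
        task = "Describe what is directly ahead."

    return "\n".join(rules + [task])
```

Capping the sentence count in the prompt is one simple way to keep spoken responses concise; the right limit would need tuning against real use.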
What's next for ElSEE — Electronic Sight & Empowering Eyes
We want to move beyond obstacle detection into true environmental intelligence — recognizing familiar faces, reading signs and menus aloud, and integrating step-by-step indoor navigation. Longer term, we're eyeing a wearable form factor: smart glasses that make ElSEE completely invisible to the outside world, so users can move through life with confidence and without stigma.
Built With
- auth0
- elevenlabs
- expo.io
- fastapi
- gemini
- genai
- mongodb
- opencv
- python
- react
- react-native
- typescript