Inspiration

Navigating the world without sight is incredibly challenging, especially in unfamiliar environments. We wanted to build a tool that gives visually impaired individuals more independence, safety, and confidence using affordable hardware and real-time AI. This project prioritizes inclusivity, ease of use, and accessibility, ensuring that users, especially those from underserved or diverse populations, benefit from a thoughtful, human-first approach!

What it does

VisionGuide captures a live image using an ESP32-CAM, sends it to our FastAPI server for AI interpretation, and instantly generates a spoken description on the user's phone. It identifies objects, people, and obstacles, allowing users to understand their surroundings without relying on GPS or an internet connection.

How we built it

We used an Arduino + ESP32-CAM to capture real-time images and transmit them over local WiFi; the onboard WiFi module let the ESP32 communicate with our server on the same network. A FastAPI backend received the images, processed them with AI, and returned a human-readable description. An AI vision model interpreted scenes and identified objects in the user's path, while a React frontend displayed results and provided the interface for audio feedback. We also used Cocos2d for visual prototyping and UI animations in our demo interactions.

Challenges we ran into

One challenge was keeping the ESP32-CAM reliably connected to local WiFi and avoiding dropouts. Handling image encoding/decoding between the Arduino and the server was another. We also struggled to integrate FastAPI with the AI model while keeping responses fast enough for real-time use, and to synchronize the phone's text-to-speech (TTS) system with backend updates.
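Much of the encoding/decoding headache came down to the server telling a complete JPEG frame apart from one truncated mid-transfer. A stdlib-only sketch of the kind of check we mean (the base64 wrapping and the `decode_frame` name are assumptions for illustration):

```python
import base64

# JPEG files start with the SOI marker (FF D8) and end with EOI (FF D9);
# checking both is a cheap way to reject frames cut off by a WiFi dropout.
JPEG_SOI = b"\xff\xd8"
JPEG_EOI = b"\xff\xd9"

def decode_frame(payload: str) -> bytes:
    """Decode a base64-encoded camera frame and verify it looks like a
    complete JPEG; raises ValueError for truncated or non-JPEG data."""
    jpeg = base64.b64decode(payload)
    if not (jpeg.startswith(JPEG_SOI) and jpeg.endswith(JPEG_EOI)):
        raise ValueError("incomplete or non-JPEG frame")
    return jpeg
```

A check like this lets the server drop a bad frame and wait for the next one instead of feeding garbage to the vision model.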

Accomplishments that we're proud of

We built a fully working real-time visual assistance tool in under 24 hours! We successfully streamed both still images and video from the ESP32-CAM, achieved smooth audio descriptions on the phone with minimal delay, and created a solution that is affordable, portable, and locally deployable.

What we learned

We learned a lot at the hackathon: how to integrate hardware (the ESP32) with a modern software stack (FastAPI, React), best practices for streaming image data efficiently, and techniques for low-latency AI inference on limited hardware. Most of all, we learned that accessible design requires simplicity, reliability, and user empathy.

What's next for VisionGuide

We would love to add continuous object tracking and obstacle distance estimation, expand the TTS options, and add multilingual support. We also want to improve the hardware with a 3D-printed enclosure and haptic feedback so the device is smaller and more user friendly. Longer term, we would like to train a custom model optimized for blind navigation (curbs, steps, doorways) and turn VisionGuide into a fully wearable accessory for everyday use, such as a necklace, an attachment for a walking stick, or glasses.
