Inspiration

One of our team members has a visually impaired family member, so we’ve seen firsthand the challenges and loneliness they experience. This has inspired us to create a solution that could help improve mobility, safety, and independence.

What it does

StepGuide empowers visually impaired users by enhancing mobility while keeping safety a top priority. It captures visual information, detects text, and reads it aloud, helping users navigate and understand their surroundings more independently.

How we built it

We began with a planning phase to outline the idea and break it down into components. We developed the OCR and text-to-speech (TTS) systems separately using EasyOCR and Coqui TTS. After testing each module individually, we integrated them into a single pipeline. We then tested the system on various computers to ensure compatibility and stability.
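
To make the pipeline concrete, here is a minimal sketch of the capture, OCR, and TTS flow described above. It is an illustration rather than our exact code: the Coqui model name, camera index, and output path are assumptions chosen for the example.

```python
import cv2                  # webcam capture
import easyocr              # OCR
from TTS.api import TTS     # Coqui TTS

# Load the OCR reader and a Coqui TTS model once at startup.
# (The model name here is an example; other Coqui English models work similarly.)
reader = easyocr.Reader(["en"])
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

def read_frame_aloud(camera_index: int = 0, wav_path: str = "speech.wav") -> None:
    """Grab one frame from the webcam, OCR it, and synthesize the detected text."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return  # camera unavailable

    # detail=0 returns only the detected strings, without bounding boxes.
    lines = reader.readtext(frame, detail=0)
    text = " ".join(lines).strip()
    if text:
        tts.tts_to_file(text=text, file_path=wav_path)

if __name__ == "__main__":
    read_frame_aloud()
```

In the integrated system this step runs repeatedly, and the generated audio is played back to the user as soon as it is ready.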

Challenges we ran into

We encountered a persistent boot error that halted our progress until we discovered a fix by short-circuiting specific pins, a risky but necessary workaround. Team coordination was also a challenge, especially under time pressure and with parallel workflows, so we had to adapt quickly and improve our communication to stay productive. Our webcam also stopped working unexpectedly during testing, so we are hoping to borrow a replacement camera from the camera crew for our demo (and will fall back to our laptops' built-in cameras if we can't get one in time).

Accomplishments that we're proud of

We built a working text-to-speech system that reliably reads detected text aloud and a functioning OCR model that detects text in images with reasonable accuracy. We're especially proud that we integrated them into a working prototype and tested it with minimal errors.

What we learned

We gained hands-on experience with libraries such as EasyOCR, Coqui TTS, and PyTorch. We also learned about multithreading, model training, and the importance of well-structured teamwork under pressure.
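
As an illustration of the multithreading pattern we picked up, here is a hedged sketch (not our actual code) of how frame processing and speech can run in separate threads, passing detected text through a queue so neither side blocks the other. The `detect_text` and `speak` helpers are hypothetical stand-ins for the OCR and TTS steps.

```python
import queue
import threading

# Hypothetical stand-ins for the OCR and TTS steps.
def detect_text(frame_id: int) -> str:
    return f"text from frame {frame_id}"

def speak(text: str) -> None:
    print(f"speaking: {text}")

text_queue: "queue.Queue[str]" = queue.Queue(maxsize=4)

def capture_loop() -> None:
    """Producer: run OCR on frames and hand results to the speech thread."""
    for frame_id in range(10):
        text_queue.put(detect_text(frame_id))
    text_queue.put("")  # empty string signals shutdown

def speech_loop() -> None:
    """Consumer: speak each result without blocking capture."""
    while True:
        text = text_queue.get()
        if not text:
            break
        speak(text)

producer = threading.Thread(target=capture_loop)
consumer = threading.Thread(target=speech_loop)
producer.start()
consumer.start()
producer.join()
consumer.join()
```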

What's next for StepGuide

Our next steps include:

- Connecting StepGuide to a large language model (LLM) for smart context understanding
- Integrating with the Google Maps API for real-time navigation
- Embedding the system into smart glasses to create a wearable and discreet solution for everyday use

Built With
