Inspiration

It began with curiosity. After reading an article about blind software engineers, one of our teammates blindfolded himself for a day to experience their world. He couldn’t find his keyboard, coffee, or even his phone without help. That brief experiment revealed how many daily interactions depend on vision—and how fragile that independence can feel.

A classmate then shared that her grandfather had lost his eyesight. She’d always wished for a tool that could “describe the room” to him. These experiences converged into the inspiration for KLR, short for Knowledge, Location, Recognition—but also a tribute to Helen Keller, who embodied the power of communication beyond sight.

What it does

KLR turns a phone camera into a spoken guide for visually impaired users: the app captures frames of the surroundings, detects nearby objects, estimates their distance and direction, and speaks short, actionable instructions such as “There’s a chair about six feet ahead, slightly to your left.”

How we built it

We built KLR using a hybrid architecture: React Native on the frontend and a Python Flask backend running our computer vision and language models.

Frontend (React Native):

Captures camera frames and stores temporary image paths locally on the device.

Sends those image paths to our backend as lightweight requests, minimizing network payload.

Backend (Flask + Computer Vision):

Flask exposes API endpoints that receive the image path.

For each frame request, we run two core CV models:

MiDaS (depth estimation): produces a per-pixel depth map that we convert into approximate distances so we can say “object ahead, about 2 feet.”

YOLO (object detection): draws bounding boxes and identifies objects in real time.

The model outputs are combined into a structured scene description (object names, confidence scores, estimated distance, and spatial direction); see the sketch below.
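
For reference, here is a minimal sketch of this backend step, assuming the Ultralytics YOLO package and the MiDaS model from torch.hub. The /describe endpoint name, the model sizes, the direction heuristic, and the depth handling are illustrative assumptions, not our exact code:

```python
# Minimal sketch: Flask endpoint that runs YOLO + MiDaS on one frame and
# returns a structured scene description (names and thresholds are placeholders).
import cv2
import numpy as np
import torch
from flask import Flask, jsonify, request
from ultralytics import YOLO

app = Flask(__name__)

detector = YOLO("yolov8n.pt")                                   # object detection
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")        # depth estimation
midas.eval()
midas_transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform


def describe_scene(image_path: str) -> dict:
    """Run YOLO + MiDaS on one frame and build a structured scene description."""
    frame = cv2.imread(image_path)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    h, w = frame.shape[:2]

    # MiDaS returns *relative* inverse depth, not metres or feet; turning it
    # into "about 2 feet" needs a calibration heuristic that we skip here.
    with torch.no_grad():
        pred = midas(midas_transform(rgb))
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=(h, w), mode="bicubic", align_corners=False
        ).squeeze().cpu().numpy()

    objects = []
    for box in detector(rgb)[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        cx = (x1 + x2) / 2
        direction = "left" if cx < w / 3 else "right" if cx > 2 * w / 3 else "ahead"
        objects.append({
            "name": detector.names[int(box.cls[0])],
            "confidence": round(float(box.conf[0]), 2),
            "direction": direction,
            # Larger MiDaS values mean closer objects.
            "relative_depth": float(np.median(depth[y1:y2, x1:x2])),
        })
    return {"objects": objects}


@app.route("/describe", methods=["POST"])
def describe():
    # The React Native app tells us which captured frame to analyse.
    return jsonify(describe_scene(request.json["image_path"]))


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```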

Generative AI Layer (Azure LLM):

We pass the structured CV results into Azure GPT-5 Mini to convert raw detection data into natural speech instructions tailored for visually impaired navigation.

Example: Instead of “chair: 84% confidence, depth 1.8m,” the LLM generates:

“There’s a chair about six feet ahead, slightly to your left.”
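
A sketch of this conversion step, assuming the Azure OpenAI Python client; the deployment name, environment variables, and system prompt wording are placeholders rather than our exact configuration:

```python
# Sketch of turning structured CV results into a spoken-style instruction
# via Azure OpenAI (deployment and endpoint names are placeholders).
import json
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

SYSTEM_PROMPT = (
    "You guide a visually impaired user. Given detected objects with "
    "confidence, direction, and approximate distance, reply with one short, "
    "spoken-style instruction such as "
    "\"There's a chair about six feet ahead, slightly to your left.\""
)


def to_instruction(scene: dict) -> str:
    response = client.chat.completions.create(
        model="gpt-5-mini",  # name of the Azure deployment (placeholder)
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": json.dumps(scene)},
        ],
    )
    return response.choices[0].message.content.strip()
```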

Speech Output (ElevenLabs API):

The final LLM-generated instruction string is streamed to ElevenLabs for natural text-to-speech.

The React Native app plays the audio immediately to the user.
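
A sketch of the text-to-speech call using the ElevenLabs REST API directly; the voice ID, model ID, and output filename are placeholders:

```python
# Sketch of the ElevenLabs text-to-speech call via the public REST API
# (voice ID, model ID, and output path are placeholders).
import os

import requests

ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "your-voice-id"  # placeholder


def speak(instruction: str, out_path: str = "instruction.mp3") -> str:
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVENLABS_API_KEY},
        json={"text": instruction, "model_id": "eleven_multilingual_v2"},
        timeout=30,
    )
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)  # MP3 audio returned by the API
    return out_path
```

The Flask API can return this audio (or stream the bytes) so the React Native side can play it back right away.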

Challenges we ran into

Getting depth + object detection to run fast enough on a mobile device.

Converting CV model output into structured, LLM-friendly input.

Maintaining low latency from camera snapshot → CV → LLM → audio output.

Debugging cross-platform permissions (camera / audio playback).

Accomplishments that we're proud of

The entire end-to-end pipeline works in real time.

We combined CV, depth estimation, LLM reasoning, and TTS into a single flow.

The app generates actionable spatial guidance, not just object labels.

For many of us, this was our first time integrating MiDaS + YOLO + LLM + TTS.

What we learned

How to optimize model inference under tight latency constraints.

Prompt engineering for structured → natural language transformations.

Building modular AI pipelines so any component (YOLO, MiDaS, TTS) can be swapped without breaking the system.
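
To make the swappable-component idea concrete, here is a minimal sketch of the kind of interfaces we mean; the protocol and class names are hypothetical, not our actual code:

```python
# Illustrative sketch of swappable pipeline stages (all names are hypothetical).
import numpy as np
from typing import Protocol


class Detector(Protocol):
    def detect(self, image_path: str) -> list[dict]: ...      # e.g. YOLO wrapper


class DepthEstimator(Protocol):
    def estimate(self, image_path: str) -> np.ndarray: ...    # e.g. MiDaS wrapper


class Narrator(Protocol):
    def narrate(self, scene: dict) -> str: ...                # e.g. Azure GPT wrapper


class Speaker(Protocol):
    def speak(self, text: str) -> bytes: ...                  # e.g. ElevenLabs wrapper


class Pipeline:
    """Each stage only sees the previous stage's output, so any one of them
    can be replaced without touching the rest."""

    def __init__(self, detector: Detector, depth: DepthEstimator,
                 narrator: Narrator, speaker: Speaker):
        self.detector, self.depth = detector, depth
        self.narrator, self.speaker = narrator, speaker

    def run(self, image_path: str) -> bytes:
        scene = {
            "objects": self.detector.detect(image_path),
            "depth": self.depth.estimate(image_path),
        }
        return self.speaker.speak(self.narrator.narrate(scene))
```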

What's next for KLR.ai

Live continuous mode (detect + narrate autonomously without tapping).

Gesture-based controls instead of relying on touch.

Edge model deployment (on-device inference for speed + privacy).

Built With

React Native, Flask (Python), YOLO, MiDaS, Azure GPT-5 Mini, ElevenLabs
