Horizon: Travelling Companion

Inspiration

A family member of one of our teammates has an eye condition that has left her partially blind. She has described what it feels like: the daunting weight of not being able to simply go somewhere on her own, the exhaustion of asking for help with things most people don't think twice about, and the disorientation of being in an unfamiliar place without knowing what's around her, whether a step is coming, or which way to turn.

Over 2.2 billion people worldwide live with some form of vision impairment. For blind and low-vision pedestrians, something as simple as walking to the library, recognizing a landmark, or knowing whether a step is ahead requires either a human companion or expensive specialized hardware. What if your phone could be that companion, a warm, conversational presence walking right beside you?

What it does

Horizon is a voice-first AI companion for blind and low-vision people and a spoken audio guide for anyone curious about their surroundings.

Conversational companion. Tap once and just talk. Horizon listens hands-free, using silence detection to tell when you've finished speaking, transcribes your speech with Groq Whisper, and replies in a warm, natural voice via ElevenLabs. It remembers your conversation, so follow-ups like "how far is that?" land naturally. In the same LLM call that writes its reply, it also decides whether to act on what you said:

  • "What's around me?" → describes your surroundings using the live camera feed and Gemini 2.5 Flash
  • "Take me to the library" → plans a real transit route and speaks the step-by-step directions
  • "Where am I?" → reverse-geocodes your position and tells you the street and which way you're facing

Real-time hazard detection. In assistive mode, Horizon polls the camera at ~1.25 fps and classifies each frame with a Groq vision model (Llama 4 Scout), flagging hazards such as steps, drop-offs, low-hanging objects, and oncoming bicycles.

On-device YOLO. A local Python server runs YOLOv11s and projects labelled bounding boxes onto the live camera view for road-relevant objects (people, vehicles, traffic lights, and tripping hazards), colour-coded by hazard category.

Fall detection and emergency dispatch. The accelerometer is monitored continuously. A sudden high-magnitude spike followed by stillness triggers a 30-second countdown. If you don't say "I'm okay" or tap dismiss, Horizon reverse-geocodes your position with OSM Nominatim and uses Twilio to send an SMS, including a live Google Maps link, to your registered emergency contact.
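
The spike-then-stillness rule can be sketched as a small state machine over accelerometer samples. This is an illustrative sketch, not Horizon's code: it assumes readings in units of g, and the thresholds (2.5 g spike, ±0.15 g stillness band, 1.5 s to settle, 2 s of stillness) are hypothetical placeholders that would need tuning against real falls.

```typescript
// Illustrative fall detector. All thresholds are hypothetical placeholders,
// and accelerometer readings are assumed to be in units of g.
interface AccelSample {
  t: number; // timestamp in milliseconds
  x: number;
  y: number;
  z: number;
}

interface FallOptions {
  spikeG: number; // magnitude that counts as an impact
  stillToleranceG: number; // |magnitude - 1 g| at or below this counts as "still"
  settleMs: number; // stillness must begin within this time after the spike
  stillMs: number; // how long the phone must then stay still
}

const DEFAULT_FALL_OPTIONS: FallOptions = { spikeG: 2.5, stillToleranceG: 0.15, settleMs: 1500, stillMs: 2000 };

class FallDetector {
  private spikeAt: number | null = null; // time of the most recent spike
  private stillSince: number | null = null; // start of the current still period
  private readonly opts: FallOptions;

  constructor(opts: Partial<FallOptions> = {}) {
    this.opts = { ...DEFAULT_FALL_OPTIONS, ...opts };
  }

  /** Feed one sample; returns true when a spike has been followed by sustained stillness. */
  push(s: AccelSample): boolean {
    const mag = Math.sqrt(s.x * s.x + s.y * s.y + s.z * s.z);
    if (mag >= this.opts.spikeG) {
      this.spikeAt = s.t;
      this.stillSince = null;
      return false;
    }
    if (this.spikeAt === null) return false;

    const still = Math.abs(mag - 1) <= this.opts.stillToleranceG;
    if (!still) {
      this.stillSince = null;
      // Still moving well after the spike: treat it as a bump, not a fall.
      if (s.t - this.spikeAt > this.opts.settleMs) this.spikeAt = null;
      return false;
    }
    if (this.stillSince === null) {
      // Stillness that starts too long after the spike is not linked to it.
      if (s.t - this.spikeAt > this.opts.settleMs) {
        this.spikeAt = null;
        return false;
      }
      this.stillSince = s.t;
    }
    if (s.t - this.stillSince >= this.opts.stillMs) {
      this.spikeAt = null;
      this.stillSince = null;
      return true;
    }
    return false;
  }
}
```

When `push` returns true, the app would start the 30-second countdown; only if that expires without an "I'm okay" or a tap would the geocode-and-SMS step run.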

Accessibility-aware transit routing. Routes are planned through a pluggable provider interface backed by the Transit API for Waterloo Region. Constraints like maximum time and budget are evaluated server-side, and assistive mode filters out inaccessible segments.
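
Evaluating constraints after the provider returns its candidates is what keeps the provider interface pluggable. A minimal sketch, assuming a simplified route model: `RoutingProvider`, `Segment`, `CandidateRoute`, and `selectRoutes` are hypothetical names, and the `accessible` flag stands in for whatever per-segment accessibility data the real provider exposes.

```typescript
// Hypothetical route model for illustration; not Horizon's actual types.
interface Segment {
  mode: "walk" | "transit";
  minutes: number;
  fareCents: number;
  accessible: boolean; // e.g. step-free boarding, no stairs on a walking leg
}

interface CandidateRoute {
  id: string;
  segments: Segment[];
}

interface Constraints {
  maxMinutes?: number;
  maxFareCents?: number;
  assistive?: boolean; // when true, reject any route containing an inaccessible segment
}

// Any backend (Transit API or otherwise) only has to return candidate routes.
interface RoutingProvider {
  plan(from: [number, number], to: [number, number]): Promise<CandidateRoute[]>;
}

const totalMinutes = (r: CandidateRoute): number => r.segments.reduce((sum, s) => sum + s.minutes, 0);
const totalFare = (r: CandidateRoute): number => r.segments.reduce((sum, s) => sum + s.fareCents, 0);

function meetsConstraints(r: CandidateRoute, c: Constraints): boolean {
  if (c.maxMinutes !== undefined && totalMinutes(r) > c.maxMinutes) return false;
  if (c.maxFareCents !== undefined && totalFare(r) > c.maxFareCents) return false;
  if (c.assistive && r.segments.some((s) => !s.accessible)) return false;
  return true;
}

/** Keep routes that satisfy every constraint, fastest first. */
function selectRoutes(routes: CandidateRoute[], c: Constraints): CandidateRoute[] {
  return routes.filter((r) => meetsConstraints(r, c)).sort((a, b) => totalMinutes(a) - totalMinutes(b));
}

async function planWithConstraints(
  provider: RoutingProvider,
  from: [number, number],
  to: [number, number],
  c: Constraints,
): Promise<CandidateRoute[]> {
  return selectRoutes(await provider.plan(from, to), c);
}
```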

Voice cloning. The in-app voice picker lets you browse ElevenLabs voices or record a short sample to clone your own or a family member's voice.

How we built it

The stack is a React Native (Expo) mobile client talking to a Node/Express backend, with a sidecar Python ML server for on-device YOLO inference.

[Expo React Native: camera, IMU, mic, haptics]
        │
        ▼
[Node/Express backend, port 3001]
        ├─► Groq Whisper            speech-to-text
        ├─► Groq Llama 3.3 70B      companion turns (reply + intent, one call)
        ├─► Groq Llama 4 Scout      real-time hazard classification
        ├─► Gemini 2.5 Flash        landmark narration (Groq fallback on quota)
        ├─► ElevenLabs Flash v2.5   low-latency natural TTS
        ├─► OSM Nominatim           reverse + forward geocoding (free, no key)
        ├─► Transit API             Waterloo Region transit routing
        ├─► PostGIS                 spatial POI view-wedge queries
        └─► Twilio                  emergency SMS dispatch

[Python FastAPI, port 8001]
        └─► YOLOv11s (ultralytics)  on-device object detection overlay

The companion is designed around a single Groq call per turn that simultaneously generates the spoken reply and classifies the intent: routing, description, location, assistive-mode toggle, or plain conversation. This keeps latency low and the persona consistent, since the same model that decides to plan a route also phrases the confirmation in the companion's voice.
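
One way to get both the reply and the intent out of a single completion is to ask the model for a small JSON object and validate it defensively. The intent labels, field names, format instruction, and fallback behaviour below are assumptions for illustration, not Horizon's actual prompt or schema.

```typescript
// Hypothetical intent labels for the categories described above.
type Intent = "route" | "describe" | "locate" | "assistive_toggle" | "chat";

interface CompanionTurn {
  reply: string; // what the companion says aloud
  intent: Intent; // what the app should do next
  destination?: string; // only meaningful when intent is "route"
}

const INTENTS: readonly Intent[] = ["route", "describe", "locate", "assistive_toggle", "chat"];

// Hypothetical instruction appended to the system prompt so one completion carries both parts.
const FORMAT_INSTRUCTION =
  'Respond with JSON only: {"reply": string, "intent": one of ' +
  INTENTS.map((i) => `"${i}"`).join(", ") +
  ', "destination"?: string}';

function parseCompanionTurn(raw: string): CompanionTurn {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // The model ignored the format: speak its text and treat the turn as conversation.
    return { reply: raw.trim(), intent: "chat" };
  }
  if (typeof parsed !== "object" || parsed === null) return { reply: raw.trim(), intent: "chat" };

  const o = parsed as Record<string, unknown>;
  const rawReply = o.reply;
  const rawIntent = o.intent;
  const rawDest = o.destination;
  const reply =
    typeof rawReply === "string" && rawReply.trim() !== "" ? rawReply.trim() : "Sorry, could you say that again?";
  const intent: Intent = INTENTS.includes(rawIntent as Intent) ? (rawIntent as Intent) : "chat";
  const destination = typeof rawDest === "string" && rawDest.trim() !== "" ? rawDest.trim() : undefined;

  // A route request without a destination can't be acted on, so treat it as conversation.
  if (intent === "route" && destination === undefined) return { reply, intent: "chat" };
  return destination === undefined ? { reply, intent } : { reply, intent, destination };
}
```

Falling back to `chat` on anything malformed means a bad completion still produces a spoken reply rather than a wrong action.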

The hazard engine runs in a tight loop in assistive mode, posting compressed JPEG frames to the backend every ~800 ms. It degrades gracefully: any inference error or rate limit returns a "clear" result, so a quota hit never blocks the user. Gemini narration follows the same resilience pattern: on a 429 it cools down and falls back to Groq vision rather than returning an error card.
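
The return-"clear"-on-error behaviour and the ~800 ms cadence can be sketched as follows. The JSON shape requested from the vision model and every function name here are hypothetical; frame capture and the backend request are hidden behind the `captureAndClassify` callback.

```typescript
// Illustrative hazard loop; the model's reply format and all names are assumptions.
type Hazard = { status: "hazard"; kind: string; urgent: boolean };
type HazardResult = { status: "clear" } | Hazard;

const CLEAR: HazardResult = { status: "clear" };

/** Parse the vision model's reply; anything unexpected counts as "clear". */
function parseHazardReply(text: string): HazardResult {
  try {
    const obj = JSON.parse(text);
    if (obj && obj.status === "hazard" && typeof obj.kind === "string") {
      return { status: "hazard", kind: obj.kind, urgent: obj.urgent === true };
    }
  } catch {
    // Malformed model output falls through to CLEAR.
  }
  return CLEAR;
}

/** Network errors and rate limits also resolve to CLEAR instead of throwing. */
async function classifyFrameSafely(call: () => Promise<string>): Promise<HazardResult> {
  try {
    return parseHazardReply(await call());
  } catch {
    return CLEAR;
  }
}

/**
 * Waits intervalMs after each request completes before sending the next, so slow
 * responses never pile up (the effective rate is a little under 1000 / intervalMs fps).
 * Returns a function that stops the loop.
 */
function startHazardLoop(
  captureAndClassify: () => Promise<string>,
  onHazard: (h: Hazard) => void,
  intervalMs = 800,
): () => void {
  let stopped = false;
  const tick = async (): Promise<void> => {
    if (stopped) return;
    const result = await classifyFrameSafely(captureAndClassify);
    if (result.status === "hazard" && !stopped) onHazard(result);
    if (!stopped) setTimeout(tick, intervalMs);
  };
  void tick();
  return () => {
    stopped = true;
  };
}
```

Scheduling each request only after the previous one settles keeps a slow or rate-limited backend from building a queue of stale frames. The trade-off of mapping errors to "clear" is that no hazard warnings are spoken while the model is unavailable.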

For the YOLO overlay we wrote a FastAPI sidecar (ml/server.py) that loads YOLOv11s once at startup and handles detection requests asynchronously via a thread-pool executor, keeping the main event loop free. The mobile app renders the returned normalized bounding boxes as an SVG overlay on top of the live CameraView.
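
Mapping normalized boxes onto the preview has to account for how the camera frame is scaled to fit the view. The sketch below assumes the boxes are fractions of the captured frame and that the preview uses aspect-fill ("cover") scaling with a centred crop; `NormBox`, `toScreenRect`, and the frame and view sizes are hypothetical.

```typescript
// Assumptions: boxes are fractions (0..1) of the captured frame, and the preview
// scales the frame to cover the view, centring it and cropping any overflow.
interface NormBox {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
  label: string;
}

interface ScreenRect {
  x: number;
  y: number;
  width: number;
  height: number;
  label: string;
}

interface Size {
  width: number;
  height: number;
}

function toScreenRect(box: NormBox, frame: Size, view: Size): ScreenRect {
  // "cover": pick the larger scale so the frame fills the view, then centre it.
  const scale = Math.max(view.width / frame.width, view.height / frame.height);
  const offsetX = (view.width - frame.width * scale) / 2;
  const offsetY = (view.height - frame.height * scale) / 2;

  const x1 = box.x1 * frame.width * scale + offsetX;
  const y1 = box.y1 * frame.height * scale + offsetY;
  const x2 = box.x2 * frame.width * scale + offsetX;
  const y2 = box.y2 * frame.height * scale + offsetY;
  return { x: x1, y: y1, width: x2 - x1, height: y2 - y1, label: box.label };
}
```

Each resulting rectangle can be drawn directly, for example as a `Rect` from react-native-svg if that is the SVG library in use.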

The database is PostGIS. The ST_DWithin + ST_Azimuth spatial query in getPoisInViewWedge narrows POI lookups to the cone of the user's field of view, so the narration stays relevant to what the camera actually sees rather than pulling every landmark within a radius.
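
The view-wedge idea is easiest to see outside SQL. The sketch below is an in-memory TypeScript analogue, not the project's `getPoisInViewWedge`: a haversine distance check plays the role of `ST_DWithin`, and an initial-bearing calculation plays the role of `ST_Azimuth` (which returns radians clockwise from north). The 300 m radius and 60° field of view are hypothetical defaults.

```typescript
// In-memory analogue of a view-wedge POI filter, for illustration only.
interface Poi {
  name: string;
  lat: number;
  lon: number;
}

const EARTH_RADIUS_M = 6_371_000;
const toRad = (deg: number): number => (deg * Math.PI) / 180;
const toDeg = (rad: number): number => (rad * 180) / Math.PI;

/** Great-circle distance in metres (haversine); stands in for ST_DWithin's radius test. */
function distanceM(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

/** Initial bearing in degrees clockwise from north, in [0, 360). */
function bearingDeg(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const phi1 = toRad(lat1);
  const phi2 = toRad(lat2);
  const dLambda = toRad(lon2 - lon1);
  const y = Math.sin(dLambda) * Math.cos(phi2);
  const x = Math.cos(phi1) * Math.sin(phi2) - Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLambda);
  return (toDeg(Math.atan2(y, x)) + 360) % 360;
}

/** Keep POIs within radiusM and within ±fovDeg/2 of the heading, nearest first. */
function poisInViewWedge(
  user: { lat: number; lon: number; headingDeg: number },
  pois: Poi[],
  radiusM = 300,
  fovDeg = 60,
): Poi[] {
  return pois
    .map((p) => ({ p, d: distanceM(user.lat, user.lon, p.lat, p.lon) }))
    .filter(({ p, d }) => {
      if (d > radiusM) return false;
      // Signed angle from heading to POI, wrapped into [-180, 180).
      const diff = ((bearingDeg(user.lat, user.lon, p.lat, p.lon) - user.headingDeg + 540) % 360) - 180;
      return Math.abs(diff) <= fovDeg / 2;
    })
    .sort((a, b) => a.d - b.d)
    .map(({ p }) => p);
}
```

The wrap-around in the angle difference matters: with a heading of 350°, a POI due north (0°) is only 10° off-centre, not 350°.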

Challenges we ran into

Latency across the full pipeline. Every companion turn chains STT → LLM → TTS. Getting that under two seconds on a real phone over WiFi required choosing the right models (Whisper Turbo, Llama 3.3 70B, ElevenLabs Flash v2.5), keeping prompts tight, and merging the reply + intent into one LLM call rather than two.

Silence detection for hands-free turn-taking. expo-audio metering gives amplitude values, but mapping those to "the user stopped speaking" reliably, without cutting off mid-sentence or waiting too long, required careful threshold tuning and a trailing silence window.
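
A trailing silence window reduces to a few lines of state. This sketch assumes metering readings in dBFS (0 is loudest, quieter is more negative); the -45 dB threshold and 1.2 s window are hypothetical starting points, not the tuned values.

```typescript
// Trailing-silence end-of-turn detector. Threshold and window are hypothetical.
interface SilenceOptions {
  thresholdDb: number; // readings above this count as speech
  trailingMs: number; // quiet time required after speech before the turn ends
}

const DEFAULT_SILENCE: SilenceOptions = { thresholdDb: -45, trailingMs: 1200 };

class SilenceDetector {
  private heardSpeech = false;
  private silentSince: number | null = null;
  private readonly opts: SilenceOptions;

  constructor(opts: Partial<SilenceOptions> = {}) {
    this.opts = { ...DEFAULT_SILENCE, ...opts };
  }

  /** Feed one metering reading; returns true once the user has spoken and then stayed quiet long enough. */
  push(tMs: number, levelDb: number): boolean {
    if (levelDb > this.opts.thresholdDb) {
      this.heardSpeech = true;
      this.silentSince = null; // any speech resets the trailing window
      return false;
    }
    if (!this.heardSpeech) return false; // never end the turn before the user starts talking
    if (this.silentSince === null) this.silentSince = tMs;
    return tMs - this.silentSince >= this.opts.trailingMs;
  }
}
```

Requiring at least one speech reading before the window can start keeps the turn from ending while the user is still gathering their thoughts after the tap, and resetting on any loud reading keeps short mid-sentence pauses from cutting them off.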

Rate limits on free tiers. Gemini's free quota is generous but hits a wall under demo conditions. We built a cooldown + Groq fallback layer into the narration route so the app keeps working gracefully rather than surfacing a broken state to the user.
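
The cooldown-plus-fallback layer can be sketched as a small gate in front of the primary provider. `CooldownGate`, `narrateWithFallback`, and the 60 s cooldown are hypothetical; the `primary` and `fallback` callbacks stand in for the Gemini and Groq vision calls, and errors are assumed to carry an HTTP `status` field.

```typescript
// Cooldown + fallback sketch. Names and the default cooldown are hypothetical;
// only the pattern (429 → cool down → use the fallback) comes from the writeup.
class CooldownGate {
  private blockedUntil = 0;
  private readonly cooldownMs: number;

  constructor(cooldownMs = 60_000) {
    this.cooldownMs = cooldownMs;
  }

  primaryAvailable(now: number): boolean {
    return now >= this.blockedUntil;
  }

  trip(now: number): void {
    this.blockedUntil = now + this.cooldownMs;
  }
}

const isRateLimited = (err: unknown): boolean =>
  typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;

async function narrateWithFallback(
  gate: CooldownGate,
  primary: () => Promise<string>, // e.g. the Gemini narration call
  fallback: () => Promise<string>, // e.g. the Groq vision call
  now: number = Date.now(),
): Promise<string> {
  if (gate.primaryAvailable(now)) {
    try {
      return await primary();
    } catch (err) {
      // Only quota errors start a cooldown; any failure still falls through to the fallback.
      if (isRateLimited(err)) gate.trip(now);
    }
  }
  return fallback();
}
```

Only a 429 starts the cooldown, so a one-off network error doesn't lock the app onto the fallback; either way the user still hears a narration.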

Python 3.14 vs ultralytics. The system Python was 3.14, which ultralytics doesn't yet support. We had to detect this, install via the Homebrew Python 3.11 binary, and document it. It's not something you want to debug the night before a demo.

Monorepo env loading. Expo only reads .env from the directory where npx expo start runs (apps/mobile), while the backend reads from the repo root. The two files need to be kept in sync and are easy to accidentally conflate.

Transit API cold-start routing. Real transit APIs return nested itinerary objects. Mapping those to our flat RawRoute type (handling missing stop names, missing departure times, and walk-only legs, and generating the spoken summary) took more edge-case work than expected.
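
The flattening step is where those edge cases live. The input shape below is a simplified, hypothetical stand-in for a nested itinerary (it is not the Transit API's actual schema), and the `RawRoute` fields shown are illustrative rather than the project's real type; the sketch shows spoken defaults for missing stop names and departure times and a separate walk-only summary.

```typescript
// Hypothetical, simplified input shape; NOT the Transit API's real response schema.
interface ApiLeg {
  mode: string; // e.g. "WALK" or "BUS"
  durationSec?: number;
  routeShortName?: string;
  from?: { stopName?: string };
  to?: { stopName?: string };
  departureTimeSec?: number; // unix seconds; may be missing
}
interface ApiItinerary {
  legs: ApiLeg[];
}

// Illustrative flat output; field names are assumptions.
interface RawRouteStep {
  kind: "walk" | "transit";
  minutes: number;
  spoken: string;
}
interface RawRoute {
  totalMinutes: number;
  walkOnly: boolean;
  steps: RawRouteStep[];
  summary: string;
}

const plural = (n: number, word: string): string => `${n} ${word}${n === 1 ? "" : "s"}`;

function toRawRoute(it: ApiItinerary, nowSec: number): RawRoute {
  const steps = it.legs.map((leg): RawRouteStep => {
    const minutes = Math.max(1, Math.round((leg.durationSec ?? 0) / 60));
    if (leg.mode.toUpperCase() === "WALK") {
      const dest = leg.to?.stopName ?? "your destination"; // missing stop name
      return { kind: "walk", minutes, spoken: `Walk about ${plural(minutes, "minute")} to ${dest}.` };
    }
    const line = leg.routeShortName ? `route ${leg.routeShortName}` : "the next bus";
    const from = leg.from?.stopName ?? "the nearest stop";
    // A missing departure time is simply left out of the spoken step.
    const when =
      leg.departureTimeSec === undefined
        ? ""
        : `, leaving in ${plural(Math.max(0, Math.round((leg.departureTimeSec - nowSec) / 60)), "minute")}`;
    return { kind: "transit", minutes, spoken: `Take ${line} from ${from}${when}.` };
  });

  const totalMinutes = steps.reduce((sum, s) => sum + s.minutes, 0);
  const rides = steps.filter((s) => s.kind === "transit").length;
  const walkOnly = rides === 0;
  const summary = walkOnly
    ? `It's a ${totalMinutes}-minute walk.`
    : `About ${plural(totalMinutes, "minute")}, with ${plural(rides, "transit ride")}.`;
  return { totalMinutes, walkOnly, steps, summary };
}
```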

Accomplishments that we're proud of

  • A fully hands-free, continuous voice conversation loop that feels natural rather than menu-driven: tap once and just talk.
  • Fall detection that actually fires an SMS with a live map link, end to end, on real hardware.
  • The companion reply and intent classification happen in a single LLM call, keeping the persona consistent and the round-trip fast.
  • Gemini narration with a live Groq fallback: the app degrades gracefully under quota pressure rather than breaking.
  • A pluggable routing provider with real Transit API integration and constraint evaluation (time, budget, accessibility).
  • On-device YOLO bounding boxes rendered as a live SVG overlay at usable frame rates on an iPhone over LAN.
  • Voice cloning: record a short sample and hear directions in your own voice or a loved one's.

What we learned

  • Designing for voice is a different discipline from designing for a screen.
  • Single-call LLM design (reply + intent together) is both lower latency and more coherent than chaining two prompts. The model that decides to plan a route also knows how to frame the reply naturally.
  • PostGIS spatial queries are surprisingly good. The view-wedge filter (ST_DWithin + ST_Azimuth) turns a naive "nearby POIs" lookup into something genuinely useful for an audio guide.
  • Free-tier API limits force you to build good resilience architecture; the fallback chain we built for the demo is the right production pattern too.

What's next for Horizon

  • Moving the hazard model to TFLite/Core ML so it runs locally on the phone.
  • User profiles. Save conversation memory, preferred routes, and favourite POIs to a user account so the companion picks up where it left off across sessions.
  • Proactive narration. Instead of waiting for the user to ask, have the companion speak up when it detects a landmark entering the field of view.
  • Multiple languages. Whisper can transcribe and ElevenLabs can speak many languages; the companion persona just needs localized system prompts.
  • Apple Watch / wearable haptics. Offload fall detection and hazard alerts to a wrist device so the phone can stay in a bag or pocket.