TourGuide: Navigating the Unknown, Together

Inspiration

Moving to a new country, or even just a new city, can feel overwhelming: street signs in unfamiliar languages, transit systems with unwritten rules, prices that vary wildly depending on whether you "look like a tourist," and social norms that nobody explicitly tells you about.

We built TourGuide for the people who feel like aliens in a new place: immigrants navigating a new home, international students figuring out a campus town, travelers who want to engage authentically rather than get taken advantage of. The theme "Explore the Unknown" hit close to home because, for millions of people, the "unknown" isn't a hiking trail. It's the grocery store. It's the bus system. It's knowing whether the price on the menu is fair.

The goal was simple: point your camera at any scene and understand it, as if you had a local friend standing next to you.


What It Does

TourGuide is a real-time AI travel companion that combines:

  • Scene Scanner — point your camera at anything and get a plain-language explanation of what you're looking at and how to interact with it
  • Scam Alerts — AI-generated warnings about common tourist traps and how to avoid them, specific to your exact location
  • Price Reality Check — side-by-side local vs. tourist pricing so you know what things actually cost
  • Day Planner — a personalized AI itinerary based on your travel mode (walk/bike/drive), time available, and personal preferences
  • Transit Decoder — real departure times, line numbers, and plain-English instructions for local public transit using live Google Maps data
  • Nearby Essentials — instant access to hospitals, ATMs, pharmacies, and transit stops
  • Turn-by-Turn Navigation — full GPS routing with voice guidance and real-time arrival detection within $r = 50\text{ m}$ of the destination

How We Built It

The stack was chosen to maximize capability on a hackathon timeline:

| Layer | Tech |
|---|---|
| Frontend | React + Vite + TailwindCSS + Google Maps JS API |
| Backend | Python + FastAPI |
| Vision AI | Gemini 2.5 Flash (multimodal scene analysis) |
| Location Intel | Gemini 2.5 Flash with response_mime_type="application/json" |
| Voice | ElevenLabs TTS |
| Routing | Google Directions API (walking, cycling, driving, transit) |
| Places | Google Places API (nearby search, geocoding) |

The core loop is:

$$\text{camera frame} \xrightarrow{\text{Gemini Vision}} \text{scene understanding} \xrightarrow{\text{context + GPS}} \text{actionable guidance}$$

For location intelligence (scams, prices, transit), we use a curated static fast-path for known cities with Gemini as the fallback for anywhere else — keeping response times low while maintaining global coverage.
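
The fast-path idea can be sketched roughly as follows. This is an illustration only, written in TypeScript for consistency with the other examples here even though the backend is Python; `LocationIntel`, `CURATED`, `lookupCurated`, `getLocationIntel`, and the `askModel` callback are hypothetical names, and the curated entry is a placeholder rather than real data.

```typescript
// Hypothetical shape of a location-intelligence record.
interface LocationIntel {
  scams: string[];
  source: "static" | "model";
}

// Curated entries for known cities. The entry below is a placeholder.
const CURATED: Record<string, LocationIntel> = {
  exampleville: { scams: ["placeholder entry"], source: "static" },
};

// Normalize a city name so lookups ignore case and surrounding whitespace.
function cityKey(city: string): string {
  return city.trim().toLowerCase();
}

// Synchronous fast path: returns undefined when the city is not curated.
function lookupCurated(city: string): LocationIntel | undefined {
  const key = cityKey(city);
  return Object.prototype.hasOwnProperty.call(CURATED, key)
    ? CURATED[key]
    : undefined;
}

// Try the curated table first; only call the model on a miss.
// `askModel` stands in for whatever function wraps the model call.
async function getLocationIntel(
  city: string,
  askModel: (city: string) => Promise<string[]>,
): Promise<LocationIntel> {
  const hit = lookupCurated(city);
  if (hit !== undefined) return hit;
  const scams = await askModel(city);
  return { scams, source: "model" };
}
```

Passing the model call in as a parameter keeps the lookup testable without network access, and the `source` field lets the UI show whether an answer was curated or generated.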

Navigation uses a step-advance algorithm: the app advances to the next instruction when the user is within $d < 25\text{ m}$ of the next waypoint, and triggers arrival when $d_{\text{dest}} < 50\text{ m}$.
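
A minimal sketch of that step-advance rule, assuming great-circle (haversine) distance between GPS fixes. The 25 m and 50 m thresholds come from the description above; the distance formula, the `NavState` shape, and the function names are assumptions made for illustration, not the app's actual code.

```typescript
interface LatLng {
  lat: number;
  lng: number;
}

interface NavState {
  stepIndex: number; // index of the next waypoint to reach
  arrived: boolean;
}

const EARTH_RADIUS_M = 6371000;
const STEP_RADIUS_M = 25; // advance to the next instruction inside this radius
const ARRIVAL_RADIUS_M = 50; // trigger arrival inside this radius

// Great-circle distance in meters between two coordinates.
function haversineMeters(a: LatLng, b: LatLng): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Returns the new navigation state for one GPS update.
function updateNav(
  state: NavState,
  position: LatLng,
  waypoints: LatLng[],
  destination: LatLng,
): NavState {
  if (state.arrived) return state;
  if (haversineMeters(position, destination) < ARRIVAL_RADIUS_M) {
    return { ...state, arrived: true };
  }
  const next = waypoints[state.stepIndex];
  if (next !== undefined && haversineMeters(position, next) < STEP_RADIUS_M) {
    return { ...state, stepIndex: state.stepIndex + 1 };
  }
  return state;
}
```

In this sketch the arrival check runs before the step check, so a final waypoint that coincides with the destination ends navigation instead of advancing past the last instruction.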


Challenges

The hardest problem wasn't AI — it was trust. Getting Gemini to consistently return valid, structured JSON for location intelligence required enforcing response_mime_type="application/json" and raising token limits to prevent truncated responses. Early versions would silently cut off mid-object.
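
Whatever the model settings, it can help to validate the response before trusting it. The sketch below shows one way a caller might detect a truncated or malformed payload; the `ScamAlert` fields and the function names are hypothetical, since the real response schema isn't shown here.

```typescript
// Hypothetical record shape; the real schema may differ.
interface ScamAlert {
  title: string;
  advice: string;
}

type ParseResult =
  | { ok: true; alerts: ScamAlert[] }
  | { ok: false; reason: string };

// Runtime check that a parsed value has the expected string fields.
function isScamAlert(value: unknown): value is ScamAlert {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["title"] === "string" && typeof record["advice"] === "string";
}

// Parse model output, reporting truncation or shape errors instead of throwing.
function parseScamAlerts(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    // A response cut off mid-object fails here.
    return { ok: false, reason: "invalid or truncated JSON" };
  }
  if (!Array.isArray(data) || !data.every(isScamAlert)) {
    return { ok: false, reason: "unexpected shape" };
  }
  return { ok: true, alerts: data };
}
```

Returning a result object rather than throwing lets the caller decide whether to retry, fall back to curated data, or show a partial answer.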

GPS coordinate validity was a subtle bug that cost hours: typeof NaN === 'number' is true in JavaScript, so coordinates that hadn't resolved yet were passing type guards and sending "NaN" to the backend — causing 500 errors that looked like server crashes but were actually client-side type failures. The fix was adding isFinite() checks.
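
The pitfall and one possible guard look roughly like this. The function names are made up for illustration, and the latitude/longitude range check is an extra assumption beyond the fix described above.

```typescript
// The buggy guard: NaN has type "number", so unresolved coordinates pass.
function naiveGuard(lat: unknown, lng: unknown): boolean {
  return typeof lat === "number" && typeof lng === "number";
}

// Stricter guard: rejects NaN, Infinity, non-numbers, and out-of-range values.
function isValidCoordinate(lat: unknown, lng: unknown): boolean {
  return (
    typeof lat === "number" &&
    typeof lng === "number" &&
    Number.isFinite(lat) &&
    Number.isFinite(lng) &&
    Math.abs(lat) <= 90 &&
    Math.abs(lng) <= 180
  );
}
```

`Number.isFinite` is used here rather than the global `isFinite` because the global version coerces its argument first, so values such as `null` or numeric strings would pass.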

Mobile access required threading the needle between Vite's host security, ngrok tunnel limitations, and iOS Safari's HTTPS requirements for camera/GPS. We ended up using a Cloudflare tunnel with a Vite proxy to route all API calls through a single HTTPS endpoint.
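
A Vite dev-server configuration along these lines would route API calls through the same origin as the frontend. It is a sketch under assumptions: the `/api` prefix, the backend port `8000`, and the `.trycloudflare.com` tunnel domain are guesses and are not taken from the project.

```typescript
// vite.config.ts (sketch; paths, port, and tunnel domain are assumptions)
export default {
  server: {
    // Listen on all network interfaces so the tunnel can reach the dev server.
    host: true,
    // Allow requests whose Host header is a subdomain of the tunnel domain.
    allowedHosts: [".trycloudflare.com"],
    proxy: {
      // Forward same-origin /api requests to the local backend.
      "/api": {
        target: "http://localhost:8000",
        changeOrigin: true,
      },
    },
  },
};
```

With a setup like this the phone only ever talks to one HTTPS origin, which satisfies Safari's secure-context requirement for camera and geolocation access.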

Making it feel native on iPhone meant respecting env(safe-area-inset-bottom) so bottom controls clear the home indicator, handling touch events properly for the camera drag-select interface, and making the app installable as a PWA from Safari.


What We Learned

  • Multimodal AI is genuinely useful when the context is the hard part — not just "what is this object" but "what does this mean for someone who has never been here before"
  • Static curated data + AI fallback is often better than pure AI: faster, cheaper, and more reliable for known cases
  • The gap between "works on localhost" and "works on a phone in the real world" is where most of the real engineering lives

What's Next

TourGuide is built for anyone who has ever felt out of place somewhere new. That's not a niche — that's everyone, at some point. The next step is making it work offline for the moments when you need it most and have the least connectivity.
