TourGuide: Navigating the Unknown, Together
Inspiration
Moving to a new country — or even just a new city — can feel overwhelming. Street signs in unfamiliar languages, transit systems with unwritten rules, prices that vary wildly depending on whether you "look like a tourist," and social norms that nobody explicitly tells you about.
We built TourGuide for the people who feel like aliens in a new place: immigrants navigating a new home, international students figuring out a campus town, travelers who want to engage authentically rather than get taken advantage of. The theme, "Explore the Unknown," hit close to home because for millions of people the "unknown" isn't a hiking trail. It's the grocery store. It's the bus system. It's knowing whether the price on the menu is fair.
The goal was simple: point your camera at any scene and understand it, as if you had a local friend standing next to you.
What It Does
TourGuide is a real-time AI travel companion that combines:
- Scene Scanner — point your camera at anything and get a plain-language explanation of what you're looking at and how to interact with it
- Scam Alerts — AI-generated warnings about common tourist traps and how to avoid them, specific to your exact location
- Price Reality Check — side-by-side local vs. tourist pricing so you know what things actually cost
- Day Planner — a personalized AI itinerary based on your travel mode (walk/bike/drive), time available, and personal preferences
- Transit Decoder — real departure times, line numbers, and plain-English instructions for local public transit using live Google Maps data
- Nearby Essentials — instant access to hospitals, ATMs, pharmacies, and transit stops
- Turn-by-Turn Navigation — full GPS routing with voice guidance and real-time arrival detection within $r = 50\text{ m}$ of the destination
How We Built It
The stack was chosen to maximize capability on a hackathon timeline:
| Layer | Tech |
|---|---|
| Frontend | React + Vite + TailwindCSS + Google Maps JS API |
| Backend | Python + FastAPI |
| Vision AI | Gemini 2.5 Flash (multimodal scene analysis) |
| Location Intel | Gemini 2.5 Flash with response_mime_type="application/json" |
| Voice | ElevenLabs TTS |
| Routing | Google Directions API (walking, cycling, driving, transit) |
| Places | Google Places API (nearby search, geocoding) |
The core loop is:
$$\text{camera frame} \xrightarrow{\text{Gemini Vision}} \text{scene understanding} \xrightarrow{\text{context + GPS}} \text{actionable guidance}$$
For location intelligence (scams, prices, transit), we use a curated static fast-path for known cities with Gemini as the fallback for anywhere else — keeping response times low while maintaining global coverage.
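The fast-path idea can be illustrated with a small lookup-then-fallback function. This is a minimal sketch, not the project's actual code: the `CURATED_INTEL` table, its fields, and the `ai_fallback` callable (standing in for a Gemini request) are all hypothetical names chosen for illustration.

```python
from typing import Callable

# Hypothetical curated table; the real project's cities and fields are not
# shown in this writeup, so the entry below is a placeholder.
CURATED_INTEL: dict = {
    "example-city": {"source": "static", "scams": ["placeholder entry"]},
}


def get_location_intel(city: str, ai_fallback: Callable[[str], dict]) -> dict:
    """Return curated data for known cities, otherwise call the AI fallback."""
    key = city.strip().lower()
    if key in CURATED_INTEL:
        # Fast path: no model call, so latency and cost stay low.
        return CURATED_INTEL[key]
    # Slow path: any city outside the curated table goes to the model.
    return ai_fallback(city)
```

The fallback is passed in as a callable so the lookup can be exercised without network access, for example by handing it a stub that returns a fixed dictionary.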
Navigation uses a step-advance algorithm: the app advances to the next instruction when the user's distance to the next waypoint satisfies $d < 25\text{ m}$, and triggers arrival when the distance to the destination satisfies $d_{\text{dest}} < 50\text{ m}$.
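The two thresholds can be sketched as a pure function over GPS fixes. This is an illustrative sketch under stated assumptions: the haversine formula is assumed for distance (the writeup does not say how distance is computed), `waypoints[i]` is assumed to be the point where instruction `i` ends, and the function and constant names are hypothetical.

```python
import math

STEP_RADIUS_M = 25.0      # step-advance threshold from the text
ARRIVAL_RADIUS_M = 50.0   # arrival threshold from the text
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two lat/lng points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def update_navigation(position, waypoints, step_index):
    """Return (new_step_index, arrived) for the latest GPS position."""
    lat, lng = position
    dest_lat, dest_lng = waypoints[-1]
    # Arrival check comes first so the final step cannot be skipped past.
    if haversine_m(lat, lng, dest_lat, dest_lng) < ARRIVAL_RADIUS_M:
        return step_index, True
    wp_lat, wp_lng = waypoints[step_index]
    is_last_step = step_index >= len(waypoints) - 1
    if not is_last_step and haversine_m(lat, lng, wp_lat, wp_lng) < STEP_RADIUS_M:
        return step_index + 1, False
    return step_index, False
```

Calling `update_navigation` on every location update keeps the state in a single integer, which makes the behavior easy to replay against recorded GPS traces.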
Challenges
The hardest problem wasn't AI — it was trust.
Getting Gemini to consistently return valid, structured JSON for location intelligence required enforcing response_mime_type="application/json" and raising the output token limit to prevent truncated responses. Early versions would silently cut off mid-object, leaving JSON that could not be parsed.
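A truncated response is easier to handle when it is detected explicitly instead of surfacing as an unhandled exception. The sketch below shows one way to guard the parse step; the function name `parse_model_json` is hypothetical, and the request to the model is deliberately left out because the writeup does not show it.

```python
import json
from typing import Optional


def parse_model_json(raw_text: str) -> Optional[dict]:
    """Parse model output, returning None if it is cut off or not a JSON object."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        # A response cut off mid-object lands here.
        return None
    # Valid JSON that is not an object (a list, a bare string) is also rejected.
    return data if isinstance(data, dict) else None
```

A caller could treat `None` as a signal to retry with a higher token limit or to return a clear error to the client.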
GPS coordinate validity was a subtle bug that cost hours: typeof NaN === 'number' is true in JavaScript, so coordinates that hadn't resolved yet passed the type guards and were sent to the backend as "NaN". The resulting 500 errors looked like server crashes but were actually caused by invalid client-side values. The fix was adding isFinite() checks.
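The fix described above lives in the client. As a complementary sketch, the same check can be mirrored on the Python side so that bad coordinates are rejected with a clear error rather than a 500. This is an assumption about how a backend guard might look, not the project's code; `valid_coordinates` is a hypothetical helper. The same trap exists in Python: `float("nan")` is a `float`, so a type check alone is not enough.

```python
import math


def valid_coordinates(lat: object, lng: object) -> bool:
    """Return True only for finite, in-range numeric latitude and longitude."""
    for value in (lat, lng):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        # NaN and infinity pass the isinstance check but fail here.
        if not math.isfinite(value):
            return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0
```

A request handler could call this before any routing or model request and respond with a 4xx status when it returns `False`.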
Mobile access required threading the needle between Vite's host security, ngrok tunnel limitations, and iOS Safari's HTTPS requirements for camera/GPS. We ended up using a Cloudflare tunnel with a Vite proxy to route all API calls through a single HTTPS endpoint.
Making it feel native on iPhone meant respecting the safe-area insets, such as env(safe-area-inset-bottom), around the Dynamic Island and home indicator, handling touch events properly for the camera drag-select interface, and making the app installable as a PWA from Safari.
What We Learned
- Multimodal AI is genuinely useful when the context is the hard part — not just "what is this object" but "what does this mean for someone who has never been here before"
- Static curated data + AI fallback is often better than pure AI: faster, cheaper, and more reliable for known cases
- The gap between "works on localhost" and "works on a phone in the real world" is where most of the real engineering lives
What's Next
TourGuide is built for anyone who has ever felt out of place somewhere new. That's not a niche — that's everyone, at some point. The next step is making it work offline for the moments when you need it most and have the least connectivity.
Built With
- cloudflare
- elevenlabs
- fastapi
- gemini-2.5-flash
- google-directions
- google-maps-javascript-api
- google-places
- python
- react
- tailwindcss
- vite
