Odyssey Walk — Project Story
Created by Aaron Kleiman, Andrew Heraldo, and Sam Lee.
What inspired us
The team wanted to make exploring a city feel less like following a list and more like having a guide in your ear. Most walking-tour apps are either rigid audio guides or map-only; the idea was to combine generative AI with voice-first interaction so users could:
- Generate a custom route and story from a single starting point and a theme (history, food, spooky, art, etc.).
- Walk with their phone in their pocket and hear narration at each stop, triggered by location.
- Ask questions on the spot—"Why is this building significant?" or "What happened here in 1920?"—and get an answer spoken back, without typing.
The name Odyssey Walk is a nod to the journey: you choose the start, the LLM and maps shape the path, and you discover the story step by step.
What we learned
Orchestrating LLM + maps + voice. Tour generation is a pipeline: user input → OpenRouter (structured JSON for intro, outro, and 5–8 POIs with scripts and facts) → validation and quality checks → Google Directions API (walking route and optimized stop order) → session state. Keeping prompts, response parsing, and fallbacks (e.g., Haversine when Directions fails) in sync was a steep learning curve for the team.
Voice stack tradeoffs. The team learned the difference between browser APIs (SpeechSynthesis, SpeechRecognition) and the Gradium API for cloud TTS and STT (streaming over WebSockets or POST). Browser voice needs no keys and works wherever the browser supports the Web Speech API; the Gradium API gives higher-quality, natural-sounding speech and more control over voices and languages. Supporting both, with clear fallbacks and env-driven config, taught them how to design for "works out of the box" vs "best experience with setup."
Location and triggers. Making narration fire at the right moment meant implementing a small geo trigger engine: consider only the next \(k\) unvisited POIs, use an effective radius \(r_{\text{eff}} = r_{\text{base}} + \text{clamp}(\text{accuracy}, 0, 30)\) (meters), hysteresis (consecutive inside-checks to enter, exit buffer to leave), and a cooldown so the same POI doesn’t re-trigger immediately. Finding the right balance of responsiveness vs. stability took iteration.
Deploying on Vercel. Serverless constraints (no ffmpeg, no persistent filesystem, 10s timeout on Hobby) forced clear boundaries: server-side voice via the Gradium API (STT) is optional; tour generation gets a higher maxDuration where the plan allows; and "Save Tour" is best-effort unless the team adds a real data store. Building the app to work with browser-only voice and optional Gradium API TTS made deployment straightforward.
How we built it
Stack: Next.js 14 (App Router), TypeScript, Tailwind CSS, Framer Motion, Google Maps (map + Places + Directions), OpenRouter for the LLM, and the Gradium API for optional cloud text-to-speech and speech-to-text (TTS/STT).
Create flow
The user sets a start (search or map tap), picks theme, duration, language, and voice. A single OpenRouter request returns a structured tour (intro, outro, POIs with names, scripts, facts, and rough coordinates). The team validates POI quality (e.g., minimum script length and fact count), then calls Google’s Directions API: first by place queries (best), then by coordinates, then Haversine fallback. Walking time is distributed across segments; they use ~80 m/min as a simple walking-speed model for "time to get there" estimates.
Active walk
Session state lives in the client (and optionally in public/tours for saved tours). After the intro plays, the app subscribes to location (real GPS or a demo simulator). The geo trigger engine consumes location updates and emits "arrived at POI" events; the audio manager plays the POI script, then waits for the next trigger or a manual "Ask" action. Press-and-hold mic sends audio (or typed text) to STT → OpenRouter (Q&A) → TTS and plays the answer; narration and Q&A can use either browser APIs or the Gradium API (WebSocket or POST) for higher-quality voice. The UI dims during narration and Q&A so the focus stays on listening.
Math and geometry
- Haversine distance between two points (used for fallback routing and trigger radius): \[ d = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\Delta\phi}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\left(\frac{\Delta\lambda}{2}\right)}\right), \] with \(R\) the Earth radius and latitude \(\phi\) and longitude \(\lambda\) in radians.
- Effective trigger radius: \(r_{\text{eff}} = r_{\text{base}} + \min(\max(\text{accuracy}, 0), 30)\) (meters) so noisier GPS doesn’t make triggers too jumpy.
- Walking time: segment distance (meters) divided by a fixed speed (e.g. 80 m/min) for rough "time to get there" and total duration.
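The three formulas above translate fairly directly into code. The following is a minimal sketch, not the project's actual implementation: the function names, the `LatLng` shape, and the mean Earth radius of 6,371,000 m are assumptions chosen for illustration.

```typescript
// Illustrative helpers; names and the Earth-radius constant are assumptions.
interface LatLng {
  lat: number; // degrees
  lng: number; // degrees
}

const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters (assumed value)

const toRad = (deg: number): number => (deg * Math.PI) / 180;

/** Great-circle distance in meters between two points (Haversine formula). */
function haversineMeters(a: LatLng, b: LatLng): number {
  const dPhi = toRad(b.lat - a.lat);
  const dLambda = toRad(b.lng - a.lng);
  const h =
    Math.sin(dPhi / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLambda / 2) ** 2;
  // Guard against tiny floating-point overshoot before taking asin.
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(Math.min(1, h)));
}

/** Base radius widened by the reported GPS accuracy, clamped to [0, 30] m. */
function effectiveRadius(baseM: number, accuracyM: number): number {
  return baseM + Math.min(Math.max(accuracyM, 0), 30);
}

/** Rough walking time in minutes at a fixed speed (80 m/min by default). */
function walkingMinutes(distanceM: number, speedMPerMin = 80): number {
  return distanceM / speedMPerMin;
}
```

For example, under these definitions a 400 m segment comes out to 5 minutes of walking, and a 25 m base radius with a reported accuracy of 50 m is widened to 55 m because the accuracy term is capped at 30 m.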
Demo mode
If location is denied or the user chooses demo, the app simulates movement along the route so the full flow (intro → POIs → Q&A → outro) is testable without leaving the desk.
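One simple way to simulate movement is to interpolate points along each leg of the route and feed them to the same handler that real GPS updates use. The sketch below assumes that approach; `simulatePath` and `stepsPerLeg` are hypothetical names, and the write-up does not say how the actual simulator generates positions.

```typescript
// Hypothetical demo-mode helper; not taken from the project's code.
interface LatLng {
  lat: number;
  lng: number;
}

/**
 * Returns evenly spaced points along each leg of a route, including both ends.
 * Linear interpolation in lat/lng is an approximation that is adequate over
 * the short distances of a city walk.
 */
function simulatePath(route: LatLng[], stepsPerLeg: number): LatLng[] {
  if (route.length === 0) return [];
  const steps = Math.max(1, Math.floor(stepsPerLeg));
  const points: LatLng[] = [route[0]];
  for (let i = 1; i < route.length; i++) {
    const from = route[i - 1];
    const to = route[i];
    for (let s = 1; s <= steps; s++) {
      const t = s / steps; // fraction of the way along this leg
      points.push({
        lat: from.lat + (to.lat - from.lat) * t,
        lng: from.lng + (to.lng - from.lng) * t,
      });
    }
  }
  return points;
}
```

Emitting one of these points per timer tick would exercise the trigger logic at a steady, repeatable pace, which is convenient when tuning radii and cooldowns.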
Challenges we faced
LLM output shape and quality. Getting consistent JSON (intro, outro, POIs with scripts and coordinates) required careful prompting, retries, and stripping markdown code fences. Some POI scripts were too short or generic; the team added server-side validation (e.g., minimum word count and fact count) that filters those out before building the route.
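A sketch of what the fence stripping and POI filtering could look like is shown here. The `Tour` and `Poi` shapes, the function names, and the default thresholds (40 words, 2 facts) are assumptions for illustration; the write-up does not state the real schema or limits.

```typescript
// Assumed shapes; the project's actual schema may differ.
interface Poi {
  name: string;
  script: string;
  facts: string[];
}

interface Tour {
  intro: string;
  outro: string;
  pois: Poi[];
}

// Built with repeat() so this example contains no literal fence marker.
const FENCE = "`".repeat(3);

/** Removes a surrounding markdown code fence if the model added one. */
function stripCodeFences(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    // Drop the opening fence line, including any language tag such as "json".
    const firstNewline = text.indexOf("\n");
    text = firstNewline === -1 ? text.slice(FENCE.length) : text.slice(firstNewline + 1);
  }
  if (text.endsWith(FENCE)) text = text.slice(0, -FENCE.length);
  return text.trim();
}

const countWords = (s: string): number => s.trim().split(/\s+/).filter(Boolean).length;

/** Parses the model output and keeps only POIs that pass the quality checks. */
function parseTour(raw: string, minWords = 40, minFacts = 2): Tour {
  // JSON.parse throws on malformed output, which a caller can treat as "retry".
  const data = JSON.parse(stripCodeFences(raw)) as Partial<Tour>;
  if (typeof data.intro !== "string" || typeof data.outro !== "string" || !Array.isArray(data.pois)) {
    throw new Error("Tour JSON is missing intro, outro, or pois");
  }
  const pois = data.pois.filter(
    (p) =>
      typeof p?.name === "string" &&
      typeof p?.script === "string" &&
      countWords(p.script) >= minWords &&
      Array.isArray(p.facts) &&
      p.facts.length >= minFacts,
  );
  return { intro: data.intro, outro: data.outro, pois };
}
```

Throwing on a malformed top-level shape while silently dropping weak POIs keeps the two failure modes separate: the first is a reason to retry the request, the second only shortens the tour.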
Directions API and coordinate quality. The LLM’s lat/lng are approximate. Using place queries (address or "name near start") for the Directions API gave much better routes than raw coordinates. When that failed (e.g., obscure names), falling back to coordinate-based directions and then to Haversine kept the product working instead of failing hard.
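The place-query, then coordinates, then Haversine ordering can be expressed as a small fallback chain. This is a sketch under stated assumptions: `routeWithFallback`, `Strategy`, and the `source` labels are hypothetical names, and each `run` function stands in for whatever call the app actually makes.

```typescript
// Hypothetical fallback chain; the strategy functions are supplied by the caller.
type RouteSource = "place-query" | "coordinates" | "haversine";

interface RouteResult<T> {
  source: RouteSource; // which strategy produced the route
  route: T;
}

interface Strategy<T> {
  source: RouteSource;
  run: () => Promise<T>;
}

/** Tries each strategy in order and returns the first one that succeeds. */
async function routeWithFallback<T>(strategies: Strategy<T>[]): Promise<RouteResult<T>> {
  const errors: string[] = [];
  for (const { source, run } of strategies) {
    try {
      return { source, route: await run() };
    } catch (err) {
      // Remember why this strategy failed, then move on to the next one.
      errors.push(`${source}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All routing strategies failed (${errors.join("; ")})`);
}
```

Returning the `source` alongside the route lets the UI or logs show when a tour is running on straight-line estimates rather than real walking directions.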
Audio and the Gradium API. The Gradium API supports TTS over WebSocket or POST and STT over WebSocket. Handling the WebSocket lifecycle (connect, send text, stream audio chunks, timeout) and surfacing errors in the UI (e.g., "TTS not configured" or network errors) took iteration. On Vercel, server-side STT via the Gradium API would require ffmpeg for webm→PCM; rather than custom runtimes, the team documents leaving Gradium STT unset and using browser SpeechRecognition for deployment.
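The "leave Gradium STT unset and fall back to the browser" behavior suggests an env-driven switch along these lines. This is only a sketch: `GRADIUM_API_KEY` and `GRADIUM_STT_ENABLED` are hypothetical variable names, since the write-up does not list the real ones.

```typescript
// Hypothetical env-driven provider selection; variable names are assumptions.
type VoiceProvider = "gradium" | "browser";

interface VoiceConfig {
  tts: VoiceProvider;
  stt: VoiceProvider;
}

function resolveVoiceConfig(env: Record<string, string | undefined>): VoiceConfig {
  const hasKey = Boolean(env.GRADIUM_API_KEY?.trim());
  // STT stays on the browser unless explicitly enabled, matching the
  // deployment advice above (no ffmpeg on Vercel for webm to PCM conversion).
  const sttEnabled = env.GRADIUM_STT_ENABLED === "true";
  return {
    tts: hasKey ? "gradium" : "browser",
    stt: hasKey && sttEnabled ? "gradium" : "browser",
  };
}
```

With no variables set, both TTS and STT resolve to the browser, which is the "works out of the box" path; setting only the key upgrades TTS while leaving STT on the browser.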
Geo triggers and UX. Triggering too early (user not quite at the POI) or too late was confusing. Adding hysteresis (must be inside for 2 consecutive updates; exit only when beyond \(r_{\text{eff}} + \text{margin}\)) and a 60s cooldown per POI made behavior stable. Letting users tap "Skip" or replay the current stop kept control in their hands.
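The hysteresis and cooldown rules described above can be sketched as a small per-POI state machine. The names (`TriggerState`, `updateTrigger`, `exitMarginM`) are hypothetical and the structure is one plausible reading of the description, not the project's code; limiting checks to the next few unvisited POIs is left out for brevity.

```typescript
// Hypothetical per-POI trigger state machine, assuming the rules described above.
interface TriggerOptions {
  baseRadiusM: number; // r_base
  exitMarginM: number; // extra distance required before the user counts as having left
  enterCount: number; // consecutive inside-checks needed to fire (2 in the write-up)
  cooldownMs: number; // minimum gap between triggers (60 s in the write-up)
}

interface TriggerState {
  insideStreak: number;
  inside: boolean;
  lastFiredAt: number | null;
}

const initialState = (): TriggerState => ({ insideStreak: 0, inside: false, lastFiredAt: null });

/**
 * Processes one location update for one POI, mutating `state`.
 * Returns true when narration should fire.
 */
function updateTrigger(
  state: TriggerState,
  distanceM: number,
  accuracyM: number,
  nowMs: number,
  opts: TriggerOptions,
): boolean {
  const rEff = opts.baseRadiusM + Math.min(Math.max(accuracyM, 0), 30);

  if (state.inside) {
    // Already inside: leave only once beyond r_eff + margin (exit buffer).
    if (distanceM > rEff + opts.exitMarginM) {
      state.inside = false;
      state.insideStreak = 0;
    }
    return false;
  }

  // Outside: count consecutive updates within the effective radius.
  state.insideStreak = distanceM <= rEff ? state.insideStreak + 1 : 0;
  if (state.insideStreak < opts.enterCount) return false;

  state.inside = true;
  const coolingDown = state.lastFiredAt !== null && nowMs - state.lastFiredAt < opts.cooldownMs;
  if (coolingDown) return false;
  state.lastFiredAt = nowMs;
  return true;
}
```

In this sketch a single noisy fix inside the radius never fires narration, and a user hovering near the boundary does not flip between inside and outside because leaving requires clearing the wider exit distance.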
Build and deploy. A readonly ref in a callback (MapView) broke the TypeScript build; fixing the ref type and adding @types/ws for the Gradium WebSocket client got the build green. Cleaning the Next.js cache resolved a transient "Cannot find module for page" during collect. Documenting env vars and Vercel limits (timeouts, no ffmpeg, no persistent disk) in the README made deployment repeatable.
Odyssey Walk is the result of tying together maps, LLMs, and voice so that a single starting point and a theme can turn into a personalized, walkable story—and users can ask questions along the way without breaking stride.
Built With
- framer-motion
- google-maps
- google-directions
- google-places
- gradium
- next.js
- openrouter
- react-18
- tailwind-css
- typescript
- vercel
- web-speech