Inspiration
We love geography games but wanted to flip the concept — what if you couldn't see anything at all? Instead of street-level imagery, what if you had to identify a location purely by listening? That idea became EchoGuessr: a game that tests how well you know the world through sound alone.
What it does
EchoGuessr drops you into a mystery location and gives you up to three AI-generated audio clues:
- Ambient sounds — street noise, nature, everyday sounds unique to the region
- Regional music — instruments, genres, and rhythms tied to the country
- A spoken phrase — a sentence in the local language, spoken by a dynamically generated voice matching the region's accent
After each clue, you can place a pin on the map and lock in your guess. Guessing earlier is harder, but scores up to 3× more points. Scoring uses exponential decay based on distance, so accuracy matters — a lot.
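Exponential-decay scoring with an early-guess multiplier can be sketched like this. The base points, decay constant, and per-clue multipliers below are illustrative guesses, not EchoGuessr's actual tuning:

```typescript
// Sketch of distance-based exponential-decay scoring.
// All constants are illustrative assumptions, not the game's real values.
const MAX_POINTS = 5000;           // hypothetical base score for a perfect guess
const DECAY_KM = 2000;             // hypothetical decay constant, in kilometers
const CLUE_MULTIPLIER = [3, 2, 1]; // guessing after clue 1, 2, or 3

function score(distanceKm: number, cluesHeard: 1 | 2 | 3): number {
  // Score falls off exponentially with distance, so near misses still pay
  // well while far guesses decay smoothly toward zero.
  const base = MAX_POINTS * Math.exp(-distanceKm / DECAY_KM);
  return Math.round(base * CLUE_MULTIPLIER[cluesHeard - 1]);
}
```

A perfect pin after one clue would score three times a perfect pin after all three, and the decay curve never hard-cuts to zero.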
The game also features a score tier system (Perfect, Amazing, Great, Good, Keep Trying), confetti animations on high scores, a global leaderboard, and keyboard shortcuts for power users. A loading screen shows real-time progress as each audio clue is generated.
How we built it
The frontend is Next.js 15 with React 19, Tailwind CSS, and Framer Motion for smooth animations. The map offers two views: a 3D interactive globe built with react-globe.gl (Three.js) and a flat Google Maps view with custom styled markers and polylines connecting your guess to the actual location.
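Both the result polyline and distance-based scoring need the great-circle distance between the guess pin and the true location. A standard Haversine implementation covers this (the function and variable names here are our own, not taken from the project):

```typescript
// Great-circle distance between two lat/lng points via the Haversine formula.
function haversineKm(
  lat1: number, lng1: number,
  lat2: number, lng2: number,
): number {
  const R = 6371; // mean Earth radius in kilometers
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}
```

For example, Paris to London comes out around 344 km, which matches the straight-line polyline Google Maps draws between the two pins.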
On the backend, location generation is powered by Backboard AI (built on Google Gemini). It picks a random region, generates a location, and writes audio prompts with strict rules — no city names, no landmarks, nothing that gives it away too easily.
Audio is generated in parallel using ElevenLabs: the Sound Generation API creates ambient and music clips, and the Text-to-Speech API generates the spoken phrase. For each round, we don't use a static voice — we dynamically generate a new voice matching the region's accent and language using ElevenLabs' voice generation API, use it for TTS with eleven_multilingual_v2, then automatically delete it afterward.
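The per-round pipeline above can be sketched as the following orchestration. Every helper here is a hypothetical stand-in for our ElevenLabs API wrappers (the stubs just echo their inputs), not real SDK calls; the point is the shape: three independent clue pipelines run in parallel, and the throwaway voice is deleted even if TTS fails:

```typescript
// Hypothetical stand-ins for ElevenLabs wrapper calls (not the real SDK).
type Clues = { ambient: string; music: string; speech: string };
const generateAmbient = async (p: string) => `ambient(${p})`;
const generateMusic = async (p: string) => `music(${p})`;
const generateRegionalVoice = async (p: string) => `voice-for-${p}`;
const textToSpeech = async (voiceId: string, phrase: string, model: string) =>
  `tts(${voiceId}, ${phrase}, ${model})`;
const deleteVoice = async (_voiceId: string) => { /* cleanup */ };

async function generateClues(prompts: {
  ambient: string; music: string; voice: string; phrase: string;
}): Promise<Clues> {
  // Ambient, music, and speech don't depend on each other, so all three
  // generate in parallel.
  const [ambient, music, speech] = await Promise.all([
    generateAmbient(prompts.ambient),
    generateMusic(prompts.music),
    (async () => {
      // Create a throwaway region-specific voice, use it once, and
      // delete it even if TTS throws.
      const voiceId = await generateRegionalVoice(prompts.voice);
      try {
        return await textToSpeech(voiceId, prompts.phrase, "eleven_multilingual_v2");
      } finally {
        await deleteVoice(voiceId);
      }
    })(),
  ]);
  return { ambient, music, speech };
}
```

The `try`/`finally` is what keeps orphaned voices from piling up in the ElevenLabs account when a round fails midway.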
Challenges we ran into
- Getting audio quality right took a lot of prompt tuning: early ambient clips were muddy and the music was too generic. We iterated heavily on prompt structure and `prompt_influence` settings.
- The spoken language clue kept coming out too short (single greetings) or too long (full paragraphs). Finding the sweet spot of one natural sentence took several rounds of prompt refinement.
- Next.js dev mode wipes in-memory state on every module reload, which kept destroying active game sessions. We solved this by persisting sessions on `globalThis`.
- Getting custom `AdvancedMarker` pins to align correctly with `Polyline` endpoints on Google Maps required reworking the marker anchor points: the default anchor doesn't center on the lat/lng coordinate, so the result lines connected to the wrong spot.
What we learned
- Prompt engineering for audio generation is very different from text — small wording changes dramatically affect the output quality and length.
- ElevenLabs' voice generation API can create surprisingly convincing region-specific voices on the fly, but the workflow (preview → save → use → delete) needs careful orchestration.
- Exponential decay scoring feels much fairer than linear scoring for geography games — it rewards being close without making distant guesses feel worthless.
What's next
- Multi-round games with cumulative scoring
- Multiplayer / head-to-head mode
- Difficulty tiers that control how obscure the locations are
- Streamer / spectator mode for live audiences
Built With
- backboard-ai
- elevenlabs
- framer-motion
- gemini
- google-maps
- next.js
- react
- tailwind-css
- typescript
