EchoHunt

Inspiration

Traditional scavenger hunts are fun but rigid: clues that work in one space fail in another. I wanted to create something more dynamic. What if AI could see YOUR environment and generate a treasure hunt just for you? With Gemini 3's vision capabilities, I realized I could build a scavenger hunt that adapts to any room, any space, anywhere in the world.

So just open the link in your mobile browser and enjoy!

What it does

EchoHunt is a mobile-first AR scavenger hunt that runs entirely in your browser. Here's how it works:

Setup: Point your camera around your room (bedroom, office, classroom, etc.)
AI Generation: Gemini 3 analyzes your space and identifies objects, colors, textures, and materials
Personalized Quest: The AI generates 3-10 unique riddles, ordered by difficulty and tailored specifically to YOUR environment
Interactive Gameplay: As you search for objects, point your camera at them. Gemini validates in real time whether you found the right item and provides contextual hints like "Get closer!" or "Too dark!"
Voice Feedback: Browser-native Text-to-Speech provides spoken clues and encouragement
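
The gameplay loop above can be modeled as a small state transition on the client. The following is a minimal sketch, not EchoHunt's actual code: the `ScanResult` and `HuntState` shapes, the `applyScan` function, and the fallback messages are all hypothetical, since the write-up does not show the real `/api/scan` response.

```typescript
// Hypothetical shapes: the real /api/scan response is not shown in the write-up.
interface ScanResult {
  match: boolean; // did the referee accept the frame?
  hint?: string;  // contextual hint such as "Get closer!" or "Too dark!"
}

interface HuntState {
  riddles: string[]; // riddles in play order (easiest first)
  current: number;   // index of the riddle being solved
  found: number;     // number of objects found so far
}

interface StepOutcome {
  state: HuntState;
  speak: string;     // text to hand to the browser's text-to-speech
  finished: boolean;
}

/** Apply one scan result to the hunt and decide what to say next. */
function applyScan(state: HuntState, result: ScanResult): StepOutcome {
  if (state.current >= state.riddles.length) {
    return { state, speak: "Quest complete!", finished: true };
  }
  if (!result.match) {
    // No match: keep the same riddle and pass the hint through.
    return { state, speak: result.hint ?? "Not quite, keep looking.", finished: false };
  }
  const next: HuntState = { ...state, current: state.current + 1, found: state.found + 1 };
  const finished = next.current >= next.riddles.length;
  const speak = finished
    ? "Quest complete!"
    : `Found it! Next clue: ${next.riddles[next.current]}`;
  return { state: next, speak, finished };
}
```

In a browser, the `speak` string could then be passed to the Web Speech API, for example `speechSynthesis.speak(new SpeechSynthesisUtterance(outcome.speak))`. Keeping the transition pure makes it easy to test without a camera or a model call.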

Every quest is generated from the player's own space, so no two players will ever have the same experience.

How I built it

Tech Stack:

Frontend: Next.js 16 with React, Tailwind CSS
AI: Gemini 3 Flash (via @google/generative-ai SDK)
APIs: MediaDevices API (camera access), Canvas API (frame capture), Web Speech API (TTS)
Deployment: Vercel (serverless functions)

Key Implementation Details:

Quest Generation (/api/generate-quest): Captures a camera frame, sends it to Gemini 3 with a prompt asking it to identify objects and generate riddles. I used structured JSON output with response schemas to ensure consistent formatting.
Real-time Validation (/api/scan): Every scan sends a frame to Gemini 3 along with the target description. The AI acts as a "referee," determining matches and providing contextual feedback based on visual cues (lighting, distance, angle).
Performance Optimization: Frames are downscaled to 384px and encoded as JPEG at 0.5 quality, and scans are throttled to at most one every 900ms to prevent excessive API calls.
AR Overlay: Custom canvas-based animations with animated reticles and confetti celebrations for successful finds.
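
The compression and throttling numbers above can be captured in two small helpers. This is a sketch under assumptions, not the project's code: `SCAN_CONFIG`, `scaledSize`, and `createScanThrottle` are hypothetical names, and treating 384px as the limit on the frame's longest side is my reading of the write-up.

```typescript
// Values taken from the write-up; names and structure are illustrative.
const SCAN_CONFIG = {
  maxSidePx: 384,         // assumed to bound the longest side of the frame
  jpegQuality: 0.5,       // e.g. canvas.toDataURL("image/jpeg", 0.5) in the browser
  minScanIntervalMs: 900, // minimum gap between two scans
} as const;

interface Size {
  width: number;
  height: number;
}

/** Scale a frame so its longest side is at most maxSide, preserving aspect ratio. */
function scaledSize(src: Size, maxSide: number = SCAN_CONFIG.maxSidePx): Size {
  const longest = Math.max(src.width, src.height);
  if (longest <= maxSide) return { ...src }; // never upscale small frames
  const scale = maxSide / longest;
  return {
    width: Math.round(src.width * scale),
    height: Math.round(src.height * scale),
  };
}

/**
 * Returns a gate function: it answers true (and records the time) when a scan
 * may be sent, and false when the previous scan was too recent.
 * The clock is injectable so the gate can be tested without waiting.
 */
function createScanThrottle(
  minIntervalMs: number = SCAN_CONFIG.minScanIntervalMs,
  now: () => number = Date.now,
): () => boolean {
  let lastScanAt = Number.NEGATIVE_INFINITY;
  return () => {
    const t = now();
    if (t - lastScanAt < minIntervalMs) return false;
    lastScanAt = t;
    return true;
  };
}
```

A scan loop would call the gate before each capture and skip the frame when it returns false; the target size from `scaledSize` would set the canvas dimensions before the frame is drawn and encoded.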

Challenges I ran into

Camera Permissions on Mobile: I initially struggled with the HTTPS requirement for camera access. I learned that mobile browsers block camera access on non-HTTPS origins (except localhost).
API Latency: Real-time validation needed to feel instant. I implemented aggressive image compression and throttling to balance responsiveness with API costs.
Prompt Engineering: Getting Gemini to generate riddles that are challenging but solvable required iteration. Early prompts produced clues that were either too vague ("find something blue") or too specific ("find the red coffee mug on the left shelf").
Structured Outputs: Ensuring Gemini consistently returned properly formatted JSON using the response schema feature was critical for reliability.
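
Even with a response schema, it can be worth validating the parsed JSON before the game uses it. The sketch below is one defensive approach, not something the write-up confirms the project does: the `Quest` and `Riddle` field names and the `parseQuest` helper are hypothetical. Only the 3-10 riddle range and the ordering by difficulty come from the description above.

```typescript
// Hypothetical response shape; the real schema is not shown in the write-up.
interface Riddle {
  riddle: string;     // the clue shown and spoken to the player
  target: string;     // description of the object to find
  difficulty: number; // lower means easier
}

interface Quest {
  riddles: Riddle[];
}

/** Runtime check that an unknown value has the Riddle shape. */
function isRiddle(value: unknown): value is Riddle {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r.riddle === "string" && r.riddle.length > 0 &&
    typeof r.target === "string" && r.target.length > 0 &&
    typeof r.difficulty === "number" && Number.isFinite(r.difficulty)
  );
}

/** Parse model output; return a quest sorted by difficulty, or null if invalid. */
function parseQuest(raw: string): Quest | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const riddles = (data as Record<string, unknown>).riddles;
  if (!Array.isArray(riddles) || !riddles.every(isRiddle)) return null;
  if (riddles.length < 3 || riddles.length > 10) return null; // 3-10 riddles per quest
  const sorted = [...riddles].sort((a, b) => a.difficulty - b.difficulty);
  return { riddles: sorted };
}
```

Returning `null` instead of throwing lets the caller decide whether to retry the generation request or show an error state.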

Accomplishments that I'm proud of

Zero App Store Friction: Runs entirely in the browser, with no downloads and no installs
True Personalization: Unlike template-based games, every quest is genuinely unique to the player's space
Accessibility: Voice feedback makes the game playable hands-free
Production Ready: Clean codebase with error handling, loading states, and mobile optimization
Creative Use of Gemini 3: I'm not just using AI for chat; I'm using vision + structured generation + real-time validation to create an entirely new gameplay experience

What I learned

Multimodal AI is Powerful: Gemini 3's ability to understand images and generate contextual text enables experiences that weren't possible before
Structured Outputs are a Game-Changer: Response schemas ensure reliability in production applications
Mobile-First is Essential: I expect over 80% of users to experience this on phones, so camera, touch, and performance must be prioritized
Prompt Engineering is an Art: Small changes in prompts dramatically affect AI behavior

What's next for EchoHunt

Graphics & Animations: 3D visual overlays with fun graphics and video animations
Multiplayer Mode: Race against friends in the same space
Themed Quests: Holiday-specific riddles, educational modes for kids (find shapes, colors, letters)
Difficulty Levels: Easy mode for children, expert mode with cryptic clues
Location-Aware Hunts: Outdoor scavenger hunts using GPS + Gemini vision

EchoHunt proves that Gemini 3's vision capabilities can power entirely new categories of interactive experiences. The future of gaming isn't just virtual; it's about AI understanding and enhancing our physical world.

Built With

canvas-api
gemini-3-flash-api
google/generative-ai-sdk
lucide-react
mediadevices-api
next.js-16
react
tailwind-css
typescript
vercel
web-speech-api

Updates

Anant Aggarwal started this project — Feb 09, 2026 07:31 PM EST
