Inspiration

Traditional scavenger hunts are fun but rigid: the same clues work in one space but fail in another. I wanted to create something more dynamic. What if AI could see YOUR environment and generate a treasure hunt just for you? With Gemini 3's vision capabilities, I realized I could build a scavenger hunt that adapts to any room, any space, anywhere in the world.

So just open the link in your mobile browser and enjoy!

What it does

EchoHunt is a mobile-first AR scavenger hunt that runs entirely in your browser. Here's how it works:

  1. Setup: Point your camera around your room (bedroom, office, classroom, etc.)
  2. AI Generation: Gemini 3 analyzes your space and identifies objects, colors, textures, and materials
  3. Personalized Quest: The AI generates 3-10 unique riddles, ordered by difficulty and tailored specifically to YOUR environment
  4. Interactive Gameplay: As you search for objects, point your camera at them. Gemini checks in real time whether you found the right item and provides contextual hints like "Get closer!" or "Too dark!"
  5. Voice Feedback: Browser-native Text-to-Speech provides spoken clues and encouragement

Because every quest is generated from your own surroundings, no two players get the same experience.
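
As a rough illustration of step 3, here is a minimal sketch of how a generated quest could be checked and ordered before play. The `Riddle` shape, its field names, and `prepareQuest` are assumptions made for this example, not EchoHunt's actual schema or code; only the 3-10 range and the difficulty ordering come from the description above.

```typescript
// Hypothetical riddle shape; the field names are assumptions for illustration.
interface Riddle {
  clue: string;        // riddle text shown and spoken to the player
  target: string;      // description of the object the player must find
  difficulty: number;  // lower means easier
}

const MIN_RIDDLES = 3;
const MAX_RIDDLES = 10;

// Returns a new array sorted easiest-first.
// Throws a RangeError if the riddle count falls outside the 3-10 range.
function prepareQuest(riddles: readonly Riddle[]): Riddle[] {
  if (riddles.length < MIN_RIDDLES || riddles.length > MAX_RIDDLES) {
    throw new RangeError(
      `expected ${MIN_RIDDLES}-${MAX_RIDDLES} riddles, got ${riddles.length}`
    );
  }
  // Copy before sorting so the caller's array is left untouched.
  return [...riddles].sort((a, b) => a.difficulty - b.difficulty);
}
```

Sorting on the client, rather than trusting the order the model returns, keeps the difficulty ramp consistent even if the response arrives out of order.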

How I built it

Tech Stack:

  • Frontend: Next.js 16 with React, Tailwind CSS
  • AI: Gemini 3 Flash (via @google/generative-ai SDK)
  • APIs: MediaDevices API (camera access), Canvas API (frame capture), Web Speech API (TTS)
  • Deployment: Vercel (serverless functions)

Key Implementation Details:

  • Quest Generation (/api/generate-quest): Captures a camera frame, sends it to Gemini 3 with a prompt asking it to identify objects and generate riddles. I used structured JSON output with response schemas to ensure consistent formatting.
  • Real-time Validation (/api/scan): Every scan sends a frame to Gemini 3 along with the target description. The AI acts as a "referee," determining matches and providing contextual feedback based on visual cues (lighting, distance, angle).
  • Performance Optimization: Frames are downscaled to 384px and encoded as JPEG at 0.5 quality, and scans are throttled to at most one every 900ms to prevent excessive API calls.
  • AR Overlay: Custom canvas-based animations with animated reticles and confetti celebrations for successful finds.
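
The performance optimization above can be sketched roughly as follows. This is an illustrative sketch, not the project's code: the helper names (`fitWithin`, `jpegEncodeOptions`, `createScanGate`) are hypothetical, and it assumes "384px" refers to the frame's longest edge. The three constants are the values quoted in the list.

```typescript
const MAX_EDGE_PX = 384;          // target size from the write-up
const JPEG_QUALITY = 0.5;         // JPEG quality from the write-up
const MIN_SCAN_INTERVAL_MS = 900; // minimum gap between scans from the write-up

interface Size {
  width: number;
  height: number;
}

// Scale a frame so its longest edge is at most maxEdge, preserving aspect ratio.
// Assumption: "384px" means the longest edge; smaller frames are left as-is.
function fitWithin(src: Size, maxEdge: number = MAX_EDGE_PX): Size {
  const longest = Math.max(src.width, src.height);
  if (longest <= maxEdge) return { ...src };
  const scale = maxEdge / longest;
  return {
    width: Math.round(src.width * scale),
    height: Math.round(src.height * scale),
  };
}

// Arguments one might pass to canvas.toDataURL(type, quality) in the browser.
function jpegEncodeOptions(): { type: string; quality: number } {
  return { type: "image/jpeg", quality: JPEG_QUALITY };
}

// Returns a gate function that allows at most one scan per interval.
// The clock is injectable so the gate can be exercised without real timers.
function createScanGate(
  minIntervalMs: number = MIN_SCAN_INTERVAL_MS,
  now: () => number = Date.now
): () => boolean {
  let lastScan = Number.NEGATIVE_INFINITY;
  return function tryAcquire(): boolean {
    const t = now();
    if (t - lastScan < minIntervalMs) return false; // too soon, skip this scan
    lastScan = t;
    return true;
  };
}
```

In the browser, the video frame would be drawn onto a canvas sized with `fitWithin` and exported with `canvas.toDataURL("image/jpeg", 0.5)`; the scan handler would call the gate first and return early whenever it yields `false`.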

Challenges I ran into

  1. Camera Permissions on Mobile: Initially struggled with HTTPS requirements for camera access. Learned that mobile browsers block camera access on non-HTTPS origins (except localhost).
  2. API Latency: Real-time validation needed to feel instant. I implemented aggressive image compression and throttling to balance responsiveness with API costs.
  3. Prompt Engineering: Getting Gemini to generate riddles that are challenging but solvable required iteration. Early prompts produced clues that were either too vague ("find something blue") or too specific ("find the red coffee mug on the left shelf").
  4. Structured Outputs: Ensuring Gemini consistently returned properly formatted JSON using the response schema feature was critical for reliability.
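
Even with a response schema, it can be worth parsing the model's reply defensively on the server. The sketch below shows one way to do that for the scan referee. The `ScanResult` shape (`match` and `hint`) and `parseScanResult` are assumptions for illustration, not the fields EchoHunt actually uses.

```typescript
// Hypothetical referee response; field names are assumptions for illustration.
interface ScanResult {
  match: boolean; // whether the frame shows the target object
  hint: string;   // contextual feedback such as "Get closer!"
}

// Parse the model's text reply and return a typed result,
// or null if it is not valid JSON or does not have the expected shape.
function parseScanResult(raw: string): ScanResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  if (typeof rec.match !== "boolean" || typeof rec.hint !== "string") {
    return null; // missing or mistyped fields
  }
  return { match: rec.match, hint: rec.hint };
}
```

Returning `null` instead of throwing lets the caller treat a malformed reply as a skipped scan and simply try again on the next frame.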

Accomplishments that I'm proud of

  • Zero App Store Friction: Runs entirely in the browser, with no downloads and no installs
  • True Personalization: Unlike template-based games, every quest is genuinely unique to the player's space
  • Accessibility: Voice feedback makes the game playable hands-free
  • Production Ready: Clean codebase with error handling, loading states, and mobile optimization
  • Creative Use of Gemini 3: I'm not just using AI for chat; I'm using vision + structured generation + real-time validation to create an entirely new gameplay experience

What I learned

  • Multimodal AI is Powerful: Gemini 3's ability to understand images and generate contextual text enables experiences that weren't possible before
  • Structured Outputs are a Game-Changer: Response schemas ensure reliability in production applications
  • Mobile-First is Essential: Over 80% of users will experience this on phones, so camera, touch, and performance must be prioritized
  • Prompt Engineering is an Art: Small changes in prompts dramatically affect AI behavior

What's next for EchoHunt

  • Graphics & Animations: 3D visual overlays with fun graphics and video animations
  • Multiplayer Mode: Race against friends in the same space
  • Themed Quests: Holiday-specific riddles, educational modes for kids (find shapes, colors, letters)
  • Difficulty Levels: Easy mode for children, expert mode with cryptic clues
  • Location-Aware Hunts: Outdoor scavenger hunts using GPS + Gemini vision

EchoHunt shows that Gemini 3's vision capabilities can power entirely new categories of interactive experiences. The future of gaming isn't just virtual; it's about AI understanding and enhancing our physical world.

Built With

  • canvas-api
  • gemini-3-flash-api
  • google/generative-ai-sdk
  • lucide-react
  • mediadevices-api
  • next.js-16
  • react
  • tailwind-css
  • typescript
  • vercel
  • web-speech-api