Inspiration

Traditional audio tours cost $30-50, follow fixed schedules, and offer identical content to everyone. PocketGuide generates personalized tours on demand from a curated location database.

PocketGuide is built with Google's Agent Development Kit (ADK), and the system is split into specialized agents: a tour curator, route optimizer, storyteller, moderator, and voice synthesizer. Each agent handles one task, much as a tour company divides responsibilities across teams.

What it does

PocketGuide generates personalized walking tours in under 60 seconds using five specialized agents:

  • Tour Curator: Selects locations based on user preferences
  • Route Optimizer: Calculates optimal walking paths using Haversine distance
  • Storyteller: Generates unique 90-second narratives per location
  • Moderator: Validates content quality and appropriateness
  • Voice Synthesizer: Creates audio using the Google Cloud Text-to-Speech API
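
To make the Route Optimizer step concrete, here is a minimal sketch of Haversine-based routing. The Haversine formula itself is standard. The ordering strategy (a greedy nearest-neighbor walk), the function names, the `STOPS` data, and the approximate coordinates are assumptions made for illustration and are not taken from PocketGuide's code. A greedy walk is a heuristic and does not guarantee the shortest possible route.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_neighbor_route(stops):
    """Greedy ordering: start at the first stop, then always walk to the closest unvisited stop."""
    if not stops:
        return []
    route = [stops[0]]
    remaining = list(stops[1:])
    while remaining:
        last = route[-1]
        nxt = min(remaining, key=lambda s: haversine_km(last["coords"], s["coords"]))
        remaining.remove(nxt)
        route.append(nxt)
    return route


# Hypothetical sample data with approximate coordinates.
STOPS = [
    {"name": "Louvre", "coords": (48.8606, 2.3376)},
    {"name": "Eiffel Tower", "coords": (48.8584, 2.2945)},
    {"name": "Notre-Dame", "coords": (48.8530, 2.3499)},
    {"name": "Sacré-Cœur", "coords": (48.8867, 2.3431)},
]

if __name__ == "__main__":
    print([s["name"] for s in nearest_neighbor_route(STOPS)])
```

Haversine measures straight-line distance over the Earth's surface, so it only approximates real walking distance along streets. For stops a few kilometres apart in a dense city that approximation is usually adequate for ordering.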

Additional features:

  • Interactive maps with Street View and real-time route visualization
  • Category-based search (history, art, food, hidden gems)
  • 25 curated Paris locations generate thousands of tour combinations
  • Full-screen UI with coral/orange branding
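
As a rough check on the "thousands of tour combinations" figure, the sketch below counts how many distinct sets of stops can be drawn from 25 locations. It assumes each tour uses 5 to 8 locations (the range the Curator selects) and ignores visiting order and narrative variation, so it is a lower bound on variety rather than an exact product metric.

```python
import math

CATALOG_SIZE = 25  # curated Paris locations

# Number of unordered selections for each tour length from 5 to 8 stops.
counts = {k: math.comb(CATALOG_SIZE, k) for k in range(5, 9)}
total = sum(counts.values())

if __name__ == "__main__":
    for k, n in counts.items():
        print(f"{k} stops: {n:,} possible selections")
    print(f"total: {total:,}")
```

Under these assumptions the total is about 1.8 million selections, so "thousands" is a conservative description.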

How I built it

Architecture

  • Frontend: Next.js 15 (App Router) deployed to Cloud Run
  • 5 AI Agents: Built with Google ADK + Gemini 2.5 Flash, deployed as separate Cloud Run services
  • Tour Orchestrator: FastAPI service coordinating agent workflow
  • Database: Firestore for locations, tours, analytics
  • Voice Synthesis: Google Cloud Text-to-Speech API
  • Background Jobs: Cloud Run Jobs for analytics aggregation and batch processing

Multi-Agent System

User Request → Tour Orchestrator
    ↓
[Curator Agent] Firestore → Selects 5-8 locations based on interests
    ↓
[Optimizer Agent] Haversine → Calculates optimal walking route
    ↓
[Storyteller Agent] Gemini 2.5 → Generates unique 90-second narratives
    ↓
[Moderator Agent] Quality Check → Ensures appropriate content
    ↓
[Voice Agent] Cloud Text-to-Speech → Creates professional audio
    ↓
Complete Tour (stored in Firestore)
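
The flow above can be sketched as a small sequential orchestrator. This is an illustration under stated assumptions, not the project's actual orchestrator: the stage functions are in-process stubs standing in for calls to each agent's service, and `run_pipeline`, `StageError`, and the state keys are hypothetical names.

```python
import asyncio


class StageError(RuntimeError):
    """Raised when one stage fails; carries the stage name so the root cause is visible."""


async def run_pipeline(request, stages):
    """Run stages in order, passing each stage's output to the next. Fail fast on any error."""
    state = dict(request)
    for name, stage in stages:
        try:
            state = await stage(state)
        except Exception as exc:
            raise StageError(f"{name} failed: {exc}") from exc
    return state


# Stub stages (hypothetical). A real stage would call the corresponding agent service.
async def curator(state):
    return {**state, "locations": ["Louvre", "Musée d'Orsay"]}


async def optimizer(state):
    return {**state, "route": sorted(state["locations"])}


async def storyteller(state):
    return {**state, "narratives": {loc: f"Story about {loc}" for loc in state["route"]}}


async def moderator(state):
    if not all(state["narratives"].values()):
        raise ValueError("empty narrative")
    return {**state, "approved": True}


async def voice(state):
    return {**state, "audio": {loc: f"{loc}.mp3" for loc in state["route"]}}


STAGES = [
    ("Curator", curator),
    ("Optimizer", optimizer),
    ("Storyteller", storyteller),
    ("Moderator", moderator),
    ("Voice", voice),
]

if __name__ == "__main__":
    print(asyncio.run(run_pipeline({"interests": ["art"]}, STAGES)))
```

Passing a single state dictionary through every stage keeps each agent stateless, and wrapping failures with the stage name means a broken agent stops the tour instead of yielding a partial one.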

Key Technical Decisions

  1. Async Generator Pattern: Streaming responses via async for chunk in agent.run_async(prompt)
  2. Stateless Agents: No InMemoryRunner, no session management
  3. REST APIs: All agents expose /invoke endpoints

Deployment Stack

  • 9 Cloud Run Services: Frontend + 5 Agents + Orchestrator + 2 Workers
  • Total Infrastructure: Fully serverless, auto-scaling, globally distributed

Challenges I ran into

Session Management: ADK's InMemoryRunner raised "ValueError: Session not found" errors. I removed session management and invoked the agents directly through their async generators.

Async Patterns: ADK agents return async generators, not awaitables. Calling await agent.run_async(prompt) raised a TypeError. The fix was to iterate over the generator instead: async for chunk in agent.run_async(prompt).
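
The difference is plain Python behavior and can be reproduced without ADK. In the sketch below, `fake_run_async` is a hypothetical stand-in for an agent's streaming call. It is not the ADK API, and its signature is an assumption made only to show why `await` fails on an async generator while `async for` works.

```python
import asyncio


async def fake_run_async(prompt):
    """Stand-in async generator (hypothetical, not ADK): yields a response in chunks."""
    for word in ("Bonjour", "from", prompt):
        await asyncio.sleep(0)  # simulate waiting on the model
        yield word


async def wrong(prompt):
    # An async generator object is not awaitable, so this line raises TypeError.
    return await fake_run_async(prompt)


async def right(prompt):
    # Iterate the generator and join the streamed chunks into one string.
    chunks = [chunk async for chunk in fake_run_async(prompt)]
    return " ".join(chunks)


if __name__ == "__main__":
    print(asyncio.run(right("Paris")))
    try:
        asyncio.run(wrong("Paris"))
    except TypeError as exc:
        print("TypeError:", exc)
```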

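Error Propagation: Failed agents returned HTML instead of JSON, causing parsing errors downstream. In a multi-agent pipeline the visible error often appears in a later stage than the one that actually failed, which masks the root cause.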
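
One way to surface the real failure is to validate each agent's response before parsing it. The helper below is a sketch under assumptions: `parse_agent_response` and `AgentResponseError` are hypothetical names, and the caller is assumed to have the HTTP status, Content-Type header, and body text available from whatever HTTP client it uses.

```python
import json


class AgentResponseError(RuntimeError):
    """Raised when an agent's HTTP response cannot be used as JSON."""


def parse_agent_response(agent, status, content_type, body):
    """Return the decoded JSON body, or raise an error that names the failing agent."""
    snippet = body[:80]
    if status >= 400:
        raise AgentResponseError(f"{agent} returned HTTP {status}: {snippet!r}")
    if "application/json" not in content_type.lower():
        raise AgentResponseError(
            f"{agent} returned {content_type or 'no content type'} instead of JSON: {snippet!r}"
        )
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise AgentResponseError(f"{agent} returned invalid JSON: {snippet!r}") from exc


if __name__ == "__main__":
    print(parse_agent_response("storyteller", 200, "application/json", '{"ok": true}'))
```

Checking the status and content type at the boundary means the error names the agent that broke, instead of showing up later as a JSON parsing error in a different stage.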

Accomplishments

Multi-Agent Pipeline: Five specialized agents run sequentially, from location curation to voice synthesis, and the full pipeline completes in under 60 seconds.

Infrastructure: Nine Cloud Run services with error handling, health checks, and auto-scaling. Voice synthesis uses the Google Cloud Text-to-Speech API and runs on standard Python Docker images.

What I learned

ADK Architecture: Agents return async generators, not awaitables, so they must be consumed with async for chunk in agent.run_async(). Stateless invocation proved more reliable than InMemoryRunner for sequential pipelines.

Cloud Run Deployment: The Google Cloud Text-to-Speech API handles voice synthesis, so the container uses standard Python images without GPU libraries. Service-to-service auth is automatic within the same project.

Orchestration Patterns: Sequential execution (Curator → Optimizer → Storyteller → Moderator → Voice) produced better results than parallel execution. A failed agent should fail the entire pipeline rather than produce a partial tour.

What's next for PocketGuide

  • More Cities: Expand beyond Paris - NYC, Tokyo, London, Istanbul
  • Offline Mode: Download tours for travel without internet
  • Social Features: Share tours, follow other users, collaborative routes
  • Advanced Personalization: ML model learns from user ratings to improve future tours
  • Multi-City Tours: "European Art Tour" spanning Paris → Florence → Madrid
  • Real-Time Adaptation: Agents adjust tour based on weather, crowd levels, time of day
  • Augmented Reality: Point phone at landmark → see AI-generated overlays
