Pocket Guide

Multi-agent system architecture with 5 AI agents, Cloud Run services, Pub/Sub messaging, and background workers
PocketGuide homepage with tour creation button and browsable location cards by category
How it works and tour analytics sections
Tour creation interface showing 5 AI agents working in real-time with progress indicators
Popular tours dashboard showing community favorites with star ratings
Tour creation progress showing 5 AI agents with real-time status indicators and completion percentage
Personalized walking tour interface featuring interactive map with route markers and embedded audio player
Feedback section with 5-star rating, helpful votes, comment box, and emoji reactions

Inspiration

Traditional audio tours cost $30-50, follow fixed schedules, and offer identical content to everyone. PocketGuide generates personalized tours on demand from a curated location database.

Built with Google's Agent Development Kit (ADK), the system is split into five specialized agents: curator, route optimizer, storyteller, moderator, and voice synthesizer. Each agent handles one task, much as a tour company divides responsibilities across teams.

What it does

PocketGuide generates personalized walking tours in under 60 seconds using five specialized agents:

Tour Curator: Selects locations based on user preferences
Route Optimizer: Orders the selected stops into a short walking route using Haversine (great-circle) distance
Storyteller: Generates unique 90-second narratives per location
Moderator: Validates content quality and appropriateness
Voice Synthesizer: Creates audio using the Google Cloud Text-to-Speech API
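
As an illustration of the Route Optimizer step, here is a minimal sketch of Haversine distance combined with a greedy nearest-neighbor ordering. The ordering heuristic, the function names, and the sample coordinates are assumptions made for this example; the write-up only states that Haversine distance is used, not how stops are ordered.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def order_stops(stops):
    """Greedy nearest-neighbor ordering (an assumed heuristic), starting at the first stop."""
    if not stops:
        return []
    route = [stops[0]]
    remaining = list(stops[1:])
    while remaining:
        last = route[-1]["coords"]
        nearest = min(remaining, key=lambda s: haversine_km(last, s["coords"]))
        remaining.remove(nearest)
        route.append(nearest)
    return route


# Approximate coordinates, for illustration only.
stops = [
    {"name": "Louvre", "coords": (48.8606, 2.3376)},
    {"name": "Eiffel Tower", "coords": (48.8584, 2.2945)},
    {"name": "Notre-Dame", "coords": (48.8530, 2.3499)},
]
route = order_stops(stops)
print([s["name"] for s in route])
```

Haversine measures straight-line distance over the globe, so it approximates walking distance rather than following streets. Nearest-neighbor ordering is fast for 5-8 stops but does not guarantee the shortest possible route.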

Additional features:

Interactive maps with Street View and real-time route visualization
Category-based search (history, art, food, hidden gems)
25 curated Paris locations generate thousands of tour combinations
Full-screen UI with coral/orange branding

How I built it

Architecture

Frontend: Next.js 15 (App Router) deployed to Cloud Run
5 AI Agents: Built with Google ADK + Gemini 2.5 Flash, deployed as separate Cloud Run services
Tour Orchestrator: FastAPI service coordinating agent workflow
Database: Firestore for locations, tours, analytics
Voice Synthesis: Google Cloud Text-to-Speech API
Background Jobs: Cloud Run Jobs for analytics aggregation and batch processing

Multi-Agent System

User Request → Tour Orchestrator
    ↓
[Curator Agent] Firestore → Selects 5-8 locations based on interests
    ↓
[Optimizer Agent] Haversine → Calculates optimal walking route
    ↓
[Storyteller Agent] Gemini 2.5 → Generates unique 90-second narratives
    ↓
[Moderator Agent] Quality Check → Ensures appropriate content
    ↓
[Voice Agent] Cloud Text-to-Speech → Creates professional audio
    ↓
Complete Tour (stored in Firestore)

Key Technical Decisions

Async Generator Pattern: Streaming responses via async for chunk in agent.run_async(prompt)
Stateless Agents: No InMemoryRunner, no session management
REST APIs: All agents expose /invoke endpoints

Deployment Stack

9 Cloud Run Services: Frontend + 5 Agents + Orchestrator + 2 Workers
Total Infrastructure: Fully serverless, auto-scaling, globally distributed

Challenges I ran into

Session Management: ADK's InMemoryRunner caused ValueError: Session not found errors. I removed session management and invoked each agent's async generator directly.

Async Patterns: ADK agents return async generators, not awaitables. Calling await agent.run_async(prompt) raised a TypeError. The fix was to iterate over the generator instead: async for chunk in agent.run_async(prompt).
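
The difference can be reproduced with plain Python, without ADK. In the sketch below, fake_run_async is a hypothetical stand-in for an agent's streaming method, not the ADK API. It only shows why awaiting an async generator fails while iterating over it works.

```python
import asyncio


async def fake_run_async(prompt):
    """Hypothetical stand-in for a streaming agent call; yields chunks one at a time."""
    for word in ("Bonjour", "from", prompt):
        await asyncio.sleep(0)  # simulate waiting on the model
        yield word


async def collect(prompt):
    """Correct usage: iterate over the async generator and join the chunks."""
    chunks = []
    async for chunk in fake_run_async(prompt):
        chunks.append(chunk)
    return " ".join(chunks)


async def await_directly(prompt):
    """Incorrect usage: an async generator object is not awaitable."""
    try:
        await fake_run_async(prompt)
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


text = asyncio.run(collect("Montmartre"))
error_name = asyncio.run(await_directly("Montmartre"))
print(text)        # Bonjour from Montmartre
print(error_name)  # TypeError
```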

Error Propagation: When an agent failed, it returned an HTML error page instead of JSON, so the next stage failed with a parsing error. In a multi-agent pipeline, the visible error often surfaces a stage or more after the root cause.
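
One way to surface the root cause earlier is to validate each agent's response before passing it on. The following is a minimal sketch. The AgentError class and the parse_agent_response helper are hypothetical names for this example, and the project's actual error handling is not shown in this write-up.

```python
import json


class AgentError(RuntimeError):
    """Raised when an agent's response cannot be used by the next stage."""


def parse_agent_response(agent_name, status, content_type, body):
    """Return the decoded JSON payload, or raise an error that names the failing agent."""
    if status != 200:
        raise AgentError(f"{agent_name} returned HTTP {status}: {body[:80]!r}")
    if "application/json" not in content_type.lower():
        raise AgentError(
            f"{agent_name} returned {content_type or 'no content type'}, expected JSON"
        )
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise AgentError(f"{agent_name} returned malformed JSON: {exc}") from exc


# A healthy response decodes normally.
print(parse_agent_response("curator", 200, "application/json", '{"locations": []}'))

# An HTML error page is rejected with a message that points at the failing agent.
try:
    parse_agent_response("storyteller", 502, "text/html", "<html>Bad Gateway</html>")
except AgentError as exc:
    print(exc)
```

Checking the status code and content type first means the orchestrator reports which agent failed, rather than a JSON decoding error raised one stage later.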

Accomplishments

Multi-Agent Pipeline: Five specialized agents run in sequence, from location curation to voice synthesis. The full pipeline completes in under 60 seconds.

Infrastructure: Nine Cloud Run services with error handling, health checks, and auto-scaling. Voice synthesis uses the Google Cloud Text-to-Speech API and runs on standard Python Docker images.

What I learned

ADK Architecture: Agents return async generators, not awaitables, so they must be consumed with the async for chunk in agent.run_async() pattern. Stateless invocation proved more reliable than InMemoryRunner for sequential pipelines.

Cloud Run Deployment: The Google Cloud Text-to-Speech API handles voice synthesis, so the container uses standard Python images without GPU libraries. Service-to-service auth within the same project relies on each service's built-in identity rather than separately managed keys.

Orchestration Patterns: Sequential execution (Curator → Optimizer → Storyteller → Moderator) produced better results than parallel execution, since each stage works on the previous stage's output. A failed agent should fail the entire pipeline rather than produce a partial tour.
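
The fail-fast sequential pattern can be sketched with plain asyncio. The stage functions below are hypothetical stand-ins for this example. In the real system each stage would presumably call an agent's /invoke endpoint, which is not shown here.

```python
import asyncio


class PipelineError(RuntimeError):
    """Raised when any stage fails; no partial tour is returned."""


async def run_pipeline(stages, request):
    """Run (name, coroutine function) stages in order, feeding each result to the next."""
    payload = request
    for name, stage in stages:
        try:
            payload = await stage(payload)
        except Exception as exc:
            # Stop at the first failure and record which stage caused it.
            raise PipelineError(f"stage '{name}' failed: {exc}") from exc
    return payload


# Hypothetical stand-in stages, for illustration only.
async def curate(request):
    return {**request, "stops": ["Notre-Dame", "Louvre"]}


async def optimize(tour):
    return {**tour, "stops": sorted(tour["stops"])}


async def narrate(tour):
    raise ValueError("model timeout")


tour = asyncio.run(
    run_pipeline([("curator", curate), ("optimizer", optimize)], {"interests": ["art"]})
)
print(tour)
```

Because each stage receives the previous stage's output, the stages cannot be reordered or run in parallel without changing what they see. Wrapping failures in PipelineError keeps the name of the failing stage attached to the error.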

What's next for PocketGuide

More Cities: Expand beyond Paris to NYC, Tokyo, London, and Istanbul
Offline Mode: Download tours for travel without internet
Social Features: Share tours, follow other users, collaborative routes
Advanced Personalization: ML model learns from user ratings to improve future tours
Multi-City Tours: "European Art Tour" spanning Paris → Florence → Madrid
Real-Time Adaptation: Agents adjust tour based on weather, crowd levels, time of day
Augmented Reality: Point phone at landmark → see AI-generated overlays

Built With

adk
firestore
gemini
google-maps
googlecloudrun
gpu
nextjs

Updates

Hulya Masharipov started this project — Nov 09, 2025 11:12 AM EST
