- Multi-agent system architecture with 5 AI agents, Cloud Run services, Pub/Sub messaging, and background workers
- PocketGuide homepage with tour creation button and browsable location cards by category
- How it works and tour analytics sections
- Tour creation interface showing 5 AI agents working in real-time with progress indicators
- Popular tours dashboard showing community favorites with star ratings
- Tour creation progress showing 5 AI agents with real-time status indicators and completion percentage
- Personalized walking tour interface featuring interactive map with route markers and embedded audio player
- Feedback section with 5-star rating, helpful votes, comment box, and emoji reactions
Inspiration
Traditional audio tours cost $30-50, follow fixed schedules, and offer identical content to everyone. PocketGuide generates personalized tours on demand from a curated location database.
Built on Google's Agent Development Kit (ADK), the system is split into five specialized agents: curator, route planner, storyteller, quality controller, and voice synthesizer. Each agent handles a specific task, similar to how tour companies divide responsibilities across teams.
What it does
PocketGuide generates personalized walking tours in under 60 seconds using five specialized agents:
- Tour Curator: Selects locations based on user preferences
- Route Optimizer: Calculates optimal walking paths using Haversine distance
- Storyteller: Generates unique 90-second narratives per location
- Moderator: Validates content quality and appropriateness
- Voice Synthesizer: Creates audio with the Google Cloud Text-to-Speech API
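The Route Optimizer's distance step can be illustrated with a minimal sketch. The Haversine formula itself is standard; the greedy nearest-neighbor ordering is only an assumption about how stops might be sequenced (the write-up does not say which ordering strategy is used), and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_neighbor_route(stops, start=0):
    """Assumed heuristic: repeatedly walk to the closest unvisited stop.

    stops is a list of (name, (lat, lon)) tuples. The result is a reasonable
    walking order, not a guaranteed shortest route.
    """
    remaining = list(stops)
    route = [remaining.pop(start)]
    while remaining:
        here = route[-1][1]
        closest = min(remaining, key=lambda stop: haversine_km(here, stop[1]))
        remaining.remove(closest)
        route.append(closest)
    return route
```

Haversine measures straight-line distance over the Earth's surface, so it underestimates real walking distance along streets; for 5-8 stops within one city it is usually a workable proxy for ordering.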
Additional features:
- Interactive maps with Street View and real-time route visualization
- Category-based search (history, art, food, hidden gems)
- 25 curated Paris locations generate thousands of tour combinations
- Full-screen UI with coral/orange branding
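As a rough check on the combinations claim, the sketch below counts unordered selections of 5 to 8 stops (the range the Curator Agent uses) out of 25 locations. It is an upper bound that ignores interest filtering and visiting order, and the function name is hypothetical.

```python
import math


def tour_subsets(n_locations=25, min_stops=5, max_stops=8):
    """Count unordered selections of min_stops..max_stops locations."""
    return sum(math.comb(n_locations, k) for k in range(min_stops, max_stops + 1))
```

With the defaults this gives 1,792,505 possible stop sets, so "thousands" is a conservative description even before narratives vary per user.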
How I built it
Architecture
- Frontend: Next.js 15 (App Router) deployed to Cloud Run
- 5 AI Agents: Built with Google ADK + Gemini 2.5 Flash, deployed as separate Cloud Run services
- Tour Orchestrator: FastAPI service coordinating agent workflow
- Database: Firestore for locations, tours, analytics
- Voice Synthesis: Google Cloud Text-to-Speech API
- Background Jobs: Cloud Run Jobs for analytics aggregation and batch processing
Multi-Agent System
User Request → Tour Orchestrator
↓
[Curator Agent] Firestore → Selects 5-8 locations based on interests
↓
[Optimizer Agent] Haversine → Calculates optimal walking route
↓
[Storyteller Agent] Gemini 2.5 → Generates unique 90-second narratives
↓
[Moderator Agent] Quality Check → Ensures appropriate content
↓
[Voice Agent] Cloud Text-to-Speech → Creates professional audio
↓
Complete Tour (stored in Firestore)
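The flow above can be sketched as a chain of async stages that each enrich a shared state object. This is an illustration only: `TourState`, the stage functions, and their placeholder logic are hypothetical stand-ins, not the project's orchestrator code, and no Firestore, Gemini, or Text-to-Speech calls are made.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TourState:
    """Hypothetical container passed from stage to stage."""
    interests: list[str]
    locations: list[str] = field(default_factory=list)
    route: list[str] = field(default_factory=list)
    narratives: dict[str, str] = field(default_factory=dict)
    approved: bool = False
    audio_urls: dict[str, str] = field(default_factory=dict)


async def curate(state):
    # Placeholder: a real curator would query the location database by interest.
    state.locations = [f"{interest}-stop" for interest in state.interests]
    return state


async def optimize(state):
    # Placeholder: a real optimizer would order stops by walking distance.
    state.route = sorted(state.locations)
    return state


async def narrate(state):
    # Placeholder: a real storyteller would call the language model per stop.
    state.narratives = {stop: f"Story about {stop}" for stop in state.route}
    return state


async def moderate(state):
    # Placeholder check: every narrative must be non-empty.
    state.approved = all(text.strip() for text in state.narratives.values())
    return state


async def synthesize(state):
    # Placeholder: a real voice stage would return stored audio locations.
    state.audio_urls = {stop: f"audio/{stop}.mp3" for stop in state.route}
    return state


STAGES = [curate, optimize, narrate, moderate, synthesize]


async def build_tour(interests):
    """Run every stage in order, feeding each one the previous stage's output."""
    state = TourState(interests=list(interests))
    for stage in STAGES:
        state = await stage(state)
    return state
```

Calling `asyncio.run(build_tour(["history", "art"]))` returns a fully populated `TourState`; in the real system each stage would be an HTTP call to a separate Cloud Run service.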
Key Technical Decisions
- Async Generator Pattern: Streaming responses via async for chunk in agent.run_async(prompt)
- Stateless Agents: No InMemoryRunner, no session management
- REST APIs: All agents expose /invoke endpoints
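The async generator pattern can be demonstrated without ADK installed. In the sketch below, `EchoAgent` is a hypothetical stand-in whose `run_async` mimics the call shape quoted above; it is not the ADK class, and the real method may take different arguments.

```python
import asyncio


class EchoAgent:
    """Hypothetical stand-in for an agent that streams its reply in chunks."""

    async def run_async(self, prompt):
        for word in prompt.split():
            await asyncio.sleep(0)  # yield control, as a network call would
            yield word


async def collect(agent, prompt):
    """Consume the stream with `async for` and join the chunks."""
    chunks = []
    async for chunk in agent.run_async(prompt):
        chunks.append(chunk)
    return " ".join(chunks)


async def await_directly(agent, prompt):
    """Show that awaiting an async generator object is rejected by Python."""
    try:
        await agent.run_async(prompt)
    except TypeError as exc:
        return type(exc).__name__
    return "no error"
```

Each call to `run_async` creates a fresh generator, so nothing persists between requests, which is what makes the stateless design possible.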
Deployment Stack
- 9 Cloud Run Services: Frontend + 5 Agents + Orchestrator + 2 Workers
- Total Infrastructure: Fully serverless, auto-scaling, globally distributed
Challenges I ran into
Session Management: ADK's InMemoryRunner raised ValueError: Session not found. I removed session management and invoked each agent's async generator directly.
Async Patterns: ADK agents return async generators, not awaitables. Calling await agent.run_async(prompt) raised a TypeError. The fix was to iterate instead: async for chunk in agent.run_async(prompt).
Error Propagation: Failed agents returned HTML instead of JSON, so the failure surfaced as a parsing error in a downstream service. In a multi-agent pipeline this masks the root cause, because the error appears one or more hops away from the agent that actually failed.
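One way to keep the root cause visible is to validate each agent response before parsing it. The sketch below is an assumption about how such a guard could look, not the project's code; `AgentResponseError` and `parse_agent_response` are hypothetical names, and the function takes plain values so it works with any HTTP client.

```python
import json


class AgentResponseError(Exception):
    """Raised when an agent reply cannot be trusted as JSON."""

    def __init__(self, agent, status, detail):
        super().__init__(f"{agent} returned HTTP {status}: {detail}")
        self.agent = agent
        self.status = status


def parse_agent_response(agent, status, content_type, body):
    """Return decoded JSON, or raise an error that names the failing agent."""
    if status >= 400:
        # Keep only a short excerpt of the body for the error message.
        raise AgentResponseError(agent, status, body[:200])
    if "application/json" not in content_type.lower():
        raise AgentResponseError(
            agent, status, f"unexpected content type {content_type!r}"
        )
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise AgentResponseError(agent, status, f"invalid JSON: {exc.msg}") from exc
```

With a guard like this, an HTML error page from one agent produces an error that names that agent and its status code instead of a generic parse failure further down the pipeline.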
Accomplishments
Multi-Agent Pipeline: Five specialized agents run sequentially, from location curation to voice synthesis. The full pipeline completes in under 60 seconds.
Infrastructure: Nine Cloud Run services with error handling, health checks, and auto-scaling. Voice synthesis uses the Google Cloud Text-to-Speech API from standard Python Docker images.
What I learned
ADK Architecture: Agents return async generators, not awaitables, so they must be consumed with async for chunk in agent.run_async(). Stateless invocation proved more reliable than InMemoryRunner for sequential pipelines.
Cloud Run Deployment: Google Cloud Text-to-Speech API handles voice synthesis. The container uses standard Python images without GPU libraries. Service-to-service auth is automatic within the same project.
Orchestration Patterns: Sequential execution (Curator → Optimizer → Storyteller → Moderator) produces better results than parallel execution. A failed agent should fail the entire pipeline rather than produce a partial tour.
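The fail-fast rule can be sketched as a small runner that stops at the first failing stage and reports which one it was. `PipelineError` and `run_fail_fast` are hypothetical names used for illustration, and the stages are assumed to be async callables that take and return a payload.

```python
import asyncio


class PipelineError(Exception):
    """Wraps the first stage failure so callers know where the pipeline stopped."""

    def __init__(self, stage, cause):
        super().__init__(f"stage {stage!r} failed: {cause}")
        self.stage = stage


async def run_fail_fast(stages, payload):
    """Run (name, async function) pairs in order and abort on the first failure.

    No partial result is returned: either every stage succeeds or
    PipelineError is raised and later stages never run.
    """
    for name, stage in stages:
        try:
            payload = await stage(payload)
        except Exception as exc:
            raise PipelineError(name, exc) from exc
    return payload
```

Raising instead of returning a half-built tour means the frontend never has to decide what to do with, for example, narratives that have no audio.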
What's next for PocketGuide
- More Cities: Expand beyond Paris to NYC, Tokyo, London, and Istanbul
- Offline Mode: Download tours for travel without internet
- Social Features: Share tours, follow other users, collaborative routes
- Advanced Personalization: ML model learns from user ratings to improve future tours
- Multi-City Tours: "European Art Tour" spanning Paris → Florence → Madrid
- Real-Time Adaptation: Agents adjust tour based on weather, crowd levels, time of day
- Augmented Reality: Point phone at landmark → see AI-generated overlays
Built With
- adk
- firestore
- gemini
- google-maps
- googlecloudrun
- gpu
- nextjs