🛠️ How we built it

Architecture: Multi-agent AI system orchestrated across Google Cloud and ElevenLabs

1. Frontend - Android (Kotlin + Jetpack Compose)

  • Google Maps SDK for real-time map visualization with custom markers and traffic-aware route polylines
  • Maps Compose library for native Compose integration with smooth camera animations
  • CameraX for optional landmark verification (snap and verify you're at the right building)
  • Material Design 3 for clean, distraction-free UI
  • MVVM architecture with Hilt dependency injection for maintainable code
  • Google Maps Routes API: Provides the raw telemetry, polyline data, and complex maneuver details (like U-turns and forks).
  • Google Places API (New v1): Powers our conversational search ("Find a petrol pump near me").

2. Backend - Google Cloud Platform (temporarily on hold for the MVP due to hackathon time constraints)

  • Google Cloud Run hosts our FastAPI server (Python) with auto-scaling for traffic spikes
  • Google ADK (Agent Development Kit) orchestrates our multi-agent system:
    • Coordinator Agent - Routes queries to specialized agents
    • Navigation Agent - Calculates routes using Google Maps Directions API
    • Landmark Agent - Identifies and verifies landmarks using Google Places API
    • Conversation Agent - Manages dialogue context and natural language understanding
    • POI Agent - Finds nearby points of interest on demand
    • Traffic Agent - Monitors real-time traffic via Google Maps APIs
    • Language Agent - Handles multilingual responses
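
The coordinator pattern above can be sketched in plain Python. This is only an illustration of the routing idea, not our ADK code: the `Coordinator` class, the keyword lists, and the stub agent functions are all hypothetical, and a real coordinator would more likely rely on LLM-based intent recognition than on keyword matching.

```python
from typing import Callable, List, Tuple

# A handler takes the user's query and returns a reply string.
Handler = Callable[[str], str]


class Coordinator:
    """Toy coordinator: picks a specialized agent by keyword match."""

    def __init__(self, fallback: Handler) -> None:
        self._routes: List[Tuple[Tuple[str, ...], Handler]] = []
        self._fallback = fallback

    def register(self, keywords: Tuple[str, ...], handler: Handler) -> None:
        # Routes are checked in registration order; the first match wins.
        self._routes.append((keywords, handler))

    def dispatch(self, query: str) -> str:
        text = query.lower()
        for keywords, handler in self._routes:
            if any(word in text for word in keywords):
                return handler(query)
        return self._fallback(query)


# Hypothetical stand-ins for the specialized agents listed above.
def navigation_agent(query: str) -> str:
    return "navigation: " + query


def traffic_agent(query: str) -> str:
    return "traffic: " + query


def poi_agent(query: str) -> str:
    return "poi: " + query


def conversation_agent(query: str) -> str:
    return "conversation: " + query


coordinator = Coordinator(fallback=conversation_agent)
coordinator.register(("route", "directions", "navigate"), navigation_agent)
coordinator.register(("traffic", "jam", "congestion"), traffic_agent)
coordinator.register(("near me", "nearby", "find"), poi_agent)
```

Calling `coordinator.dispatch("Find a petrol pump near me")` would hand the query to the POI stub, while anything unmatched falls through to the conversation stub.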

3. AI & Voice - ElevenLabs + Google Vertex AI

  • ElevenLabs Conversational AI Agent provides natural, human-like voice responses
    • Calm, reassuring tone even in stressful situations
    • Handles interruptions gracefully ("Wait, did you say left or right?")
    • Maintains conversation context across the journey
  • Google Vertex AI / Gemini powers:
    • Intent recognition from user queries
    • Context understanding (location, navigation state, user stress level)
    • Response generation with local cultural awareness
    • Route optimization based on real-time conditions

4. Real-time Communication

  • WebSockets for bidirectional communication between app and backend
  • Location updates every 3 seconds with intelligent throttling
  • Geofencing (Google Geofencing API) triggers instructions when approaching turns
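
A minimal sketch of the throttling and turn-proximity logic described above, written in Python for brevity. The 3-second interval comes from the text; the 5 m minimum movement, the 50 m turn radius, and the names `LocationThrottle`, `Fix`, and `near_turn` are assumptions made for illustration. On the device, the proximity check is handled by the geofencing API rather than by hand-rolled distance math.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Fix:
    lat: float
    lon: float
    t: float  # timestamp in seconds


class LocationThrottle:
    """Forward a fix only if enough time has passed AND the user has moved."""

    def __init__(self, min_interval_s: float = 3.0, min_move_m: float = 5.0) -> None:
        self.min_interval_s = min_interval_s
        self.min_move_m = min_move_m  # assumed value, not from the project
        self._last: Optional[Fix] = None

    def should_send(self, fix: Fix) -> bool:
        last = self._last
        if last is not None:
            if fix.t - last.t < self.min_interval_s:
                return False  # too soon since the last forwarded fix
            if haversine_m(last.lat, last.lon, fix.lat, fix.lon) < self.min_move_m:
                return False  # effectively stationary
        self._last = fix
        return True


def near_turn(fix: Fix, turn_lat: float, turn_lon: float, radius_m: float = 50.0) -> bool:
    """True when the fix lies inside a circular fence around the turn."""
    return haversine_m(fix.lat, fix.lon, turn_lat, turn_lon) <= radius_m
```

Requiring both elapsed time and movement keeps a stationary phone (for example, at a red light) from flooding the WebSocket with identical fixes.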

5. Data Layer

  • Firestore stores:
    • Navigation sessions and history
    • Conversation logs for improvement
    • User preferences (language, voice speed)
    • Navigation events for analytics
  • Room Database on Android for offline caching

6. Location Services

  • FusedLocationProvider for battery-efficient GPS tracking
  • Foreground Service with persistent notification for background navigation
  • Smart priority switching: High-accuracy during navigation, balanced power when backgrounded
  • Wake locks and Doze mode exemptions for reliability

7. Performance Optimizations

  • Polyline simplification using Ramer-Douglas-Peucker algorithm
  • Marker bitmap caching to prevent memory leaks
  • Traffic segment coloring (green/orange/red) for visual congestion awareness
  • Throttled location updates for smooth rendering without battery drain
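
For reference, here is a compact Python sketch of Ramer-Douglas-Peucker simplification. It treats points as planar (x, y) pairs, which is an assumption that holds reasonably well for short route segments; the function names and the `epsilon` tolerance are illustrative, and our Android implementation differs in its details.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _perp_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # a and b coincide, so fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def simplify(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Drop points that deviate from the overall shape by at most epsilon."""
    if len(points) < 3:
        return list(points)

    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    # Find the interior point farthest from the chord first -> last.
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d

    if max_dist <= epsilon:
        # Every interior point is close enough: keep only the endpoints.
        return [first, last]

    # Otherwise keep the farthest point and recurse on both halves.
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # avoid duplicating the split point
```

A larger `epsilon` removes more points and renders faster at the cost of a coarser polyline; very long routes may call for an iterative variant to avoid deep recursion.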

Tech Stack Summary:

  • Android: Kotlin, Jetpack Compose, Hilt, Coroutines, Flow, Room
  • Backend: Python, FastAPI, Pydantic, Google ADK
  • AI: ElevenLabs Conversational AI, Google Vertex AI, Gemini
  • Google Cloud: Cloud Run, Firestore, Agent Engine, Secret Manager
  • Maps: Google Maps SDK, Directions API, Places API, Distance Matrix API, Geofencing API
  • Package Manager: UV for Python dependency management

Challenges we ran into

  • The "Double Echo" Loop: Getting an AI to listen while it speaks is hard. The microphone often picked up the agent's own voice, causing it to interrupt itself endlessly. We solved this by switching Android's AudioManager to communication mode, which engages hardware acoustic echo cancellation, and by tuning the VAD (Voice Activity Detection) thresholds in the ElevenLabs dashboard.
  • Visualizing the Road: Rendering complex road structures (like a flyover with a slip road) dynamically was difficult. We wrote custom Canvas drawing logic that translates Google Routes maneuver data into 3D-tilted visual cards, so a "Fork Left" card closely resembles the road ahead.
  • State Management: Coordinating a Singleton Voice Service with transient UI screens led to crashes. We implemented a robust Event Bus architecture to decouple the Voice Agent from the ViewModels, preventing memory leaks while keeping the conversation continuous during screen transitions.
