How we built it

Tech Stack:

Android Frontend (Kotlin + Jetpack Compose)

  • Clean Architecture with MVVM pattern
  • Hilt for dependency injection
  • CameraX for camera preview and capture
  • Room + DataStore for offline-first storage
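The stack above wires together roughly like this. A minimal sketch of the MVVM + Hilt layer — `DocumentViewModel`, `DocumentRepository`, and `DocumentUiState` are illustrative names, not the project's actual classes:

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import dagger.hilt.android.lifecycle.HiltViewModel
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch
import javax.inject.Inject

@HiltViewModel
class DocumentViewModel @Inject constructor(
    private val repository: DocumentRepository  // hypothetical repository abstraction
) : ViewModel() {

    // Immutable UI state exposed to Compose
    private val _uiState = MutableStateFlow(DocumentUiState())
    val uiState: StateFlow<DocumentUiState> = _uiState.asStateFlow()

    fun loadDocuments() {
        viewModelScope.launch {
            repository.observeDocuments().collect { docs ->
                _uiState.update { it.copy(documents = docs) }
            }
        }
    }
}
```

Hilt constructs the ViewModel and injects the repository, so the Compose layer only ever observes `uiState`.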

AI & Voice Integration

  • ElevenLabs Conversational AI: Natural voice conversations with 4 custom client tools (start_scanner, analyze_document, search_documents, save_document)
  • Gemini 2.0 Flash Vision API: Multimodal document analysis (image + text prompts)
  • ML Kit Barcode Scanning: QR code generation and scanning for family joining
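The four client tools can be routed through a single dispatcher. This is a hypothetical sketch — the real ElevenLabs SDK callback signature may differ; here we assume it delivers a tool name plus JSON parameters and expects a string result:

```kotlin
import org.json.JSONObject

// Hypothetical dispatcher for the four ElevenLabs client tools.
// scannerController, analyzeWithGemini, searchIndex, and repository
// are illustrative app components, not real SDK objects.
suspend fun handleClientTool(name: String, params: JSONObject): String =
    when (name) {
        "start_scanner"    -> { scannerController.open(); "scanner opened" }
        "analyze_document" -> analyzeWithGemini(params.getString("imageId"))
        "search_documents" -> searchIndex.query(params.getString("query"))
        "save_document"    -> { repository.save(params); "saved" }
        else               -> "unknown tool: $name"
    }
```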

Backend & Sync

  • Firebase Auth: Google Sign-In
  • Cloud Firestore: Real-time family document sync
  • Firebase Storage: Direct image uploads (following the "golden path"—no intermediary backend server)
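The "golden path" upload can be sketched as two client-side calls — straight to Storage, then a metadata write to Firestore. Collection and field names here are illustrative, not the project's actual schema:

```kotlin
import android.net.Uri
import com.google.firebase.firestore.ktx.firestore
import com.google.firebase.ktx.Firebase
import com.google.firebase.storage.ktx.storage
import kotlinx.coroutines.tasks.await

suspend fun uploadDocument(familyId: String, docId: String, image: Uri) {
    // 1. Upload the image directly to Storage — no intermediary server
    val ref = Firebase.storage.reference.child("families/$familyId/$docId.jpg")
    ref.putFile(image).await()

    // 2. Write metadata so other family members sync in real time
    Firebase.firestore
        .collection("families").document(familyId)
        .collection("documents").document(docId)
        .set(mapOf("storagePath" to ref.path, "uploadedAt" to System.currentTimeMillis()))
        .await()
}
```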

Architecture Decisions:

  1. Offline-First: Documents stored locally in Room database, synced to Firestore when online. Works without internet after initial setup.
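A minimal sketch of the offline-first schema: every document lives in Room, and a `synced` flag marks rows still waiting to be pushed to Firestore. Entity and DAO names are illustrative:

```kotlin
import androidx.room.*
import kotlinx.coroutines.flow.Flow

@Entity(tableName = "documents")
data class DocumentEntity(
    @PrimaryKey val id: String,
    val title: String,
    val localImagePath: String,
    val synced: Boolean = false  // false until uploaded to Firestore
)

@Dao
interface DocumentDao {
    @Query("SELECT * FROM documents")
    fun observeAll(): Flow<List<DocumentEntity>>  // drives the UI, online or offline

    @Query("SELECT * FROM documents WHERE synced = 0")
    suspend fun pendingSync(): List<DocumentEntity>  // picked up when connectivity returns

    @Upsert
    suspend fun upsert(doc: DocumentEntity)
}
```

Because the UI reads only from the Room `Flow`, the app behaves identically with or without a connection; sync is a background concern.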

  2. Voice-First UX: No complex navigation. Users can accomplish everything through voice: scan, search, ask questions—all while the camera stays visible.

  3. Direct Gemini Integration: Instead of building a separate backend with Google ADK agents, we call the Gemini Vision API directly from client tools. This keeps the stack simpler and faster, and showcases Gemini's multimodal capabilities.
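The direct call looks roughly like this with the Google AI Kotlin SDK; the prompt and error handling are illustrative:

```kotlin
import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

// Sketch of a client-side multimodal request: one image plus a text
// prompt in a single content block, sent straight from the app.
suspend fun analyzeDocument(image: Bitmap, apiKey: String): String {
    val model = GenerativeModel(
        modelName = "gemini-2.0-flash",
        apiKey = apiKey
    )
    val response = model.generateContent(
        content {
            image(image)
            text("Extract the document type, key fields, and expiry date.")
        }
    )
    return response.text ?: "No analysis returned"
}
```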

  4. QR-Based Family Sharing: No phone numbers or emails needed. Scan a QR code and instantly join the family's document repository.
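The join flow can be sketched with ML Kit's barcode scanner, treating the QR payload as the family ID — `onJoin` is an illustrative callback, not project code:

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.barcode.BarcodeScannerOptions
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.barcode.common.Barcode
import com.google.mlkit.vision.common.InputImage

fun scanFamilyQr(bitmap: Bitmap, onJoin: (familyId: String) -> Unit) {
    val options = BarcodeScannerOptions.Builder()
        .setBarcodeFormats(Barcode.FORMAT_QR_CODE)  // restrict to QR codes
        .build()
    BarcodeScanning.getClient(options)
        .process(InputImage.fromBitmap(bitmap, 0))
        .addOnSuccessListener { barcodes ->
            // First decoded QR payload is treated as the family ID
            barcodes.firstOrNull()?.rawValue?.let(onJoin)
        }
}
```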
