How we built it
Tech Stack:
Android Frontend (Kotlin + Jetpack Compose)
- Clean Architecture with MVVM pattern
- Hilt for dependency injection
- CameraX for camera preview and capture
- Room + DataStore for offline-first storage
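To make the offline-first storage layer concrete, here is a minimal sketch of the idea: every write lands in the local store first and is flagged for later sync. The class and field names are illustrative, not the app's actual Room schema, and an in-memory map stands in for the DAO.

```kotlin
// Hypothetical local-first document model; names are illustrative,
// not the app's actual Room schema.
data class ScannedDocument(
    val id: String,
    val title: String,
    val localImagePath: String,
    val synced: Boolean = false   // false until pushed to Firestore
)

// Minimal in-memory stand-in for the Room DAO layer.
class DocumentRepository {
    private val cache = mutableMapOf<String, ScannedDocument>()

    // Writes always land locally first (offline-first).
    fun save(doc: ScannedDocument) {
        cache[doc.id] = doc.copy(synced = false)
    }

    // Documents still waiting for a Firestore push.
    fun pendingSync(): List<ScannedDocument> =
        cache.values.filter { !it.synced }

    // Called after a successful remote write.
    fun markSynced(id: String) {
        cache[id]?.let { cache[id] = it.copy(synced = true) }
    }
}
```

In the real app the repository would be backed by a Room DAO and injected via Hilt, but the local-write-then-sync contract is the same.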
AI & Voice Integration
- ElevenLabs Conversational AI: Natural voice conversations with four custom client tools (start_scanner, analyze_document, search_documents, save_document)
- Gemini 2.0 Flash Vision API: Multimodal document analysis (image + text prompts)
- ML Kit Barcode Scanning: QR code scanning for family joining (invite QR codes are generated in-app)
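The four client tools above can be pictured as a simple dispatcher: the voice agent names a tool, and the app routes it to a local action. The tool names come from the writeup; the handler wiring below is an illustrative sketch, not the ElevenLabs SDK API.

```kotlin
// Sketch of routing voice-agent client-tool calls to app actions.
// Tool names are from the project; the handler types are hypothetical.
class VoiceToolRouter(
    private val onStartScanner: () -> String,
    private val onAnalyze: (Map<String, String>) -> String,
    private val onSearch: (Map<String, String>) -> String,
    private val onSave: (Map<String, String>) -> String
) {
    // Dispatch a named tool call with its parameters; the returned
    // string is what gets spoken back to the user.
    fun handle(toolName: String, params: Map<String, String>): String =
        when (toolName) {
            "start_scanner"    -> onStartScanner()
            "analyze_document" -> onAnalyze(params)
            "search_documents" -> onSearch(params)
            "save_document"    -> onSave(params)
            else               -> "Unknown tool: $toolName"
        }
}
```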
Backend & Sync
- Firebase Auth: Google Sign-In
- Cloud Firestore: Real-time family document sync
- Firebase Storage: Direct image uploads (following the "golden path"—no intermediary backend server)
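For the direct-upload "golden path", the client only needs a deterministic storage path per family and document; the Firebase SDK handles the rest. The path convention below is hypothetical, shown to illustrate the idea.

```kotlin
// Hypothetical Firebase Storage path convention for direct uploads;
// the real app's layout may differ.
fun storagePathFor(familyId: String, docId: String): String {
    require(familyId.isNotBlank() && docId.isNotBlank()) {
        "familyId and docId must be non-empty"
    }
    return "families/$familyId/documents/$docId.jpg"
    // In the app this path would feed straight into the SDK, e.g.:
    // Firebase.storage.reference.child(path).putFile(imageUri)
}
```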
Architecture Decisions:
Offline-First: Documents stored locally in Room database, synced to Firestore when online. Works without internet after initial setup.
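The sync half of this decision can be sketched as a small engine that queues local document ids and flushes them when connectivity returns. The connectivity check and remote push are injected as plain functions here so the sketch stays Firebase-free; the real app would wire these to a network monitor and Firestore writes.

```kotlin
// Illustrative sync pass for the offline-first design. Pending ids are
// retried in order; a failed push stops the pass and retries later.
class SyncEngine(
    private val isOnline: () -> Boolean,
    private val pushRemote: (String) -> Boolean   // true on success
) {
    private val pendingIds = ArrayDeque<String>()

    fun enqueue(docId: String) = pendingIds.addLast(docId)

    // Returns the ids successfully synced this pass.
    fun syncPass(): List<String> {
        if (!isOnline()) return emptyList()
        val done = mutableListOf<String>()
        while (pendingIds.isNotEmpty()) {
            val id = pendingIds.first()
            if (pushRemote(id)) {
                pendingIds.removeFirst()
                done.add(id)
            } else {
                break  // stop on failure; retry next pass
            }
        }
        return done
    }
}
```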
Voice-First UX: No complex navigation. Users can accomplish everything through voice: scan, search, ask questions—all while camera stays visible.
Direct Gemini Integration: Instead of building a separate backend with Google ADK agents, we call Gemini Vision API directly from client tools. Simpler, faster, and showcases Gemini's multimodal capabilities.
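Calling Gemini directly means the client builds the multimodal request itself. The sketch below shows the shape of an image-plus-text `generateContent` request body as plain maps; the app may use the official SDK instead, and the prompt is a placeholder.

```kotlin
import java.util.Base64

// Sketch of a multimodal (image + text) request body for Gemini's
// generateContent REST endpoint, built as plain maps for illustration.
fun geminiVisionRequest(prompt: String, imageBytes: ByteArray): Map<String, Any> {
    val b64 = Base64.getEncoder().encodeToString(imageBytes)
    return mapOf(
        "contents" to listOf(
            mapOf(
                "parts" to listOf(
                    mapOf("text" to prompt),                 // the text prompt
                    mapOf("inline_data" to mapOf(            // the captured image
                        "mime_type" to "image/jpeg",
                        "data" to b64
                    ))
                )
            )
        )
    )
}
```

Serialized to JSON and POSTed to the model endpoint, this is all the "backend" a document-analysis call needs, which is why no intermediary server is required.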
QR-Based Family Sharing: No phone numbers or emails needed. Scan a QR code and instantly join the family's document repository.
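One way to picture the join flow: the QR code encodes a small invite URI carrying the family id, and the scanner parses it before joining. The `docvault://join?family=...` scheme below is hypothetical, invented for this sketch; the app's actual payload format may differ.

```kotlin
import java.net.URI

// Parse a hypothetical family-invite URI scanned from a QR code,
// e.g. "docvault://join?family=abc123". Returns the family id or
// null if the payload is not a valid invite.
fun parseFamilyInvite(qrText: String): String? {
    return try {
        val uri = URI(qrText)
        if (uri.scheme != "docvault" || uri.host != "join") return null
        uri.query
            ?.split("&")
            ?.map { it.split("=", limit = 2) }
            ?.firstOrNull { it.size == 2 && it[0] == "family" }
            ?.get(1)
    } catch (e: Exception) {
        null  // malformed QR payload
    }
}
```

Rejecting anything that is not an invite URI matters here, since a scanner pointed at arbitrary QR codes will see plenty of unrelated payloads.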