How we built it

Tech Stack:

Android Frontend (Kotlin + Jetpack Compose)

  • Clean Architecture with MVVM pattern
  • Hilt for dependency injection
  • CameraX for camera preview and capture
  • Room + DataStore for offline-first storage
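The stack above wires together roughly like this. A minimal sketch of the MVVM + Hilt layer — `DocumentViewModel`, `DocumentRepository`, and `DocumentUiState` are illustrative names, not the project's actual classes:

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import dagger.hilt.android.lifecycle.HiltViewModel
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch
import javax.inject.Inject

@HiltViewModel
class DocumentViewModel @Inject constructor(
    private val repository: DocumentRepository  // hypothetical repository abstraction
) : ViewModel() {

    // Immutable UI state exposed to Compose
    private val _uiState = MutableStateFlow(DocumentUiState())
    val uiState: StateFlow<DocumentUiState> = _uiState.asStateFlow()

    fun loadDocuments() {
        viewModelScope.launch {
            repository.observeDocuments().collect { docs ->
                _uiState.update { it.copy(documents = docs) }
            }
        }
    }
}
```

Hilt constructs the ViewModel and injects the repository, so the Compose layer only ever observes `uiState`.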

AI & Voice Integration

  • ElevenLabs Conversational AI: Natural voice conversations with 4 custom client tools (start_scanner, analyze_document, search_documents, save_document)
  • Gemini 2.0 Flash Vision API: Multimodal document analysis (image + text prompts)
  • ML Kit Barcode Scanning: QR code generation and scanning for family joining
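The four client tools can be routed through a single dispatcher. This is a hypothetical sketch — the real ElevenLabs SDK callback signature may differ; here we assume it delivers a tool name plus JSON parameters and expects a string result:

```kotlin
import org.json.JSONObject

// Hypothetical dispatcher for the four ElevenLabs client tools.
// scannerController, analyzeWithGemini, searchIndex, and repository
// are illustrative app components, not real SDK objects.
suspend fun handleClientTool(name: String, params: JSONObject): String =
    when (name) {
        "start_scanner"    -> { scannerController.open(); "scanner opened" }
        "analyze_document" -> analyzeWithGemini(params.getString("imageId"))
        "search_documents" -> searchIndex.query(params.getString("query"))
        "save_document"    -> { repository.save(params); "saved" }
        else               -> "unknown tool: $name"
    }
```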

Backend & Sync

  • Firebase Auth: Google Sign-In
  • Cloud Firestore: Real-time family document sync
  • Firebase Storage: Direct image uploads (following the "golden path"—no intermediary backend server)
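The "golden path" upload can be sketched as two client-side calls — straight to Storage, then a metadata write to Firestore. Collection and field names here are illustrative, not the project's actual schema:

```kotlin
import android.net.Uri
import com.google.firebase.firestore.ktx.firestore
import com.google.firebase.ktx.Firebase
import com.google.firebase.storage.ktx.storage
import kotlinx.coroutines.tasks.await

suspend fun uploadDocument(familyId: String, docId: String, image: Uri) {
    // 1. Upload the image directly to Storage — no intermediary server
    val ref = Firebase.storage.reference.child("families/$familyId/$docId.jpg")
    ref.putFile(image).await()

    // 2. Write metadata so other family members sync in real time
    Firebase.firestore
        .collection("families").document(familyId)
        .collection("documents").document(docId)
        .set(mapOf("storagePath" to ref.path, "uploadedAt" to System.currentTimeMillis()))
        .await()
}
```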

Architecture Decisions:

  1. Offline-First: Documents stored locally in Room database, synced to Firestore when online. Works without internet after initial setup.
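A minimal sketch of the offline-first schema: every document lives in Room, and a `synced` flag marks rows still waiting to be pushed to Firestore. Entity and DAO names are illustrative:

```kotlin
import androidx.room.*
import kotlinx.coroutines.flow.Flow

@Entity(tableName = "documents")
data class DocumentEntity(
    @PrimaryKey val id: String,
    val title: String,
    val localImagePath: String,
    val synced: Boolean = false  // false until uploaded to Firestore
)

@Dao
interface DocumentDao {
    @Query("SELECT * FROM documents")
    fun observeAll(): Flow<List<DocumentEntity>>  // drives the UI, online or offline

    @Query("SELECT * FROM documents WHERE synced = 0")
    suspend fun pendingSync(): List<DocumentEntity>  // picked up when connectivity returns

    @Upsert
    suspend fun upsert(doc: DocumentEntity)
}
```

Because the UI reads only from the Room `Flow`, the app behaves identically with or without a connection; sync is a background concern.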

  2. Voice-First UX: No complex navigation. Users can accomplish everything through voice: scan, search, ask questions—all while the camera stays visible.

  3. Direct Gemini Integration: Instead of building a separate backend with Google ADK agents, we call the Gemini Vision API directly from client tools. This keeps the stack simpler and faster, and showcases Gemini's multimodal capabilities.
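The direct call looks roughly like this with the Google AI Kotlin SDK; the prompt and error handling are illustrative:

```kotlin
import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

// Sketch of a client-side multimodal request: one image plus a text
// prompt in a single content block, sent straight from the app.
suspend fun analyzeDocument(image: Bitmap, apiKey: String): String {
    val model = GenerativeModel(
        modelName = "gemini-2.0-flash",
        apiKey = apiKey
    )
    val response = model.generateContent(
        content {
            image(image)
            text("Extract the document type, key fields, and expiry date.")
        }
    )
    return response.text ?: "No analysis returned"
}
```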

  4. QR-Based Family Sharing: No phone numbers or emails needed. Scan a QR code and instantly join the family's document repository.
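The join flow can be sketched with ML Kit's barcode scanner, treating the QR payload as the family ID — `onJoin` is an illustrative callback, not project code:

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.barcode.BarcodeScannerOptions
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.barcode.common.Barcode
import com.google.mlkit.vision.common.InputImage

fun scanFamilyQr(bitmap: Bitmap, onJoin: (familyId: String) -> Unit) {
    val options = BarcodeScannerOptions.Builder()
        .setBarcodeFormats(Barcode.FORMAT_QR_CODE)  // restrict to QR codes
        .build()
    BarcodeScanning.getClient(options)
        .process(InputImage.fromBitmap(bitmap, 0))
        .addOnSuccessListener { barcodes ->
            // First decoded QR payload is treated as the family ID
            barcodes.firstOrNull()?.rawValue?.let(onJoin)
        }
}
```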
