Voice Notes
Inspiration
Privacy concerns with cloud-based voice assistants and note-taking apps motivated me to explore on-device AI. I wondered: could I build a fully functional voice notes app that never sends data to the cloud? Starting with the whisper.cpp Android example, I set out to prove that powerful AI features like transcription, summarization, and intelligent Q&A could run entirely offline on a smartphone.
What it does
Voice Notes is a privacy-first Android app that transforms spoken words into intelligent, searchable notes using 100% on-device AI:
- Transcribes speech to text with timestamps using Whisper.cpp
- Generates summaries of your recordings using a local LLM (Gemma 3 1B)
- Answers questions about your transcriptions using RAG (Retrieval Augmented Generation)
- Semantic search across all notes using text embeddings
- Audio playback with seekable controls and timestamp navigation
Everything runs offline—no internet required, no cloud processing, complete privacy.
How we built it
Tech Stack
- Kotlin + Jetpack Compose for modern Android UI with Material 3 design
- whisper.cpp compiled as a native library (via JNI) for efficient speech recognition
- Google AI Edge (MediaPipe) for on-device LLM inference (Gemma 3 1B INT4 quantized)
- ONNX Runtime for text embeddings (all-MiniLM-L6-v2, 384 dimensions)
- Room Database for local storage with Flow-based reactive updates
- Kotlin Coroutines for async operations without blocking UI
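As an illustration of the Room-plus-Flow storage layer mentioned above, a DAO with reactive queries might look like the sketch below. The entity and query names here are hypothetical, not the app's actual schema:

```kotlin
// Hypothetical note entity and DAO; names are illustrative only.
@Entity(tableName = "notes")
data class Note(
    @PrimaryKey(autoGenerate = true) val id: Long = 0,
    val title: String,
    val transcript: String,
    val createdAt: Long
)

@Dao
interface NoteDao {
    // Returning a Flow lets Compose UIs re-render automatically
    // whenever the underlying table changes.
    @Query("SELECT * FROM notes ORDER BY createdAt DESC")
    fun observeNotes(): Flow<List<Note>>

    @Insert
    suspend fun insert(note: Note): Long
}
```

Collecting `observeNotes()` from a ViewModel is what drives the app's reactive note list without manual refresh logic.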
Key Implementation
The RAG system splits long transcriptions into 1500-character chunks, generates an embedding for each chunk, and retrieves the four most relevant chunks by cosine similarity before passing them to the LLM as context. Capping the combined context at 8000 characters (~2000-2500 tokens) keeps memory usage low enough to prevent OOM crashes on mobile devices.
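The retrieval step above can be sketched in plain Kotlin. This is a minimal illustration: the real app computes embeddings with all-MiniLM-L6-v2 via ONNX Runtime, while here chunk embeddings are assumed to be precomputed and passed in:

```kotlin
// Minimal sketch of the chunk-and-retrieve step. Function and
// constant names are illustrative, not the app's actual code.

const val CHUNK_SIZE = 1500      // characters per chunk
const val TOP_K = 4              // chunks retrieved per question
const val CONTEXT_BUDGET = 8000  // max characters sent to the LLM

// Split a long transcription into fixed-size character chunks.
fun chunk(text: String): List<String> = text.chunked(CHUNK_SIZE)

// Cosine similarity between two embedding vectors (e.g. 384-dim).
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (kotlin.math.sqrt(normA) * kotlin.math.sqrt(normB))
}

// Rank chunks by similarity to the question embedding, keep the
// top 4, and hard-cap the joined context to avoid OOM on-device.
fun retrieve(
    question: FloatArray,
    chunks: List<Pair<String, FloatArray>>  // (text, embedding)
): String =
    chunks
        .sortedByDescending { (_, emb) -> cosineSimilarity(question, emb) }
        .take(TOP_K)
        .joinToString("\n\n") { it.first }
        .take(CONTEXT_BUDGET)
```

Sorting the full chunk list is fine at this scale; a heap-based top-k selection would only matter for much larger note collections.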
Challenges we ran into
Finding an LLM that works on edge devices: The biggest challenge was finding an LLM that could run efficiently on older devices like the Galaxy S20 while still producing useful output. I tested multiple models and configurations, weighing inference speed, memory consumption, and output quality against device constraints. Gemma 3 1B INT4 quantized proved to be the sweet spot: small enough to fit in memory with aggressive chunking, yet capable enough to generate meaningful summaries and answer questions about transcriptions on resource-constrained hardware.
Accomplishments that we're proud of
- Fully functional RAG on mobile — Implementing semantic retrieval with embeddings for intelligent Q&A
- Zero network dependencies — Everything runs offline after initial model download
- Clean UI/UX — Material 3 design with waveform visualization and smooth animations
- Production-ready — Handles edge cases with proper error handling and persistent storage
- Extended whisper.cpp — Transformed a simple example into a complete app with database, LLM, and RAG
What we learned
- Quantization is essential for mobile — INT4 models make LLMs practical on phones
- RAG solves context limits — Semantic retrieval is more effective than truncation for long text
- On-device AI is viable — Modern Android devices can run surprisingly capable AI models
- JNI memory limits are real — Mobile apps need aggressive memory optimization for large models
- Jetpack Compose is powerful — Building complex UIs with state management is much cleaner than XML
What's next for Voice Notes
- Speaker diarization — Identify different speakers in conversations
- Continuous recording mode — For meetings and lectures with automatic chunking
- Export/backup — Share notes as text, audio, or combined PDF
- Smaller models — Experiment with distilled versions for faster inference
- Multi-language support — Leverage Whisper's multilingual capabilities
- Voice commands — "Summarize last recording" without touching the screen
Built With
- android-studio
- gemma
- kotlin
- room
- whisper