Voice Notes

Inspiration

Privacy concerns with cloud-based voice assistants and note-taking apps motivated me to explore on-device AI. I wondered: could I build a fully functional voice notes app that never sends data to the cloud? Starting with the whisper.cpp Android example, I set out to prove that powerful AI features like transcription, summarization, and intelligent Q&A could run entirely offline on a smartphone.

What it does

Voice Notes is a privacy-first Android app that transforms spoken words into intelligent, searchable notes using 100% on-device AI:

  • Transcribes speech to text with timestamps using whisper.cpp
  • Generates summaries of your recordings using a local LLM (Gemma 3 1B)
  • Answers questions about your transcriptions using RAG (Retrieval-Augmented Generation)
  • Searches all notes semantically using text embeddings
  • Plays back audio with seekable controls and timestamp navigation

Everything runs offline after the initial model download: no internet required, no cloud processing, complete privacy.

How we built it

Tech Stack

  • Kotlin + Jetpack Compose for modern Android UI with Material 3 design
  • whisper.cpp compiled as a native library (via JNI) for efficient speech recognition
  • Google AI Edge (MediaPipe) for on-device LLM inference (Gemma 3 1B INT4 quantized)
  • ONNX Runtime for text embeddings (all-MiniLM-L6-v2, 384 dimensions)
  • Room Database for local storage with Flow-based reactive updates (see the sketch after this list)
  • Kotlin Coroutines for async operations without blocking the UI
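
As a concrete example of the Room + Flow piece, here is a minimal sketch; the `Note` entity and `NoteDao` below are illustrative stand-ins, not the app's actual schema:

```kotlin
import androidx.room.Dao
import androidx.room.Entity
import androidx.room.Insert
import androidx.room.PrimaryKey
import androidx.room.Query
import kotlinx.coroutines.flow.Flow

// Hypothetical entity: one row per recording, holding its transcription and summary.
@Entity(tableName = "notes")
data class Note(
    @PrimaryKey(autoGenerate = true) val id: Long = 0,
    val audioPath: String,
    val transcription: String,
    val summary: String? = null,
    val createdAt: Long = System.currentTimeMillis()
)

@Dao
interface NoteDao {
    // Returning Flow gives Compose screens reactive updates on every change.
    @Query("SELECT * FROM notes ORDER BY createdAt DESC")
    fun observeAll(): Flow<List<Note>>

    @Insert
    suspend fun insert(note: Note): Long
}
```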

Key Implementation

The RAG system splits long transcriptions into 1500-character chunks, generates an embedding for each chunk, and retrieves the four most relevant chunks by cosine similarity before passing them to the LLM as context. Capping the assembled context at 8000 characters (~2000-2500 tokens) keeps the prompt small enough to prevent OOM crashes on mobile devices.
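
A minimal sketch of that retrieval step is below; the function names and the `embed` callback (standing in for the ONNX Runtime all-MiniLM-L6-v2 call) are illustrative assumptions, not the app's exact code:

```kotlin
import kotlin.math.sqrt

const val CHUNK_SIZE = 1500        // characters per chunk
const val TOP_K = 4                // chunks retrieved per question
const val MAX_CONTEXT_CHARS = 8000 // hard cap to avoid OOM on-device

fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Chunk the transcription, embed each chunk, rank chunks by similarity
// to the question, and concatenate the top 4 as LLM context.
fun buildContext(
    transcription: String,
    question: String,
    embed: (String) -> FloatArray // 384-dimension embedding function
): String {
    val queryVec = embed(question)
    return transcription.chunked(CHUNK_SIZE)
        .map { chunk -> chunk to cosineSimilarity(embed(chunk), queryVec) }
        .sortedByDescending { it.second }
        .take(TOP_K)
        .joinToString("\n\n") { it.first }
        .take(MAX_CONTEXT_CHARS)
}
```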

Challenges we ran into

Finding an LLM that works on edge devices: The biggest challenge was finding an LLM that could run efficiently on older devices like the Galaxy S20 while still producing useful output. I tested multiple models and configurations, comparing inference speed, memory consumption, and output quality, and balancing model capability against device constraints took extensive experimentation. Gemma 3 1B INT4 quantized proved to be the sweet spot: small enough to fit in memory with aggressive chunking, yet capable enough to generate meaningful summaries and answer questions about transcriptions on resource-constrained hardware.
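
For reference, loading a model through the Google AI Edge (MediaPipe) LLM Inference API looks roughly like the sketch below; the model filename, storage location, and token limit are assumptions for illustration:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Sketch: run one summarization pass against the on-device model.
// The .task bundle path is hypothetical; the app would place the
// quantized Gemma model in app-private storage after download.
fun summarize(context: Context, transcription: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath(context.filesDir.resolve("gemma3-1b-it-int4.task").absolutePath)
        .setMaxTokens(1024) // keep the context window modest on older devices
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    val summary = llm.generateResponse("Summarize this voice note:\n$transcription")
    llm.close() // release native resources promptly
    return summary
}
```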

Accomplishments that we're proud of

  • Fully functional RAG on mobile — Semantic retrieval with embeddings for intelligent Q&A
  • Zero network dependencies — Everything runs offline after the initial model download
  • Clean UI/UX — Material 3 design with waveform visualization and smooth animations
  • Production-ready — Edge-case coverage, proper error handling, persistent storage
  • Extended whisper.cpp — Turned a simple example into a complete app with a database, an LLM, and RAG

What we learned

  • Quantization is essential for mobile — INT4 models make LLMs practical on phones
  • RAG solves context limits — Semantic retrieval is more effective than truncation for long text
  • On-device AI is viable — Modern Android devices can run surprisingly capable AI models
  • JNI memory limits are real — Mobile apps need aggressive memory optimization for large models
  • Jetpack Compose is powerful — Building complex, stateful UIs is much cleaner than with XML layouts

What's next for Voice Notes

  • Speaker diarization — Identify different speakers in conversations
  • Continuous recording mode — For meetings and lectures with automatic chunking
  • Export/backup — Share notes as text, audio, or combined PDF
  • Smaller models — Experiment with distilled versions for faster inference
  • Multi-language support — Leverage Whisper's multilingual capabilities
  • Voice commands — "Summarize last recording" without touching the screen
