Inspiration
I built Echo AI because I wanted a transcription app that works reliably without sending audio to the cloud, preserving privacy, reducing latency, and staying available offline. Existing mobile apps either required network access or offered poor on-device performance; Whisper's model family inspired me to port a practical, efficient version to Android using TensorFlow Lite so users can transcribe anywhere.
What it does
Echo AI is an Android app that:
- Runs Whisper-style speech-to-text fully on-device using TensorFlow Lite (TFLite), enabling offline transcription.
- Records audio (WAV), transcribes in real time or in batch, and post-processes text with context-aware corrections (e.g., homonym disambiguation).
- Exports transcriptions to multiple formats: PDF, DOCX, and Markdown; supports sharing and clipboard copy.
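Before inference, the recorded WAV's 16-bit PCM samples have to be converted into the normalized float array a Whisper-style TFLite model consumes. Here is a minimal sketch of that step in plain Java; the class and method names are hypothetical, not the app's actual WaveUtil code.

```java
// Sketch: convert 16-bit little-endian PCM (as stored in a WAV file's
// data chunk) into floats normalized to [-1.0, 1.0) for model input.
public class PcmToFloat {
    public static float[] toFloats(byte[] pcm) {
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps the sign
            out[i] = ((hi << 8) | lo) / 32768.0f;
        }
        return out;
    }
}
```

The resulting float buffer would then be resampled/padded to the model's expected input length before being handed to the TFLite interpreter.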
How I built it
- Platform & language: Android app written in Java (project under whisper_java/).
- Inference engine: Whisper model converted/packaged as .tflite files; inference runs via a TFLite interpreter implemented in the app (see WhisperEngineJava / WhisperEngine).
- Audio pipeline: Recorder/Player components capture and save WAV audio files, which are fed into the TFLite engine for processing (Recorder.java, Player.java, WaveUtil.java).
- Post-processing: A TranscriptionPostProcessor handles homonym corrections, readability improvements, and context-based fixes.
- UI: Material Design 3 layouts with dark mode, live recording UI, and export dialogs.
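The post-processing step can be illustrated with a small sketch of one homonym rule. This is a hypothetical simplification, not the actual TranscriptionPostProcessor: it uses a regex lookahead so that "their" followed by a linking verb is corrected to "there".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical homonym heuristic: "their is/are/was/were" is almost
// always a mis-transcription of "there", so rewrite it in place.
public class HomonymFixer {
    private static final Pattern THEIR_BEFORE_VERB =
            Pattern.compile("\\btheir\\b(?=\\s+(is|are|was|were)\\b)",
                            Pattern.CASE_INSENSITIVE);

    public static String fix(String text) {
        Matcher m = THEIR_BEFORE_VERB.matcher(text);
        return m.replaceAll("there");
    }
}
```

A real pipeline would chain many such rules plus broader context checks, but the pattern-table approach keeps everything on-device with no language model required.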
Challenges I ran into
- Model size vs. device constraints: Whisper models are large; shrinking/quantizing for mobile while keeping usable accuracy required careful TFLite conversions and tuning.
- Latency and battery: Balancing inference speed and power consumption on mid-range devices required profiling and optimizing the TFLite runtime and audio I/O.
- Context-aware corrections: Implementing reliable homonym disambiguation (e.g., their/there) required building heuristics and lightweight language context checks without a cloud LM.
- Multi-format export: Generating clean DOCX and PDF files from raw transcription while preserving formatting and timestamps took iteration across libraries and Android file APIs.
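Of the three export targets, Markdown is the simplest to show. The sketch below is an assumed structure (segment class and method names are mine, not the app's): each transcript segment carries a start time in milliseconds, and the exporter emits a bulleted list with [mm:ss] timestamps.

```java
import java.util.List;

// Hypothetical Markdown exporter for timestamped transcript segments.
public class MarkdownExporter {
    public static final class Segment {
        final long startMs;
        final String text;
        public Segment(long startMs, String text) {
            this.startMs = startMs;
            this.text = text;
        }
    }

    public static String toMarkdown(String title, List<Segment> segments) {
        StringBuilder sb = new StringBuilder("# ").append(title).append("\n\n");
        for (Segment s : segments) {
            long sec = s.startMs / 1000;
            sb.append(String.format("- [%02d:%02d] %s\n",
                    sec / 60, sec % 60, s.text));
        }
        return sb.toString();
    }
}
```

PDF and DOCX follow the same segment model but go through Android's file APIs and third-party document libraries, which is where most of the iteration happened.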
Accomplishments that I'm proud of
- Successfully running Whisper-style transcription fully on-device with multi-format export (PDF/DOCX/MD).
- Implementing a usable post-processing pipeline that meaningfully improves readability and corrects common transcription errors.
- A polished Material Design 3 UI with live recording, share flows, and dark mode.
What I learned
- Practical techniques for converting and running large transformer models on mobile (TFLite quantization, memory footprint reduction).
- How to integrate an audio capture/playback stack with an ML inference pipeline in a real Android app.
- UX considerations for real-time transcription (feedback on recording, progress, and exports).
- Basic document generation/export workflows on Android (creating PDFs and DOCX programmatically).
What's next for Echo AI
- Add smaller / larger model options selectable by the user (trade accuracy for speed).
- Improve multilingual support with language autodetection and language-specific vocab tuning.
- Add an optional (privacy-first) on-device tiny LM for better context-aware corrections.
- Support timestamps and speaker diarization for multi-speaker recordings.
- Optimize model loading (lazy load, streaming inference) to further reduce memory spikes.