Inspiration
I built Echo AI because I wanted a transcription app that works reliably without sending audio to the cloud, preserving privacy, reducing latency, and staying available offline. Existing mobile apps either required network access or offered poor on-device performance; Whisper's model family inspired me to port a practical, efficient version to Android using TensorFlow Lite so users can transcribe anywhere.
What it does
Echo AI is an Android app that:
- Runs Whisper-style speech-to-text fully on-device using TensorFlow Lite (TFLite), enabling offline transcription.
- Records audio (WAV), transcribes in real time or in batch, and post-processes text with context-aware corrections (e.g., homonym disambiguation).
- Exports transcriptions to multiple formats: PDF, DOCX, and Markdown; supports sharing and clipboard copy.
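Before inference, the recorded WAV's 16-bit PCM samples have to be converted into the normalized float array a Whisper-style TFLite model consumes. Here is a minimal sketch of that step in plain Java; the class and method names are hypothetical, not the app's actual WaveUtil code.

```java
// Sketch: convert 16-bit little-endian PCM (as stored in a WAV file's
// data chunk) into floats normalized to [-1.0, 1.0) for model input.
public class PcmToFloat {
    public static float[] toFloats(byte[] pcm) {
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps the sign
            out[i] = ((hi << 8) | lo) / 32768.0f;
        }
        return out;
    }
}
```

The resulting float buffer would then be resampled/padded to the model's expected input length before being handed to the TFLite interpreter.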
How I built it
- Platform & language: Android app written in Java (project under whisper_java/).
- Inference engine: Whisper model converted/packaged as .tflite files; inference runs via a TFLite interpreter implemented in the app (see WhisperEngineJava / WhisperEngine).
- Audio pipeline: Recorder/Player components capture and save WAV audio files, which are fed into the TFLite engine for processing (Recorder.java, Player.java, WaveUtil.java).
- Post-processing: A TranscriptionPostProcessor handles homonym corrections, readability improvements, and context-based fixes.
- UI: Material Design 3 layouts with dark mode, live recording UI, and export dialogs.
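The post-processing step can be illustrated with a small sketch of one homonym rule. This is a hypothetical simplification, not the actual TranscriptionPostProcessor: it uses a regex lookahead so that "their" followed by a linking verb is corrected to "there".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical homonym heuristic: "their is/are/was/were" is almost
// always a mis-transcription of "there", so rewrite it in place.
public class HomonymFixer {
    private static final Pattern THEIR_BEFORE_VERB =
            Pattern.compile("\\btheir\\b(?=\\s+(is|are|was|were)\\b)",
                            Pattern.CASE_INSENSITIVE);

    public static String fix(String text) {
        Matcher m = THEIR_BEFORE_VERB.matcher(text);
        return m.replaceAll("there");
    }
}
```

A real pipeline would chain many such rules plus broader context checks, but the pattern-table approach keeps everything on-device with no language model required.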
Challenges I ran into
- Model size vs. device constraints: Whisper models are large; shrinking/quantizing for mobile while keeping usable accuracy required careful TFLite conversions and tuning.
- Latency and battery: Balancing inference speed and power consumption on mid-range devices required profiling and optimizing the TFLite runtime and audio I/O.
- Context-aware corrections: Implementing reliable homonym disambiguation (e.g., their/there) required building heuristics and lightweight language context checks without a cloud LM.
- Multi-format export: Generating clean DOCX and PDF files from raw transcription while preserving formatting and timestamps took iteration across libraries and Android file APIs.
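Of the three export targets, Markdown is the simplest to show. The sketch below is an assumed structure (segment class and method names are mine, not the app's): each transcript segment carries a start time in milliseconds, and the exporter emits a bulleted list with [mm:ss] timestamps.

```java
import java.util.List;

// Hypothetical Markdown exporter for timestamped transcript segments.
public class MarkdownExporter {
    public static final class Segment {
        final long startMs;
        final String text;
        public Segment(long startMs, String text) {
            this.startMs = startMs;
            this.text = text;
        }
    }

    public static String toMarkdown(String title, List<Segment> segments) {
        StringBuilder sb = new StringBuilder("# ").append(title).append("\n\n");
        for (Segment s : segments) {
            long sec = s.startMs / 1000;
            sb.append(String.format("- [%02d:%02d] %s\n",
                    sec / 60, sec % 60, s.text));
        }
        return sb.toString();
    }
}
```

PDF and DOCX follow the same segment model but go through Android's file APIs and third-party document libraries, which is where most of the iteration happened.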
Accomplishments that I'm proud of
- Successfully running Whisper-style transcription fully on-device with multi-format export (PDF/DOCX/MD).
- Implementing a usable post-processing pipeline that meaningfully improves readability and corrects common transcription errors.
- A polished Material Design 3 UI with live recording, share flows, and dark mode.
What I learned
- Practical techniques for converting and running large transformer models on mobile (TFLite quantization, memory footprint reduction).
- How to integrate an audio capture/playback stack with an ML inference pipeline in a real Android app.
- UX considerations for real-time transcription (feedback on recording, progress, and exports).
- Basic document generation/export workflows on Android (creating PDFs and DOCX programmatically).
What's next for Echo AI
- Add smaller / larger model options selectable by the user (trade accuracy for speed).
- Improve multilingual support with language autodetection and language-specific vocab tuning.
- Add an optional (privacy-first) on-device tiny LM for better context-aware corrections.
- Support timestamps and speaker diarization for multi-speaker recordings.
- Optimize model loading (lazy load, streaming inference) to further reduce memory spikes.