Inspiration
Visually impaired users often depend on cloud-based OCR apps that are slow, inconsistent, and invasive. During user testing with a visually impaired volunteer, one insight stood out:
“I don’t need the phone to read the text. I need it to tell me what to do with it.”
That sentence defined the entire direction of the project. Instead of building “yet another OCR reader,” the goal became to build a real-time, offline, action-oriented vision agent—something that understands context and acts instantly.
ArmVision Assist was created to answer one question:
What if your phone could look at the real world and help you act in under 200ms—with zero internet?
What it does
ArmVision Assist processes the live camera stream and performs three things simultaneously:
- Understands text (OCR + on-device hazard classifier)
- Infers context (menu, invoice, medicine, warning sign)
- Generates actions (Call, Open Link, Pay, Email, Save Contact, Safety Alert)
It runs 100% offline on ARM processors and produces:
- Real-time AR overlays
- Safety haptics for critical warnings
- Instant intent chips to act on extracted information
All in airplane mode.
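As an illustration of the intent-chip idea, here is a minimal Kotlin sketch of turning OCR text into tappable actions. The regexes and type names are simplified assumptions, not the app's actual extraction rules:

```kotlin
// Sketch: map raw OCR text to action "chips". ChipType, ActionChip, and the
// regexes are illustrative placeholders for the real extraction logic.

enum class ChipType { CALL, EMAIL, OPEN_URL }

data class ActionChip(val type: ChipType, val value: String)

val PHONE_RE = Regex("""\+?\d[\d\s-]{7,}\d""")
val EMAIL_RE = Regex("""[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}""")
val URL_RE = Regex("""https?://\S+|www\.\S+""")

fun extractChips(ocrText: String): List<ActionChip> {
    val chips = mutableListOf<ActionChip>()
    PHONE_RE.findAll(ocrText).forEach { chips += ActionChip(ChipType.CALL, it.value.trim()) }
    EMAIL_RE.findAll(ocrText).forEach { chips += ActionChip(ChipType.EMAIL, it.value) }
    URL_RE.findAll(ocrText).forEach { chips += ActionChip(ChipType.OPEN_URL, it.value) }
    return chips
}
```

On Android, each chip would then launch the matching intent (`ACTION_DIAL`, `ACTION_SENDTO`, `ACTION_VIEW`); that wiring is omitted here to keep the sketch self-contained.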
How we built it
ArmVision Assist combines four on-device systems working in concert:
- Vision Pipeline (CameraX + ML Kit)
  - Implemented a zero-copy ImageAnalysis pipeline on YUV_420_888 buffers.
  - Integrated ML Kit Text Recognition v2 (on-device).
  - Built a custom lighting histogram to detect low light and trigger the torch automatically.
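The low-light check can be sketched as a histogram over the Y (luma) plane of a frame; the threshold values below are illustrative assumptions:

```kotlin
// Sketch: luma histogram over the Y plane of a YUV_420_888 frame.
// darkLuma / darkFraction thresholds are illustrative, not tuned values.

fun lumaHistogram(yPlane: ByteArray): IntArray {
    val hist = IntArray(256)
    for (b in yPlane) hist[b.toInt() and 0xFF]++  // unsigned luma 0..255
    return hist
}

/** True when at least [darkFraction] of pixels fall below [darkLuma]. */
fun isLowLight(yPlane: ByteArray, darkLuma: Int = 40, darkFraction: Double = 0.8): Boolean {
    val dark = lumaHistogram(yPlane).take(darkLuma).sum()
    return dark >= yPlane.size * darkFraction
}
```

In the real pipeline this would run on the analysis frame's Y buffer and, when true, enable the torch via CameraX's `CameraControl.enableTorch(true)`.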
- ARM-Optimized Hazard Classifier (TFLite INT8)
  - Trained a micro model on synthetic hazard keywords + symbol patterns.
  - Converted to TFLite INT8 for ARM efficiency.
  - Calibrated for Neon-friendly tensor operations.
  - Achieved 14–20ms inference on mid-range ARM cores.
- Context Brain
  - Built a hybrid model combining keyword-density scoring, regex templates, and domain rules (finance, medical, safety).
  - Designed a debounce system to prevent repeated speech and haptics.
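The hybrid scoring can be sketched in plain Kotlin. The keyword lists, weights, and regex template below are illustrative placeholders, not the app's real rules:

```kotlin
// Sketch: hybrid context classification = keyword density + regex templates.
// All keywords, weights, and patterns here are illustrative assumptions.

enum class DocContext { MENU, INVOICE, MEDICINE, WARNING, UNKNOWN }

val KEYWORDS = mapOf(
    DocContext.MENU to listOf("menu", "starters", "beverages", "price"),
    DocContext.INVOICE to listOf("invoice", "total", "gst", "amount due"),
    DocContext.MEDICINE to listOf("tablet", "dosage", "mg", "expiry"),
    DocContext.WARNING to listOf("danger", "caution", "high voltage", "keep out"),
)

// Regex templates act as strong signals, e.g. a currency total for invoices.
val TEMPLATES = mapOf(
    DocContext.INVOICE to Regex("""(total|amount)\s*[:=]?\s*[₹$]?\s*\d+"""),
)

fun classifyContext(text: String): DocContext {
    val lower = text.lowercase()
    val scores = KEYWORDS.mapValues { (ctx, words) ->
        var s = words.count { it in lower }.toDouble()   // keyword density
        if (TEMPLATES[ctx]?.containsMatchIn(lower) == true) s += 2.0  // template bonus
        s
    }
    val best = scores.maxByOrNull { it.value } ?: return DocContext.UNKNOWN
    return if (best.value > 0.0) best.key else DocContext.UNKNOWN
}
```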
- AR HUD + Action Engine
  - Designed custom coordinate mapping between the camera and the viewport.
  - Created dynamic action chips (Phone, Email, UPI, URL).
  - Implemented an audio-throttle and haptic-priority loop for smooth UX.
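A minimal sketch of the camera-to-viewport mapping, assuming a center-crop ("fill") preview; the type and function names are hypothetical:

```kotlin
// Sketch: map a bounding box from analysis-buffer pixels (bufW x bufH) into
// view pixels (viewW x viewH) under center-crop scaling. Names are illustrative.

data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

fun mapToViewport(box: Box, bufW: Int, bufH: Int, viewW: Int, viewH: Int): Box {
    // "Fill" scale: the larger ratio, so the buffer covers the whole view.
    val scale = maxOf(viewW.toFloat() / bufW, viewH.toFloat() / bufH)
    val dx = (viewW - bufW * scale) / 2f  // horizontal crop offset
    val dy = (viewH - bufH * scale) / 2f  // vertical crop offset
    return Box(
        box.left * scale + dx, box.top * scale + dy,
        box.right * scale + dx, box.bottom * scale + dy,
    )
}
```

With a 640x480 buffer and a 1080x1920 view, the buffer center (320, 240) lands on the view center (540, 960), which is the alignment property the overlay needs.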
Everything is contained in a single, native Kotlin build with no network dependencies.
Challenges we ran into
This project forced deep exploration into mobile ARM performance engineering, far beyond a standard Android project:
- Optimizing ML inference across big.LITTLE cores.
- Understanding how INT8 quantization affects performance, stability, and battery drain.
- Using Perfetto and Systrace to track down frame spikes and slow paths.
- Designing AR overlays that stay stable through jitter and hand shake.
- Balancing user experience against a strict performance budget (under 200ms end-to-end).
The biggest insight: true accessibility comes not from adding more features, but from removing latency.
Accomplishments that we're proud of
- Stable OCR latency: initial frame drops were caused by buffer conversions; a zero-copy implementation eliminated them.
- Tamed thermal throttling on extended use: Perfetto traces showed OCR landing on the big cores too often, so we moved non-critical logic to the LITTLE cores.
- Hazard-model accuracy: the classifier initially missed low-contrast text; we expanded the synthetic training set with noisy, blurred, and warped samples.
- AR overlay alignment: camera sensor aspect ratios differ between devices; a custom normalized mapping layer fixed alignment across every ARM device we tested.
- No more speech spam: continuous OCR triggered repeated TTS output; a speech-throttle algorithm with a cooldown and context grouping solved it.
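The cooldown-plus-grouping idea can be sketched in a few lines of Kotlin; the class and parameter names are illustrative, and the clock is injected so the logic is testable:

```kotlin
// Sketch: suppress repeated TTS for the same context group until a cooldown
// elapses. Names and the 4s default are illustrative assumptions.

class SpeechThrottle(
    private val cooldownMs: Long = 4_000,
    private val now: () -> Long = System::currentTimeMillis,
) {
    private val lastSpoken = HashMap<String, Long>()  // group key -> timestamp

    /** Returns true if [phrase] should be sent to TTS, false if throttled. */
    fun shouldSpeak(phrase: String, group: String = phrase): Boolean {
        val t = now()
        val last = lastSpoken[group]
        if (last != null && t - last < cooldownMs) return false
        lastSpoken[group] = t
        return true
    }
}
```

Grouping by context (e.g. "invoice") rather than by exact phrase means small OCR variations of the same text do not retrigger speech.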
What we learned
- Zero-copy buffer handling matters as much as model speed: most of our early latency came from format conversions, not inference.
- Core scheduling controls thermals: keeping non-critical work on the LITTLE cores prevented throttling in long sessions.
- Synthetic training data must mirror real capture conditions (noise, blur, warp, low contrast) to hold up in the field.
- Profiling with Perfetto and Systrace beats guessing at where frames are dropped.
What's next for ArmVision Assist — Offline Action Agent for ARM Mobile
- ARM NN / GPU delegate for even faster INT8 inference
- Multilingual support (Hindi, Tamil, Bengali, Arabic)
- Personalization via on-device federated learning
- Model support for symbol-based hazard detection
- SDK for NGOs and low-cost accessibility devices
Built With
- Android Studio
- AR overlay
- CameraX
- Google ML Kit Text Recognition v2
- Haptic feedback engine
- Kotlin
- Perfetto + Systrace profiling
- TensorFlow Lite (INT8 quantization)

