Inspiration

Visually impaired users often depend on cloud-based OCR apps that are slow, inconsistent, and invasive. During user testing with a visually impaired volunteer, one insight stood out:

“I don’t need the phone to read the text. I need it to tell me what to do with it.”

That sentence defined the entire direction of the project. Instead of building “yet another OCR reader,” the goal became to build a real-time, offline, action-oriented vision agent—something that understands context and acts instantly.

ArmVision Assist was created to answer one question:

What if your phone could look at the real world and help you act in under 200ms—with zero internet?

What it does

ArmVision Assist processes the live camera stream and does three things simultaneously:

Understands text (OCR + on-device hazard classifier)

Infers context (menu, invoice, medicine, warning sign)

Generates actions (Call, Open Link, Pay, Email, Save Contact, Safety Alert)

It runs 100% offline on ARM processors and produces:

Real-time AR overlays

Safety haptics for critical warnings

Instant intent chips to act on extracted information

All in airplane mode.

How we built it

ArmVision Assist combines four on-device systems working in concert:

  1. Vision Pipeline (CameraX + ML Kit)

Implemented a zero-copy ImageAnalysis pipeline on YUV_420_888 buffers.

Integrated ML Kit Text Recognition v2 (on-device).

Built a custom lighting histogram to detect low light and trigger the torch automatically.
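
Below is a minimal sketch of how this pipeline can be wired, assuming a single `ImageAnalysis.Analyzer` drives both the OCR call and the low-light check; the class name, callbacks, and luma threshold are illustrative rather than our exact code:

```kotlin
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.Text
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

/** Illustrative analyzer: on-device OCR plus a cheap low-light check on the Y plane. */
@ExperimentalGetImage
class FrameAnalyzer(
    private val onText: (Text) -> Unit,        // hypothetical callback into the Context Brain
    private val onLowLight: (Boolean) -> Unit  // hypothetical callback toggling the torch
) : ImageAnalysis.Analyzer {

    private val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    override fun analyze(proxy: ImageProxy) {
        // "Lighting histogram" in its simplest form: sparse mean luma over the Y plane
        // of the YUV_420_888 buffer, read in place without copying the frame.
        val y = proxy.planes[0].buffer
        var sum = 0L
        val stride = 64
        for (i in 0 until y.remaining() step stride) sum += (y.get(i).toInt() and 0xFF)
        val meanLuma = sum / (y.remaining() / stride).coerceAtLeast(1)
        onLowLight(meanLuma < 60)  // threshold is an assumption, tuned per device in practice

        val mediaImage = proxy.image ?: run { proxy.close(); return }
        val input = InputImage.fromMediaImage(mediaImage, proxy.imageInfo.rotationDegrees)
        recognizer.process(input)
            .addOnSuccessListener { onText(it) }
            .addOnCompleteListener { proxy.close() }  // always release the frame
    }
}
```

Attaching the analyzer with ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST keeps the pipeline real-time by dropping stale frames instead of queuing them.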

  2. ARM-Optimized Hazard Classifier (TFLite INT8)

Trained a micro model on synthetic hazard keywords + symbol patterns.

Converted to TFLite INT8 for ARM efficiency.

Calibrated for NEON-friendly tensor operations.

Achieved 14–20ms inference on mid-range ARM cores.
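
A sketch of how the quantized model might be loaded and scored on-device with the TFLite Interpreter; the asset name, tensor shapes, and quantization parameters below are placeholders, not the shipped values:

```kotlin
import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil
import java.nio.ByteBuffer
import java.nio.ByteOrder

/** Illustrative wrapper around an INT8 hazard classifier. */
class HazardClassifier(context: Context) {

    private val interpreter = Interpreter(
        FileUtil.loadMappedFile(context, "hazard_int8.tflite"),  // placeholder asset name
        Interpreter.Options().apply {
            setNumThreads(2)      // leave headroom for OCR and the UI thread
            setUseXNNPACK(true)   // NEON-optimized kernels on ARM
        }
    )

    /** Scores an already-quantized feature vector and returns P(hazard) in [0, 1]. */
    fun score(features: ByteArray): Float {
        val input = ByteBuffer.allocateDirect(features.size).order(ByteOrder.nativeOrder())
        input.put(features).rewind()
        val output = ByteBuffer.allocateDirect(1).order(ByteOrder.nativeOrder())

        interpreter.run(input, output)

        output.rewind()
        // De-quantize the single int8 output; real scale/zero-point come from the model metadata.
        val q = output.get().toInt()
        return ((q - OUTPUT_ZERO_POINT) * OUTPUT_SCALE).coerceIn(0f, 1f)
    }

    companion object {
        private const val OUTPUT_SCALE = 1f / 255f  // placeholder quantization params
        private const val OUTPUT_ZERO_POINT = -128
    }
}
```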

  3. Context Brain

Built a hybrid model combining keyword-density scoring, regex templates, and domain rules (finance, medical, safety).

Designed a debounce system to prevent repeated speech/haptics.
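
A compact sketch of the same idea, with illustrative keyword sets, one regex template, and a cooldown-based debounce gate; the vocabulary, patterns, and thresholds are assumptions, not the shipped rules:

```kotlin
/** Illustrative context scorer: keyword-density scoring, a regex template, and a debounce gate. */
enum class DocContext { MENU, INVOICE, MEDICINE, WARNING, UNKNOWN }

object ContextBrain {
    // Keyword sets are illustrative, not the shipped vocabulary.
    private val keywords = mapOf(
        DocContext.MENU to setOf("menu", "starters", "beverages", "price"),
        DocContext.INVOICE to setOf("invoice", "total", "gst", "amount due"),
        DocContext.MEDICINE to setOf("tablet", "dosage", "mg", "expiry"),
        DocContext.WARNING to setOf("danger", "caution", "high voltage", "flammable")
    )
    // Regex template acting as a domain rule: currency amounts bias toward INVOICE.
    private val amountPattern =
        Regex("""(?:rs\.?|₹)\s*\d[\d,]*(?:\.\d{2})?""", RegexOption.IGNORE_CASE)

    fun classify(text: String): DocContext {
        val lower = text.lowercase()
        val words = lower.split(Regex("\\s+")).size.coerceAtLeast(1)
        // Keyword density: hits per word, so long receipts don't drown out short warning signs.
        val best = keywords.entries
            .maxByOrNull { (_, ks) -> ks.count { it in lower }.toDouble() / words }
            ?: return DocContext.UNKNOWN
        if (best.value.none { it in lower }) return DocContext.UNKNOWN
        return if (best.key != DocContext.WARNING && amountPattern.containsMatchIn(lower))
            DocContext.INVOICE else best.key
    }
}

/** Debounce: suppress repeats of the same announcement within a cooldown window. */
class Debouncer(private val cooldownMs: Long = 3_000) {
    private var lastKey: Any? = null
    private var lastAt = 0L

    fun shouldFire(key: Any, now: Long = System.currentTimeMillis()): Boolean {
        if (key == lastKey && now - lastAt < cooldownMs) return false
        lastKey = key
        lastAt = now
        return true
    }
}
```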

  4. AR HUD + Action Engine

Designed custom coordinate-mapping between camera and viewport.

Created dynamic action chips (Phone, Email, UPI, URL); a chip-to-Intent sketch follows this list.

Implemented audio-throttle + haptic priority loop for smooth UX.
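
The chip-to-action step referenced above boils down to plain Android Intents. A minimal sketch, where the chip types and the UPI deep-link format are illustrative and the Pay chip leaves the amount to the user:

```kotlin
import android.content.Context
import android.content.Intent
import android.net.Uri

/** Illustrative mapping from an extracted entity to the Intent behind an action chip. */
sealed class ActionChip {
    data class Call(val number: String) : ActionChip()
    data class Email(val address: String) : ActionChip()
    data class OpenLink(val url: String) : ActionChip()
    data class Pay(val upiId: String, val name: String) : ActionChip()
}

fun ActionChip.toIntent(): Intent = when (this) {
    is ActionChip.Call -> Intent(Intent.ACTION_DIAL, Uri.parse("tel:$number"))
    is ActionChip.Email -> Intent(Intent.ACTION_SENDTO, Uri.parse("mailto:$address"))
    is ActionChip.OpenLink -> Intent(Intent.ACTION_VIEW, Uri.parse(url))
    // Standard UPI deep link; any installed UPI app can handle it.
    is ActionChip.Pay -> Intent(
        Intent.ACTION_VIEW,
        Uri.parse("upi://pay?pa=${Uri.encode(upiId)}&pn=${Uri.encode(name)}")
    )
}

fun Context.launchChip(chip: ActionChip) {
    val intent = chip.toIntent()
    if (intent.resolveActivity(packageManager) != null) startActivity(intent)
}
```

Modeling chips as data plus a single `toIntent()` mapping keeps the AR HUD free of Intent plumbing.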

Everything is contained in a single, native Kotlin build with no network dependencies.

Challenges we ran into

This project forced a deep dive into mobile ARM performance engineering, far beyond a standard Android project.

Key Learnings

How to optimize ML inference across big.LITTLE cores.

How quantization affects performance, stability, and battery drain.

Using Perfetto and Systrace to detect frame spikes and slow paths (see the trace-marker sketch after this list).

Designing AR overlays that remain stable during jitter and hand shake.

Balancing user experience with strict performance budgets (<200ms end-to-end).
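
One common way to make those slow paths visible is to wrap pipeline stages in platform trace sections; a minimal sketch, where the labels and the wrapped calls in the usage comment are placeholders:

```kotlin
import android.os.Trace

/** Wraps a block in a named trace section that shows up as a slice in Perfetto/Systrace. */
inline fun <T> traced(label: String, block: () -> T): T {
    Trace.beginSection(label)
    try {
        return block()
    } finally {
        Trace.endSection()
    }
}

// Hypothetical usage: attribute frame spikes to a specific stage instead of guessing.
// val recognized = traced("ocr_frame") { runOcr(frame) }
// val hazard     = traced("hazard_infer") { hazardClassifier.score(features) }
```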

By the end, the biggest insight was that true accessibility comes not from adding more features but from removing latency.

Accomplishments that we're proud of

  1. Stabilizing OCR latency

Initial frame drops were caused by buffer conversions; a zero-copy implementation solved them.

  2. Taming thermal throttling on extended use

Perfetto traces revealed that OCR was hitting big cores too often. We moved non-critical logic to LITTLE cores.
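
The public Android SDK does not expose core affinity directly; a common way to nudge work toward the efficiency (LITTLE) cores is to drop the thread priority of non-critical executors, roughly like this (the executor and its name are illustrative):

```kotlin
import android.os.Process
import java.util.concurrent.Executors

// Low-priority executor for non-critical work (context scoring, logging, chip building).
// Lowering the thread priority lets the scheduler prefer the efficiency cores for it,
// leaving the big cores free for OCR and inference.
val backgroundExecutor = Executors.newSingleThreadExecutor { runnable ->
    Thread {
        Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND)
        runnable.run()
    }.apply { name = "armvision-bg" }
}
```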

  3. Improving hazard model accuracy

The classifier initially missed low-contrast text. We improved synthetic training to include noisy, blurred, and warped samples.

  4. Fixing AR overlay alignment

Different devices have different camera sensor aspect ratios. A custom normalized mapping layer fixed alignment across tested ARM devices.
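
A sketch of what such a normalized mapping can look like, assuming the preview is scaled center-crop (PreviewView's FILL_CENTER default) and rotation has already been applied to the source box:

```kotlin
import android.graphics.Rect
import android.graphics.RectF

/**
 * Illustrative mapping from analysis-image coordinates to preview-view coordinates,
 * assuming a center-crop preview. The crop offsets (dx, dy) are what keep overlay
 * boxes glued to the text when the sensor and view aspect ratios differ.
 */
fun mapToView(
    box: Rect,
    imageWidth: Int, imageHeight: Int,
    viewWidth: Int, viewHeight: Int
): RectF {
    // Center-crop: scale by the larger ratio, then offset by the cropped margin.
    val scale = maxOf(viewWidth.toFloat() / imageWidth, viewHeight.toFloat() / imageHeight)
    val dx = (viewWidth - imageWidth * scale) / 2f
    val dy = (viewHeight - imageHeight * scale) / 2f
    return RectF(
        box.left * scale + dx,
        box.top * scale + dy,
        box.right * scale + dx,
        box.bottom * scale + dy
    )
}
```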

  5. Eliminating speech spam

Continuous OCR caused repeated TTS output. Solution: a speech-throttle algorithm with cooldown + context grouping.
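
A minimal sketch of that throttle, keyed on a (context, text) pair so identical readings in the same context stay silent during the cooldown; the key format and cooldown value are assumptions:

```kotlin
import android.speech.tts.TextToSpeech

/** Illustrative speech throttle: one announcement per (context, text) group per cooldown window. */
class SpeechThrottle(private val tts: TextToSpeech, private val cooldownMs: Long = 4_000) {
    private val lastSpokenAt = mutableMapOf<String, Long>()

    fun announce(context: String, text: String, now: Long = System.currentTimeMillis()) {
        val key = "$context:${text.trim().lowercase()}"  // group repeats of the same reading
        val last = lastSpokenAt[key] ?: 0L
        if (now - last < cooldownMs) return              // still cooling down: stay silent
        lastSpokenAt[key] = now
        tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, key)
    }
}
```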

What we learned

The deepest lessons were in ARM performance engineering: scheduling inference across big.LITTLE cores, seeing first-hand how INT8 quantization affects speed, stability, and battery drain, learning to profile with Perfetto and Systrace instead of guessing, and keeping AR overlays steady while staying inside a <200ms end-to-end budget. Above all: accessibility improves more by removing latency than by adding features.

What's next for ArmVision Assist — Offline Action Agent for ARM Mobile

ARM NN / GPU delegate for even faster INT8 inference

Multilingual support (Hindi, Tamil, Bengali, Arabic)

Personalization via on-device federated learning

Model support for symbol-based hazard detection

SDK for NGOs and low-cost accessibility devices

Built With

  • Android Studio
  • AR overlay
  • CameraX
  • Google ML Kit Text Recognition v2
  • Haptic feedback engine
  • Kotlin
  • Perfetto + Systrace profiling
  • TensorFlow Lite (INT8 quantization)