Inspiration
Most vision accessibility tools focus on reading text aloud. During early user testing with a visually impaired volunteer, a simple but important insight emerged:
“I don’t need the phone to read the text. I need it to tell me what to do with it.”
That changed the direction of the project.
Instead of another OCR reader, the goal became an AI vision agent that understands context and triggers actions instantly.
Another major issue with existing tools is that they rely heavily on cloud processing, which introduces latency, privacy concerns, and complete failure when internet connectivity is poor.
ArmVision Assist was created to answer a simple question:
What if a phone could see real-world text and immediately help users act on it — entirely offline?
What it does
ArmVision Assist is an offline AI vision agent that observes text through a smartphone camera and converts it into immediate, actionable commands.
The system processes live camera frames and performs three tasks simultaneously:
Text Understanding: On-device OCR extracts text from the camera feed in real time.
Context Inference: A lightweight inference engine determines the meaning of detected text, such as:
phone numbers
URLs
email addresses
payment links
medicine labels
warning signs
Action Generation: Based on the detected context, the agent suggests actions such as:
Call detected phone numbers
Open websites
Compose emails
Trigger UPI payments
Save contacts
Issue safety alerts for hazardous warnings
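As a rough illustration of the action-generation step, a detected context can map straight to an Android intent URI. The type names and URI templates below are a simplified sketch, not the app's actual code; on Android, each non-null URI would be handed to an `ACTION_VIEW` intent.

```kotlin
// Sketch of context-to-action mapping; ContextType and the URI
// templates are illustrative, not the app's real types.
enum class ContextType { PHONE, URL, EMAIL, UPI, HAZARD }

data class SuggestedAction(val label: String, val intentUri: String?)

fun suggestAction(type: ContextType, text: String): SuggestedAction = when (type) {
    ContextType.PHONE  -> SuggestedAction("Call", "tel:$text")
    ContextType.URL    -> SuggestedAction(
        "Open Link",
        if (text.startsWith("http")) text else "https://$text"
    )
    ContextType.EMAIL  -> SuggestedAction("Email", "mailto:$text")
    ContextType.UPI    -> SuggestedAction("Pay", "upi://pay?pa=$text")
    ContextType.HAZARD -> SuggestedAction("Safety Alert", null) // haptics + TTS, no intent
}
```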
The system runs fully offline on ARM devices, providing fast responses, improved privacy, and reliability even in airplane mode.
How we built it
ArmVision Assist combines several on-device systems designed for efficient mobile performance.
Vision Pipeline
CameraX live frame processing
Zero-copy ImageAnalysis pipeline using YUV_420_888
Google ML Kit Text Recognition v2 for on-device OCR
Lighting histogram to detect low-light conditions and trigger the flashlight automatically
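The low-light check can be sketched in a few lines: build a luma histogram from the Y plane of a YUV_420_888 frame and turn on the torch when the mean falls below a threshold. The 40/255 cutoff here is an assumed value, not the app's tuned number.

```kotlin
// Illustrative low-light detection over the Y (luma) plane bytes.
// JVM bytes are signed, so mask with 0xFF to recover 0..255 values.
fun isLowLight(yPlane: ByteArray, threshold: Int = 40): Boolean {
    if (yPlane.isEmpty()) return false
    // 256-bin luma histogram.
    val hist = IntArray(256)
    for (b in yPlane) hist[b.toInt() and 0xFF]++
    // Mean luma from the histogram; dark scenes trigger the flashlight.
    var sum = 0L
    for (v in 0..255) sum += v.toLong() * hist[v]
    return sum / yPlane.size < threshold
}
```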
ARM-Optimized Hazard Classifier
Custom lightweight model trained on hazard keywords and safety patterns
Converted to TensorFlow Lite INT8 quantized format
Optimized for ARM NEON instructions
Inference latency of approximately 14–20 ms on mid-range ARM devices
Context Intelligence Engine
A hybrid reasoning system combining:
keyword density scoring
regex pattern detection
domain-specific rules (finance, safety, medical)
This allows the system to interpret extracted text and determine possible actions.
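A minimal sketch of that hybrid reasoning, assuming illustrative regexes and a toy hazard vocabulary (the real rule set is richer):

```kotlin
// Toy hybrid classifier: regex pattern detection plus keyword-density
// scoring, with a domain rule that safety outranks other matches.
val phoneRegex = Regex("""\+?\d[\d\s-]{7,}\d""")
val urlRegex   = Regex("""(https?://)?[\w-]+(\.[\w-]+)+(/\S*)?""")
val emailRegex = Regex("""[\w.+-]+@[\w-]+\.[\w.]+""")
val hazardKeywords = setOf("flammable", "toxic", "poison", "danger", "corrosive")

fun classify(text: String): String {
    val tokens = text.lowercase().split(Regex("""\W+""")).filter { it.isNotBlank() }
    // Keyword density: fraction of tokens in the hazard vocabulary.
    val density = if (tokens.isEmpty()) 0.0
                  else tokens.count { it in hazardKeywords }.toDouble() / tokens.size
    return when {
        density >= 0.2                   -> "hazard"  // safety rule wins
        emailRegex.containsMatchIn(text) -> "email"
        phoneRegex.containsMatchIn(text) -> "phone"
        urlRegex.containsMatchIn(text)   -> "url"
        else                             -> "plain_text"
    }
}
```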
AR Action Interface
Real-time AR overlays mapped to camera coordinates
Interactive action chips (Call, Email, Open Link, Pay)
Haptic alerts for safety warnings
Speech throttling to prevent repetitive announcements
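The speech-throttling idea can be sketched as a small cooldown map keyed by utterance. The 5-second window is an assumed value, and time is injected so the logic stays testable off-device.

```kotlin
// Suppress TTS repeats of the same utterance inside a cooldown window.
class SpeechThrottle(private val cooldownMs: Long = 5_000) {
    private val lastSpoken = HashMap<String, Long>()

    /** Returns true if the utterance should be spoken now, false if throttled. */
    fun shouldSpeak(utterance: String, nowMs: Long): Boolean {
        val last = lastSpoken[utterance]
        if (last != null && nowMs - last < cooldownMs) return false
        lastSpoken[utterance] = nowMs
        return true
    }
}
```

In the app, a gate like this would sit in front of Android's `TextToSpeech.speak` calls.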
The entire application was built as a native Kotlin Android app with no cloud dependencies.
Challenges we ran into
Developing a real-time vision system on mobile hardware presented several challenges.
Frame latency and buffer conversion: Initial OCR pipelines caused frame drops due to unnecessary image buffer conversions. Implementing a zero-copy pipeline significantly improved processing speed.
Thermal throttling: Sustained OCR workloads ran on the big CPU cores and caused thermal throttling. Profiling with Perfetto helped move non-critical work to the LITTLE cores.
Low-contrast hazard detection: Early versions of the hazard classifier struggled with faded labels. Synthetic training data was expanded with blurred and warped text samples to improve robustness.
AR overlay alignment: Different devices have different camera sensor aspect ratios. A normalized coordinate mapping layer was implemented to ensure stable overlay positioning across devices.
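That mapping layer can be sketched as a scale-and-offset transform, assuming a center-crop (FILL_CENTER) preview; this is the standard aspect-ratio correction, not the app's exact code.

```kotlin
// Map an OCR bounding box from analysis-frame pixels to preview-view
// pixels under a center-crop preview: scale by the larger axis ratio,
// then offset so the crop stays centered.
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

fun mapToView(box: Box, frameW: Int, frameH: Int, viewW: Int, viewH: Int): Box {
    val scale = maxOf(viewW / frameW.toFloat(), viewH / frameH.toFloat())
    val dx = (viewW - frameW * scale) / 2f  // <= 0 when cropped horizontally
    val dy = (viewH - frameH * scale) / 2f  // <= 0 when cropped vertically
    fun x(v: Float) = v * scale + dx
    fun y(v: Float) = v * scale + dy
    return Box(x(box.left), y(box.top), x(box.right), y(box.bottom))
}
```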
Speech spam from continuous OCR: Constant frame analysis triggered repeated TTS output. A speech throttling and context grouping mechanism was added to improve user experience.
Accomplishments that we're proud of
Achieved real-time on-device OCR with sub-200ms latency
Built a fully functional offline AI agent with zero cloud dependency
Implemented ARM-optimized INT8 ML inference
Developed a context-aware action engine that turns text into real-world actions
Created AR overlays and haptic feedback for accessibility
Successfully deployed an installable Android APK for public testing
The project demonstrates that powerful AI assistance can run entirely on-device, without relying on cloud infrastructure.
What we learned
Building ArmVision Assist required exploring multiple aspects of mobile AI engineering.
Key learnings included:
Performance optimization for ARM big.LITTLE CPU architectures
Practical effects of INT8 quantization on latency and battery usage
Using profiling tools like Perfetto and Systrace to diagnose performance bottlenecks
Designing AR overlays that remain stable during camera jitter
Balancing AI inference workloads with mobile thermal limits
The biggest takeaway was that accessibility tools benefit more from speed and reliability than from feature complexity.
What's next for ArmVision Assist — Offline Vision Action Agent
Several improvements are planned to expand the system's capabilities.
Multilingual support: Add OCR and context understanding for languages such as Hindi, Tamil, Bengali, and Arabic.
GPU / ARM NN acceleration: Integrate GPU delegates or ARM NN for faster inference on supported devices.
Symbol-based hazard detection: Extend the hazard classifier to detect visual warning symbols, not just text.
Personalized learning: Introduce on-device learning to adapt to individual user preferences.
Accessibility SDK: Create a lightweight SDK so NGOs and accessibility device manufacturers can integrate the technology into low-cost assistive devices.
Try it out yourself! (Google Drive link with APK provided)
Built With
- Android (native Kotlin)
- CameraX real-time camera frames
- Google ML Kit Text Recognition v2 (on-device OCR)
- TensorFlow Lite custom model (INT8 quantization, ARM NEON optimization)
- Regex-based parsing and rule-based contextual suggestions
- AR overlay rendering with coordinate mapping
- Haptic feedback
- Perfetto and Systrace performance profiling
- APK distribution

