🌟 Inspiration
AbleAI was inspired by a single question:
How much AI can run fully offline on an ARM-powered device without relying on the cloud?
Most AI tools today depend on internet access or cloud APIs, making them slow, unreliable, or unusable for people in low-connectivity environments. Many visually impaired users and students also need tools that work instantly and privately.
AbleAI is my attempt to prove that powerful, multimodal AI can run entirely offline on everyday ARM-based devices like phones, tablets, Raspberry Pi, and M-series Macs.
🧠 What It Does
AbleAI consists of three fully offline AI modules, each demonstrating a core capability of edge AI:
📖 Module 1 - OCR Reader
Reads printed text aloud.
- Captures images using the camera
- Preprocesses images for clarity
- Performs OCR using Tesseract locally
- Speaks text using offline TTS
Key files:
- app.py - main controller + camera flow
- ocr_reader.py - preprocessing + OCR
- speaker.py - offline TTS
🎯 Module 2 - On-Device Object Detection
Detects objects in real time using YOLOv8 models running directly on ARM CPU.
- Works fully offline
- Uses lightweight models (yolov8n.pt, yolov8m.pt)
- Detects objects and speaks the results
This demonstrates computer vision running entirely on the edge device, with no server round trip.
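As a rough illustration of the detect-then-speak flow, here is a minimal sketch rather than the project's actual code. The function names `detect_labels` and `summarize_labels`, the 0.5 confidence cutoff, and the sentence wording are all assumptions made for this example. It assumes the `ultralytics` package is installed and that the weights file is already on disk, since a missing file would trigger a download.

```python
from collections import Counter


def summarize_labels(labels):
    """Turn a list of detected class names into one sentence for TTS."""
    if not labels:
        return "I don't see any objects."
    parts = []
    # most_common() orders by count, highest first; ties keep first-seen order.
    for name, count in Counter(labels).most_common():
        # Naive plural: just append "s" when there is more than one.
        parts.append(f"{count} {name}{'s' if count > 1 else ''}")
    return "I see " + ", ".join(parts) + "."


def detect_labels(image_path, weights="yolov8n.pt", min_conf=0.5):
    """Run YOLOv8 on the CPU and return the class name of each detection.

    Hypothetical helper; min_conf is an assumed cutoff, not a project setting.
    """
    from ultralytics import YOLO  # lazy import so the summary helper works without it

    model = YOLO(weights)  # weights must exist locally to stay offline
    result = model(image_path, device="cpu", conf=min_conf, verbose=False)[0]
    return [result.names[int(cls_id)] for cls_id in result.boxes.cls]


if __name__ == "__main__":
    print(summarize_labels(["person", "chair", "person"]))
```

Keeping the sentence builder separate from inference means the spoken output can be tested without a camera or a model file.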
🎙 Module 3 - Voice Assistant
A simple offline assistant that listens and performs actions.
- Offline speech-to-text
- Can run OCR or object detection when asked
- Responds using offline TTS
- Combines speech + vision + logic
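One simple way to connect speech to the vision modules is keyword routing on the transcribed text. The sketch below is an assumption about how that logic could look, not the assistant's actual implementation. The keyword sets, the action names, and the functions `route_command` and `handle` are hypothetical.

```python
import re

# Hypothetical keyword sets; a real assistant would tune these.
OCR_WORDS = {"read", "text", "ocr"}
DETECT_WORDS = {"detect", "object", "objects", "see", "around"}
QUIT_WORDS = {"stop", "exit", "quit"}


def route_command(transcript):
    """Map a transcribed phrase to an action name using whole-word matching."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    if words & OCR_WORDS:
        return "ocr"
    if words & DETECT_WORDS:
        return "detect"
    if words & QUIT_WORDS:
        return "quit"
    return "unknown"


def handle(transcript, handlers):
    """Call the handler registered for the routed action, if there is one."""
    handler = handlers.get(route_command(transcript))
    if handler is None:
        return "Sorry, I didn't understand that."
    return handler()


if __name__ == "__main__":
    handlers = {
        "ocr": lambda: "Reading the page.",
        "detect": lambda: "Looking for objects.",
    }
    print(handle("What objects are around me?", handlers))
```

Matching on whole words rather than substrings avoids false triggers such as "text" inside "context".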
🛠 How I Built It
Computer Vision
- OpenCV
- Image preprocessing (blur, resize, thresholding)
- YOLOv8 inference on CPU
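The preprocessing steps listed above (resize, blur, thresholding) could be chained roughly as follows. This is a sketch under assumptions: the 1280-pixel cap, the 5x5 Gaussian kernel, the choice of Otsu thresholding, and the helper names `fit_within` and `preprocess` are illustrative, not values taken from the project.

```python
def fit_within(width, height, max_side=1280):
    """Return (width, height) scaled down so the longer side is at most max_side.

    Images that are already small enough are returned unchanged.
    """
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def preprocess(image_path, max_side=1280):
    """Resize, grayscale, blur, and binarize an image (hypothetical pipeline)."""
    import cv2  # lazy import so fit_within can be used without OpenCV

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    height, width = image.shape[:2]
    # cv2.resize expects the target size as (width, height).
    image = cv2.resize(image, fit_within(width, height, max_side),
                       interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu picks a global threshold automatically from the histogram.
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

Downscaling first keeps every later step cheaper, which matters when each stage runs on the CPU.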
OCR
- Tesseract OCR via pytesseract
- Adaptive thresholding for noisy cases
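For unevenly lit images, adaptive thresholding computes a separate threshold per neighborhood instead of one for the whole image. The sketch below shows one way to pair it with pytesseract. The block size of 31, the offset of 10, and the helper names `ocr_noisy_image` and `clean_ocr_text` are assumptions for illustration, and the Tesseract binary must be installed locally.

```python
import re


def clean_ocr_text(raw):
    """Collapse runs of whitespace and drop blank lines before speaking."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_noisy_image(image_path, block_size=31, offset=10):
    """Binarize with a local (adaptive) threshold, then run Tesseract.

    block_size must be odd; both defaults are assumed starting points.
    """
    import cv2          # lazy imports so clean_ocr_text works on its own
    import pytesseract

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,
        block_size, offset,
    )
    return clean_ocr_text(pytesseract.image_to_string(binary))


if __name__ == "__main__":
    print(clean_ocr_text("  Hello   world \n\n  second\tline  \n"))
```

Cleaning the text before passing it to TTS avoids spoken pauses caused by stray blank lines and tabs in the OCR output.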
Speech
- pyttsx3 offline TTS
- macOS “say” fallback
- SpeechRecognition for ASR
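The TTS fallback could be structured as an ordered list of backends that are tried until one succeeds. This is a minimal sketch, not the contents of speaker.py. The function names and the idea of returning the backend name are assumptions made for the example.

```python
import platform
import subprocess


def speak_pyttsx3(text):
    """Speak with pyttsx3 (requires the pyttsx3 package and an audio device)."""
    import pyttsx3  # lazy import so a missing package triggers the fallback

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def speak_macos_say(text):
    """Speak with the built-in macOS `say` command."""
    subprocess.run(["say", text], check=True)


def speak(text, backends=None):
    """Try each backend in order; return the name of the one that worked, or None."""
    if backends is None:
        backends = [speak_pyttsx3]
        if platform.system() == "Darwin":
            backends.append(speak_macos_say)
    for backend in backends:
        try:
            backend(text)
            return backend.__name__
        except Exception:
            continue  # this backend failed; fall through to the next one
    return None
```

Passing the backends in as a list makes the fallback order explicit and lets the logic be exercised with stand-in functions on machines without audio.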
Optimized for ARM
- All code runs on CPU
- No cloud or GPU required
- Lightweight, efficient AI logic
- Tested on ARM hardware
🧩 Built With
- Python
- OpenCV
- Tesseract OCR
- YOLOv8
- pyttsx3
- SpeechRecognition
- NumPy
- Pillow
🚧 Challenges I Faced
- Running YOLOv8 efficiently on CPU
- Making OCR accurate on low-light / skewed images
- Integrating TTS + STT + CV without lag
- Ensuring everything stays 100% offline
- Designing three modules that work independently and together
📚 What I Learned
- How to optimize AI for edge devices
- How preprocessing changes OCR performance
- How to design multimodal assistants
- ARM-specific performance considerations
- The challenges of real-time audio + vision
🔗 Try It Out
GitHub Repository:
https://github.com/DiyaMenon/AbleAI