🌟 Inspiration

AbleAI was inspired by a single question:

How much AI can run fully offline on an ARM-powered device without relying on the cloud?

Most AI tools today depend on internet access or cloud APIs, making them slow, unreliable, or unusable for people in low-connectivity environments. Many visually impaired users and students also need tools that work instantly and privately.

AbleAI is my attempt to prove that powerful, multimodal AI can run entirely offline on everyday ARM-based devices like phones, tablets, Raspberry Pi, and M-series Macs.

🧠 What It Does

AbleAI consists of three fully offline AI modules, each demonstrating a core capability of edge AI:

📖 Module 1 - OCR Reader

Reads printed text aloud.

Captures images using the camera
Preprocesses images for clarity
Performs OCR using Tesseract locally
Speaks text using offline TTS

Key files:

app.py - main controller + camera flow
ocr_reader.py - preprocessing + OCR
speaker.py - offline TTS
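
The capture, preprocess, OCR, and speak steps above can be sketched as one small pipeline. This is a minimal illustration, not the code in app.py: the names `read_aloud` and `clean_ocr_text`, the "No text found." message, and the choice to pass each stage in as a callable are all assumptions made for the example.

```python
import re
from typing import Any, Callable


def clean_ocr_text(raw: str) -> str:
    """Collapse the stray line breaks and repeated spaces OCR output often contains."""
    return re.sub(r"\s+", " ", raw).strip()


def read_aloud(
    capture: Callable[[], Any],
    preprocess: Callable[[Any], Any],
    recognize: Callable[[Any], str],
    speak: Callable[[str], None],
) -> str:
    """Run capture -> preprocess -> OCR -> TTS once and return what was spoken."""
    frame = capture()
    text = clean_ocr_text(recognize(preprocess(frame)))
    # Hypothetical fallback so the user always hears something.
    spoken = text if text else "No text found."
    speak(spoken)
    return spoken


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without a camera or Tesseract.
    read_aloud(lambda: "frame", lambda f: f, lambda f: " Hello\n world ", print)
```

Passing the stages in as functions keeps the flow testable without hardware; in the real project the OCR and TTS stages would presumably come from ocr_reader.py and speaker.py.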

🎯 Module 2 - On-Device Object Detection

Detects objects in real time using YOLOv8 models running directly on ARM CPU.

Works fully offline
Uses pretrained YOLOv8 checkpoints: the lightweight nano model (yolov8n.pt) and the larger medium model (yolov8m.pt)
Detects objects and speaks the results

This demonstrates computer vision running entirely at the edge, with no server in the loop.
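
A minimal sketch of detecting objects on CPU and turning the result into a sentence for TTS, assuming the `ultralytics` package is what loads the YOLOv8 weights. The function names, the 0.5 confidence cutoff, and the wording of the spoken sentence are illustrative assumptions, not the project's actual code.

```python
from collections import Counter
from typing import List


def detect_labels(image_path: str, weights: str = "yolov8n.pt", min_conf: float = 0.5) -> List[str]:
    """Return the class names YOLOv8 finds in one image, running on CPU."""
    # Imported lazily so the rest of the module loads without ultralytics.
    from ultralytics import YOLO

    # The weights file must already be on disk for this to stay offline.
    model = YOLO(weights)
    results = model.predict(image_path, device="cpu", conf=min_conf, verbose=False)
    labels = []
    for result in results:
        for class_id in result.boxes.cls.tolist():
            labels.append(result.names[int(class_id)])
    return labels


def describe_detections(labels: List[str]) -> str:
    """Turn a list of detected class names into one short sentence to speak."""
    if not labels:
        return "I do not see any objects I recognize."
    counts = Counter(labels)
    parts = [f"{count} {label}" for label, count in counts.most_common()]
    return "I see " + ", ".join(parts) + "."
```

Grouping repeated labels keeps the spoken output short, which matters when the same object class appears many times in a frame.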

🎙 Module 3 - Voice Assistant

A simple offline assistant that listens and performs actions.

Offline speech-to-text
Can run OCR or object detection when asked
Responds using offline TTS
Combines speech + vision + logic
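
One way the "logic" part could connect a transcribed phrase to the OCR or detection module is simple keyword routing. The sketch below is an assumption about how such routing might look: the trigger words, the intent names, and the `handle_command` helper are hypothetical.

```python
import re
from typing import Callable, Dict

# Hypothetical trigger words; a real assistant would likely need a richer set.
INTENT_KEYWORDS = {
    "ocr": ("read", "text"),
    "detect": ("see", "detect", "object", "objects", "front"),
}


def route_command(transcript: str) -> str:
    """Map a transcribed phrase to an intent name, or "unknown"."""
    words = re.findall(r"[a-z']+", transcript.lower())
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "unknown"


def handle_command(transcript: str, actions: Dict[str, Callable[[], str]]) -> str:
    """Run the action registered for the routed intent and return its reply."""
    action = actions.get(route_command(transcript))
    if action is None:
        return "Sorry, I did not understand that."
    return action()
```

Matching on whole words rather than substrings avoids false triggers such as "read" inside "already". The returned string is what would be handed to the TTS step.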

🛠 How I Built It

Computer Vision

OpenCV
Image preprocessing (blur, resize, thresholding)
YOLOv8 inference on CPU

OCR

Tesseract OCR via pytesseract
Adaptive thresholding for noisy cases
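
The preprocessing steps named above (resize, blur, adaptive thresholding) followed by Tesseract could look roughly like this. It is a sketch under assumptions: the function names, the 1000-pixel minimum width, the 5x5 blur kernel, and the threshold block size and offset are illustrative values, not the settings used in ocr_reader.py.

```python
def scale_for_ocr(width: int, min_width: int = 1000) -> float:
    """Upscale factor that brings small captures up to min_width; never downscales."""
    if width <= 0:
        raise ValueError("width must be positive")
    return max(1.0, min_width / width)


def preprocess_for_ocr(image_bgr):
    """Grayscale -> resize -> blur -> adaptive threshold; returns a binary image."""
    import cv2  # imported lazily so the helper above works without OpenCV

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    factor = scale_for_ocr(gray.shape[1])
    if factor > 1.0:
        gray = cv2.resize(gray, None, fx=factor, fy=factor, interpolation=cv2.INTER_CUBIC)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Block size 31 and offset 10 are illustrative; tune them on real captures.
    return cv2.adaptiveThreshold(
        blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 10
    )


def ocr_image(image_bgr) -> str:
    """Run local Tesseract on the preprocessed image via pytesseract."""
    import pytesseract

    return pytesseract.image_to_string(preprocess_for_ocr(image_bgr))
```

Adaptive thresholding computes a threshold per neighborhood rather than one for the whole image, which is why it tends to cope better with uneven lighting than a single global cutoff.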

Speech

pyttsx3 offline TTS
macOS “say” fallback
SpeechRecognition for ASR
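
The pyttsx3-first, macOS `say` fallback order can be sketched as follows. The helper names and the exact fallback conditions are assumptions for illustration; speaker.py may decide differently.

```python
import platform
import shutil
import subprocess


def choose_backend(pyttsx3_ok: bool, system: str, has_say: bool) -> str:
    """Pick a TTS backend: pyttsx3 if usable, else macOS `say`, else none."""
    if pyttsx3_ok:
        return "pyttsx3"
    if system == "Darwin" and has_say:
        return "say"
    return "none"


def speak(text: str) -> str:
    """Speak text offline and return the name of the backend that was used."""
    try:
        import pyttsx3

        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
        return "pyttsx3"
    except Exception:
        # pyttsx3 is missing or its driver failed to start; try the fallback.
        pass
    backend = choose_backend(False, platform.system(), shutil.which("say") is not None)
    if backend == "say":
        subprocess.run(["say", text], check=False)
    return backend
```

Keeping the decision in `choose_backend` separates the policy from the audio calls, so the fallback order can be checked without producing any sound.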

Optimized for ARM

All code runs on CPU
No cloud or GPU required
Lightweight models and minimal processing between pipeline steps
Tested on ARM hardware
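
When testing across devices, it can help to log which architecture the code is actually running on. A small sketch using only the standard library; the helper names are hypothetical, and the prefix check is an assumption based on common `platform.machine()` values such as `arm64`, `aarch64`, and `armv7l`.

```python
import os
import platform

ARM_PREFIXES = ("arm", "aarch64")


def is_arm(machine: str) -> bool:
    """Return True if a platform.machine() string looks like an ARM CPU."""
    return machine.lower().startswith(ARM_PREFIXES)


def describe_host() -> str:
    """Summarize the current CPU architecture and core count for a log line."""
    machine = platform.machine()
    cores = os.cpu_count() or 1
    kind = "ARM" if is_arm(machine) else "non-ARM"
    return f"{machine} ({kind}), {cores} CPU cores"


if __name__ == "__main__":
    print(describe_host())
```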

🧩 Built With

Python
OpenCV
Tesseract OCR
YOLOv8
pyttsx3
SpeechRecognition
NumPy
Pillow

🚧 Challenges I Faced

Running YOLOv8 efficiently on CPU
Making OCR accurate on low-light / skewed images
Integrating TTS + STT + CV without lag
Ensuring everything stays 100% offline
Designing three modules that work independently and together

📚 What I Learned

How to optimize AI for edge devices
How preprocessing changes OCR performance
How to design multimodal assistants
ARM-specific performance considerations
The challenges of real-time audio + vision

🔗 Try It Out

GitHub Repository:
https://github.com/DiyaMenon/AbleAI

Updates

Diya Satish Kumar started this project — Dec 04, 2025 01:30 AM EST
