Splash Screen
SNN-based SLM detecting the activity gesture
Showing power rails using ODPM
Suggesting apps with real-time prediction based on 3 days of data
Backend AwillVoiceAssistant testing

Inspiration & Vision

We started from a simple, slightly annoying question:

Why does a phone that already has powerful Arm CPUs still need the cloud to feel “smart”?

In 2025, the AI spotlight is dominated by:

Nvidia GPUs and CUDA
Proprietary NPUs from Apple and Qualcomm
Fast-rising RISC-V accelerators

Meanwhile, Arm powers billions of devices, yet most of that silicon is treated as a host for someone else’s AI workloads rather than a primary AI engine.

We saw a clear gap:

Mobile AI is fragmented across vendor-specific NPUs.
Cloud assistants are slow, dependent on connectivity, and not truly private.
Arm risks being overshadowed unless it can host brain-like, always-on intelligence directly on its CPUs.

So we imagined something different:

“A biological nervous system inside every Arm device.”
A phone that learns like a brain, stays private like a diary, and reacts in real time like a reflex.

This vision became AwillOS / Cortex-N — an agentic, neuromorphic, contextual AI layer living on top of Android, running entirely on Arm Cortex-A CPUs.

No cloud.
No proprietary NPU lock-in.
Just smart software transforming commodity Arm CPUs into a neuromorphic AI fabric.

About the Project

We set out to prove a phone could run a full “cognitive loop” entirely offline:

wake word → ASR → local LLM → TTS → sensor-aware actions

—without any cloud help.
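The loop above can be sketched as a chain of stages, each consuming the previous stage's output. This is a minimal illustration only: the stage functions are hypothetical stand-ins for the real models, and the names `run_cognitive_loop`, `STAGES`, and the dictionary keys are assumptions made for this example, not the app's actual API.

```python
from typing import Any, Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[Any], Optional[Any]]]


def run_cognitive_loop(stages: List[Stage], payload: Any) -> Tuple[Optional[Any], List[str]]:
    """Run stages in order; stop early if a stage returns None."""
    trace: List[str] = []
    for name, fn in stages:
        payload = fn(payload)
        trace.append(name)
        if payload is None:  # e.g. no wake word detected, so skip the rest
            break
    return payload, trace


# Hypothetical stand-ins for the real models (wake word, ASR, LLM, TTS).
def wake_word(audio: dict) -> Optional[dict]:
    return audio if audio.get("has_wake_word") else None


def asr(audio: dict) -> str:
    return audio["spoken_text"]


def local_llm(text: str) -> dict:
    action = "flashlight_on" if "flashlight" in text else "none"
    return {"reply": f"OK: {text}", "action": action}


def tts(result: dict) -> dict:
    return {**result, "speech": f"<audio for '{result['reply']}'>"}


def act(result: dict) -> dict:
    return {**result, "executed": result["action"] != "none"}


STAGES: List[Stage] = [
    ("wake_word", wake_word),
    ("asr", asr),
    ("llm", local_llm),
    ("tts", tts),
    ("action", act),
]

result, trace = run_cognitive_loop(
    STAGES, {"has_wake_word": True, "spoken_text": "turn on the flashlight"}
)
print(trace)
print(result["action"], result["executed"])
```

Returning `None` from the wake-word stage short-circuits the loop, so the heavier stages never run unless the wake word fires.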

Inspiration came from:

privacy-first assistants,
wearables that must survive bad connectivity, and
a desire to blend classical sensing (IMU/vision) with modern generative models in one stack.

What We Built

We built an Android app (Jetpack Compose + classic views) that orchestrates multiple on-device AI agents:

Wake word: OpenWakeWord (ONNX)
Speech-to-text (ASR): Whisper-small INT8 (ONNX)
Local LLM: Llama 3.2 via ExecuTorch .pte
TTS: Piper (ONNX)
Embeddings: MiniLM INT8 (ONNX)
Gesture/activity SNN: custom C++/ONNX spiking model
AudioGen: TensorFlow Lite for creative sound generation

Feature modules include:

Vision agent (YOLO via ONNX)
ASR agent
Predictor/context agent (Room + DataStore + WorkManager + Play Services Location)
Telemetry overlay
aura-runtime for centralized task routing

Native layers are implemented via NDK/CMake (C++17, NEON/SME2 paths) for:

SNN kernels
AudioGen JNI bridge

We use:

A unified ONNX Runtime 1.17.1 across modules
ExecuTorch AAR for on-device LLM inference

How We Built It

Android stack:
- Gradle Kotlin DSL + AGP 8.13.1
- Kotlin 1.9.22
- Jetpack Compose BOM for UI
- Room / WorkManager / DataStore for state
- Timber + Perfetto for telemetry
Model packaging & export:
- All models bundled under assets/ to avoid network fetches
- Export scripts in Python using:
  - PyTorch + Transformers
  - ONNX Runtime quantization for Whisper
  - ExecuTorch exporter for Llama
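As a rough idea of what the Whisper quantization step might look like, here is a minimal sketch using ONNX Runtime's `quantize_dynamic`. It assumes dynamic (weight-only) INT8 quantization; the write-up does not say whether dynamic or static quantization was used, and the file names and helper names below are hypothetical.

```python
from pathlib import Path


def int8_output_path(fp32_path: str) -> Path:
    """Derive the output name, e.g. encoder.onnx -> encoder.int8.onnx."""
    p = Path(fp32_path)
    return p.with_name(p.stem + ".int8" + p.suffix)


def quantize_to_int8(fp32_path: str) -> Path:
    """Quantize an FP32 ONNX model's weights to INT8 and return the new path."""
    # Imported lazily so the path helper above works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    out_path = int8_output_path(fp32_path)
    quantize_dynamic(
        model_input=str(fp32_path),
        model_output=str(out_path),
        weight_type=QuantType.QInt8,  # signed 8-bit weights
    )
    return out_path


# Hypothetical usage (requires an exported FP32 model on disk):
# quantize_to_int8("assets/whisper_small_encoder.onnx")
```

The quantized file can then be dropped into `assets/` alongside the other bundled models.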
Performance budgeting:

We budgeted latency per stage to keep real-time loops responsive. For camera/gesture loops, we aimed for:

\[ \sum_i t_i \leq 33\,\text{ms} \]

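where t_i is the latency of stage i in the loop, to maintain roughly 30 FPS (1000 ms / 30 ≈ 33 ms per frame).
This drove us toward INT8 quantization and the Armv8.2-A dot-product (dotprod) instructions for YOLO and Whisper.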
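The budget check itself is simple arithmetic. The sketch below sums per-stage latencies and compares them against the 33 ms frame budget; the stage names and timings are hypothetical placeholders, not measurements from the app.

```python
FRAME_BUDGET_MS = 33.0  # per-frame budget for roughly 30 FPS


def check_budget(stage_ms: dict, budget_ms: float = FRAME_BUDGET_MS):
    """Return (total latency, remaining headroom, whether the budget is met)."""
    total = sum(stage_ms.values())
    return total, budget_ms - total, total <= budget_ms


def slowest_stage(stage_ms: dict) -> str:
    """Name of the stage that consumes the most time."""
    return max(stage_ms, key=stage_ms.get)


# Hypothetical per-stage timings in milliseconds (not measured values).
example = {"capture": 4.0, "preprocess": 3.0, "inference": 18.0, "postprocess": 5.0}

total, headroom, ok = check_budget(example)
print(f"total={total} ms, headroom={headroom} ms, within budget={ok}")
print("optimize first:", slowest_stage(example))
```

In this made-up breakdown, inference dominates the frame time, which is the kind of result that points toward quantizing the model before tuning anything else.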

Native & build config:
- CMake projects for SNN and AudioGen
- Tuned flags: -march=armv8.2-a+fp16+dotprod, optional SME2
- JNI glue for SNN and AudioGen
- TensorFlow Lite JNI integrated into native builds for AudioGen

What We Learned

A single, unified runtime version (ONNX Runtime 1.17.1) dramatically reduces JNI/provider conflicts; dependency drift was a hidden cost.
ExecuTorch is viable for mid-size LLMs on mobile when:
- weights are pre-quantized, and
- context windows are small.
Memory layout mattered more than raw FLOPs.
Quantization and operator availability drive design:
- Some transforms required patching the exported graphs.
- We had to stay within supported opsets (≤ 17).
Sensor fusion for context (IMU + location + foreground app) benefits from small SNNs:
- Tiny spiking models can add meaningful intent signals without heavy compute.
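To give a sense of why spiking models are cheap, here is a minimal leaky integrate-and-fire (LIF) neuron in plain Python. This is a generic textbook sketch, not the project's C++/ONNX kernel: the neuron model, the `leak` and `threshold` values, and the idea of feeding it a normalized motion signal are all assumptions made for illustration.

```python
def lif_spikes(inputs, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a sequence of inputs.

    Each step decays the membrane potential by `leak`, adds the input, and
    emits a spike (1) with a reset to zero when the threshold is reached.
    """
    potential = 0.0
    spikes = []
    for x in inputs:
        potential = leak * potential + x
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes


# Hypothetical normalized motion energy from an IMU window.
burst = [0.5, 0.5, 0.5, 0.0, 0.0]  # short, strong movement
noise = [0.05] * 50                # low-level jitter

print(lif_spikes(burst))
print(sum(lif_spikes(noise)))
```

Each step costs one multiply, one add, and one compare per neuron. A sustained burst pushes the potential over the threshold, while low-level jitter leaks away without ever producing a spike.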

Challenges We Faced

Build bloat & disk pressure:
- 3.1 GB APK with 2.7 GB of models.
- Required disabling redundant copy tasks and aggressively stripping native libs.
API-level landmines:
- APIs like thermal status (API 29+) and Build.SOC_MODEL (API 31+) are missing on older Android versions, and their values vary across devices.
- We needed guarded code paths to keep minSdk 26 devices working reliably.
JNI/provider conflicts:
- Multiple modules initially pulled different ORT/TFLite versions.
- Solved by centralizing versions in gradle.properties and sharing runtime sessions via DI.
Native build flakiness:
- NEON/SME2 flags and mixed ABIs caused subtle issues.
- Resolved with consistent NDK r27b and CMake 3.22.1 configs.
AudioGen integration:
- Full TFLite path clashed with symbol availability.
- We shipped a simplified synthesis path while keeping models bundled for future enablement.

Why It Matters

Demonstrates a privacy-preserving assistant that runs the full speech/vision/context loop offline, ideal for edge devices and connectivity-challenged environments.
Provides a template for mixing heterogeneous ML runtimes:
- ExecuTorch
- ONNX Runtime
- TensorFlow Lite

These run under one Android app with modular agents and native acceleration, making it a practical recipe for next-generation on-device AI on Arm.