Inspiration

The idea for QuantumEdge was born from a simple question: What if we could make AI models on tiny devices like the Raspberry Pi not just smaller, but fundamentally more efficient by borrowing ideas from quantum physics?

I was inspired by recent research in quantum machine learning and tensor network theory, which showed that complex systems, like quantum states or neural networks, can be represented with far fewer parameters than their raw dimensionality suggests. I wondered: Could we apply these principles to edge AI, where every byte of memory and millisecond of latency matters?

The Arm AI Developer Challenge provided the perfect catalyst. Its focus on on-device AI using Arm architecture aligned closely with my goal: to build something that wasn’t just novel, but practically useful for real-world edge applications in farms, classrooms, and factories, where cloud connectivity is unreliable or nonexistent.


What I Learned

This project pushed me far beyond standard deep learning. I learned:

  • How to bridge quantum physics and classical AI: Translating Matrix Product States (MPS) into practical, compressible layers for CNNs required understanding both quantum entanglement and tensor decomposition.

  • Arm-specific optimization: I dove deep into NEON SIMD instructions and ONNX Runtime for Arm64, learning how to profile and optimize inference speed on Cortex-A72/A76 cores.

  • The art of compression without collapse: Balancing model size, speed, and accuracy is a delicate trade-off. I learned that aggressive pruning kills accuracy, while conservative compression misses the point. Finding the sweet spot required iterative benchmarking and creative layer design.

  • Open-source rigor: Writing code that others can build upon taught me the importance of clear documentation, reproducible environments, and permissive licensing, which are all critical for community impact.


How I Built It

Step 1: Concept & Design

I started by defining the core innovation: replacing dense layers in MobileNet with Matrix Product State (MPS) representations. The math behind this is elegant:

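Let $W \in \mathbb{R}^{D_{in} \times D_{out}}$ be a weight matrix, and assume for simplicity that $D_{in} = D_{out} = D = 2^n$. We reshape $W$ into an order-$2n$ tensor $T \in \mathbb{R}^{2 \times 2 \times \cdots \times 2}$, with one binary index per bit of the row and column indices, and decompose it via iterative SVD into a chain of 3-tensors:

$$ T_{i_1 i_2 \dots i_{2n}} = \sum_{\alpha_1, \dots, \alpha_{2n-1}} A^{[1]}_{i_1 \alpha_1} A^{[2]}_{\alpha_1 i_2 \alpha_2} \cdots A^{[2n]}_{\alpha_{2n-1} i_{2n}} $$

Each $A^{[k]}$ is a small core tensor with at most $2\chi^2$ entries, where the bond dimension $\chi$ caps the range of every $\alpha_k$ (each SVD keeps only the $\chi$ largest singular values). The parameter count therefore drops from $D^2 = 4^n$ to at most $4n\chi^2$, i.e. from $O(D^2)$ to $O(\chi^2 \log D)$.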
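To make the decomposition concrete, here is a minimal NumPy sketch of the iterative SVD (often called TT-SVD) for the square 2^n x 2^n case described above. The function names are hypothetical and not taken from src/quantum_mps.py; each core is stored as an array of shape (r_left, 2, r_right), with size-1 outer bonds, and the bits are taken in row-major order (row bits first, most significant first).

```python
import numpy as np


def mps_decompose(W, chi):
    """Split a square 2**n x 2**n matrix into 2n MPS cores by sequential truncated SVD.

    Each core has shape (r_left, 2, r_right); every bond is capped at chi.
    """
    D = W.shape[0]
    n = D.bit_length() - 1
    assert W.shape == (D, D) and (1 << n) == D, "sketch assumes a square 2**n x 2**n matrix"
    cores = []
    rest = W.reshape(1, -1)          # (r_0 = 1, 2**(2n)) in row-major bit order
    r_left = 1
    for _ in range(2 * n - 1):
        # Rows = (left bond, next bit); columns = all remaining bits.
        mat = rest.reshape(r_left * 2, -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(chi, S.size)          # keep only the chi largest singular values
        cores.append(U[:, :r].reshape(r_left, 2, r))
        rest = S[:r, None] * Vt[:r]   # push the singular values to the right
        r_left = r
    cores.append(rest.reshape(r_left, 2, 1))
    return cores


def mps_to_dense(cores):
    """Contract the chain back into a dense matrix (for validation only)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))  # sum over the shared bond
    D = 2 ** (len(cores) // 2)
    return out.reshape(D, D)


def num_params(cores):
    return sum(core.size for core in cores)


rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))       # n = 4, so 8 cores
exact = mps_decompose(W, chi=16)        # chi large enough that nothing is truncated
small = mps_decompose(W, chi=4)
print(np.allclose(mps_to_dense(exact), W))     # True
print(num_params(exact), num_params(small))    # 680 168
rel_err = np.linalg.norm(mps_to_dense(small) - W) / np.linalg.norm(W)
print(f"relative error at chi=4: {rel_err:.3f}")
```

Note that on a random matrix the untruncated chain (680 entries) is larger than the 256-entry dense matrix; the savings only appear when truncating to a small χ discards little, i.e. when the trained weights' singular values decay quickly at each cut.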

Step 2: Implementation

  • Wrote src/quantum_mps.py to perform the decomposition and inference.
  • Used PyTorch to train a baseline MobileNet, then exported it to ONNX.
  • Integrated MPS compression as a post-processing step, converting dense layers into tensor chains.
  • Optimized the contraction using NumPy’s einsum, which replaces Python loops with vectorized native kernels that can take advantage of Arm Neon.
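The steps above mention contracting the tensor chain with einsum at inference time. A minimal sketch of such a forward pass, assuming the core layout from Step 1 (2n cores shaped (r_left, 2, r_right); the first n carry the input bits of W, the last n the output bits). The name mps_matvec is hypothetical, and optimize=True simply lets NumPy pick a contraction strategy, which may route the work through BLAS.

```python
import numpy as np


def mps_matvec(cores, x):
    """Compute y = x @ W for a batch x of shape (B, 2**n) without forming W.

    Assumed layout (hypothetical): 2n cores shaped (r_left, 2, r_right); the
    first n carry the row (input) bits of W, the last n its column (output)
    bits, most significant bit first, as in a row-major reshape of W.
    """
    n = len(cores) // 2
    batch = x.shape[0]
    # Left half: absorb one input bit per core; the state shrinks to (B, r_n, 1).
    state = x.reshape(batch, 1, -1)
    for core in cores[:n]:
        state = state.reshape(batch, core.shape[0], 2, -1)
        state = np.einsum("baim,aic->bcm", state, core, optimize=True)
    # Right half: expand one output bit per core from the bond vector.
    out = state.reshape(batch, 1, -1)
    for core in cores[n:]:
        out = np.einsum("bpa,aic->bpic", out, core, optimize=True)
        out = out.reshape(batch, -1, core.shape[2])
    return out.reshape(batch, -1)


rng = np.random.default_rng(1)
bonds = [1, 2, 4, 4, 4, 2, 1]                      # n = 3, so 2n = 6 cores
cores = [rng.standard_normal((bonds[k], 2, bonds[k + 1])) for k in range(6)]
x = rng.standard_normal((5, 8))
y = mps_matvec(cores, x)
print(y.shape)  # (5, 8)
```

Contracting the input side first collapses each input into a χ-sized bond vector before any output bits are expanded, so the dense W is never materialized in memory.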

Step 3: Deployment on Raspberry Pi

  • Created setup_arm.sh to install Arm-optimized ONNX Runtime and dependencies.
  • Tested on Raspberry Pi 5 (8GB RAM) running Raspberry Pi OS 64-bit.
  • Added two demo modes: camera (real-time object detection) and sensor (anomaly detection).

Step 4: Benchmarking

Compared against standard MobileNet-v1:

| Metric         | Standard Model | QuantumEdge-MPS |
|----------------|----------------|-----------------|
| Size           | 4.3 MB         | 1.8 MB          |
| Latency (Pi 5) | 210 ms         | 92 ms           |
| Accuracy       | 70.2%          | 68.9%           |

Result: 58% smaller, 2.3x faster, and only a 1.3-percentage-point accuracy drop.
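The headline figures follow directly from the benchmark table. The snippet below recomputes them and adds a generic median-latency helper; the helper's name, warm-up count, and run count are assumptions for illustration, not the protocol behind the 210 ms and 92 ms measurements.

```python
import statistics
import time


def headline_metrics(size_base, size_mps, lat_base, lat_mps, acc_base, acc_mps):
    """Return (fractional size reduction, speedup factor, accuracy drop in points)."""
    return 1 - size_mps / size_base, lat_base / lat_mps, acc_base - acc_mps


def median_latency_ms(fn, *args, warmup=5, runs=50):
    """Median wall-clock latency of fn(*args) in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn(*args)                       # let caches and lazy initialization settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)   # the median is robust to scheduler hiccups


size_cut, speedup, acc_drop = headline_metrics(4.3, 1.8, 210, 92, 70.2, 68.9)
print(f"{size_cut:.0%} smaller, {speedup:.1f}x faster, {acc_drop:.1f}-point accuracy drop")
# 58% smaller, 2.3x faster, 1.3-point accuracy drop
```

median_latency_ms accepts any callable, so the same helper can wrap a single inference call of either model for a like-for-like comparison.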


Challenges Faced

  1. Memory Constraints on Pi: Initial implementations crashed due to OOM errors. Solved by:

    • Using numpy.memmap for large tensors.
    • Reducing bond dimension $\chi$ from 32 to 16.
    • Adding garbage collection hooks.
  2. Arm Optimization: Getting NEON acceleration working required:

    • Compiling ONNX Runtime from source with -mcpu=cortex-a72.
    • Avoiding Python loops in critical paths; vectorizing with einsum.
  3. Quantum Math Is NOT Easy to Code: Translating theoretical MPS into runnable code was tricky. Debugged by:

    • Comparing intermediate tensor shapes against the pseudocode in the papers.
    • Validating with small test cases (e.g., 2x2 matrices).
  4. Video Demo Under 3 Minutes: Had to edit ruthlessly to fit the time limit. Focused on:

    • Live Pi camera feed + latency stats.
    • Side-by-side benchmark table.
    • Clear voiceover explaining the “why” and “how”.
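Two of the fixes above, memory-mapping tensors and validating on tiny cases, combine naturally. A minimal sketch, assuming each MPS core is stored as its own .npy file (a hypothetical layout, not necessarily the one used in src/quantum_mps.py): np.load with mmap_mode="r" returns a read-only numpy.memmap, so core data is paged in from disk on access rather than loaded into RAM up front. The validation case is a 2x2 matrix, which is a two-site chain produced by a single SVD. Storing cores as float32 halves their footprint relative to NumPy's default float64, and since an interior core holds 2χ² values, lowering χ from 32 to 16 shrinks each one fourfold (2,048 to 512 values).

```python
import os
import tempfile

import numpy as np


def save_cores(cores, directory):
    """Write each core to its own .npy file (hypothetical on-disk layout)."""
    for k, core in enumerate(cores):
        np.save(os.path.join(directory, f"core_{k:03d}.npy"), core)


def load_cores_mmap(directory, count):
    """Open every core read-only as a numpy.memmap; data is paged in on access."""
    return [np.load(os.path.join(directory, f"core_{k:03d}.npy"), mmap_mode="r")
            for k in range(count)]


def contract(cores):
    """Contract a chain of (r_left, 2, r_right) cores into a dense square matrix."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    d = 2 ** (len(cores) // 2)
    return out.reshape(d, d)


# Tiny validation case: a 2x2 matrix is a two-site chain given by one SVD.
W = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)   # float32 halves memory vs float64
U, S, Vt = np.linalg.svd(W)
cores = [U.reshape(1, 2, 2), (S[:, None] * Vt).reshape(2, 2, 1)]

with tempfile.TemporaryDirectory() as tmp:
    save_cores(cores, tmp)
    mapped = load_cores_mmap(tmp, len(cores))
    is_memmap = all(isinstance(core, np.memmap) for core in mapped)
    reconstructed_ok = np.allclose(contract(mapped), W, atol=1e-5)
    del mapped                          # release the mappings before the files are removed

print(is_memmap, reconstructed_ok)     # True True
```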

This project taught me that the most powerful innovations often lie at the intersection of seemingly unrelated fields: in this case, quantum physics and edge computing. By applying quantum-inspired methods to Arm-based devices, I’ve created a framework that’s not just a “hackathon” entry, but a foundation for future lightweight, on-device AI.


Built With

  • arm-compute-library(neon-simd)
  • arm-cortex-a72/a76
  • c++
  • cmake
  • git
  • linux-gpio
  • onnx-runtime-api
  • opencv
  • python-3.11
  • raspberry-pi-os-64-bits