Inspiration

Many people navigate the world with vision as their primary cue. That is not always possible, so we wanted something tangible. We explored whether camera perception, short-range sensing, and haptic feedback could form a lightweight spatial sense on the body: something that helps with obstacle awareness without demanding constant visual attention. FusionBelt is our prototype, a belt that turns fused sensor data into directional vibration so the wearer can feel where something is, not only that something is nearby. We also looked at the problem as a real product. The same hands-free spatial guidance that helps blind and low-vision users can help industrial workers in noisy, high-risk environments, and it can help astronauts, who already carry heavy equipment, have restricted mobility, and cannot easily use their hands for navigation.

What it does

FusionBelt runs on a Rubik Pi 3 (QCS6490-class) and uses a YOLO model from Qualcomm AI Hub, exported to TFLite and run through LiteRT with the QNN HTP delegate, so object detection runs on the NPU. In our testing, NPU inference was about 3x faster than CPU for the same detection workload, which increased frame rate and reduced end-to-end latency. That headroom frees CPU time for sensor fusion, serial I/O to the M5 belt, web streaming, and voice.

The M5StickC Plus is a full edge client, not a passive sensor relay. Its onboard mic and physical button give the wearer a hardware push-to-talk channel that fires at ESP32 speed: deterministic capture with no wake word and no missed trigger. The MPU6886 IMU streams 6-axis motion data at 20 Hz, and fall detection is tuned against simultaneous ultrasonic readings so the system distinguishes a stumble near an obstacle from normal fast movement. Four Modulino vibration motors are daisy-chained over I2C, so directional haptic feedback across the full belt fires from a single serial command with no per-motor wiring complexity.

For interaction, the Pi transcribes speech with Whisper, decides an action, and speaks short replies with Piper. In parallel, the belt provides directional haptics for obstacle avoidance based on fused camera and ultrasonic signals. The result is a complete system: edge vision, edge voice, and wearable haptics with no tethered GPU server anywhere in the loop.
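As a sketch of the fusion step, a detection's horizontal position can pick one of the four belt motors while the ultrasonic distance scales intensity. The motor layout, thresholds, and function name here are illustrative assumptions, not the exact firmware mapping:

```python
# Illustrative sketch of directional haptics: map a detection's horizontal
# center to one of four belt motors and scale intensity by distance.
# Motor layout, max range, and names are assumptions for clarity.
def haptic_command(bbox_center_x: float, frame_width: int,
                   distance_cm: float, max_cm: float = 200.0):
    """Return (motor_index, intensity 0..255) for a fused detection."""
    # Four motors across the belt: 0=left ... 3=right
    motor = min(3, int(4 * bbox_center_x / frame_width))
    # Closer obstacle -> stronger vibration, clamped to [0, 1]
    closeness = max(0.0, min(1.0, 1.0 - distance_cm / max_cm))
    return motor, int(255 * closeness)
```

A single tuple like this is what would be serialized into the one-line serial command the belt consumes.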

How we built it

Vision on NPU: We export a QAI Hub YOLO model to TFLite, run it through LiteRT, and load libQnnTFLiteDelegate.so so inference is HTP-accelerated on the Rubik Pi NPU. We measured roughly 3x faster inference on NPU than CPU, which improved responsiveness and left budget for the rest of the system.

Sensing and the ESP32 edge client: The M5StickC Plus is the nervous system of the belt. Dual ultrasonic sensors cover front and rear. The MPU6886 IMU streams 6-axis data at 20 Hz, and fall detection is not just a threshold on raw acceleration: it is tuned against simultaneous ultrasonic readings so the system distinguishes a stumble near an obstacle from normal fast movement. Four Modulino vibration motors are daisy-chained over I2C, so the entire haptic layer across the belt is driven from a single serial command with no per-motor wiring complexity. The onboard mic and physical button give the wearer a push-to-talk channel with hardware-deterministic capture: no wake word and no missed trigger.

Voice and agent loop: The Pi runs Whisper for speech-to-text and Piper for text-to-speech so the belt can be used completely hands-free. This matters for accessibility, and equally for industrial and aerospace scenarios where hands are occupied and voice is the only clean input channel.

Reasoning: A small Gemma model served through Ollama reads structured scene text and returns JSON with dir, urgency, and speak fields. Deterministic fallbacks keep the system safe when the model is slow or unsure.

Actuation and demo: FastAPI serves MJPEG and WebSockets so judges can watch NPU-backed detections, distance readings, voice turns, and belt state together in real time.
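The delegate-loading step with a CPU fallback can be sketched as below. The helper name and the factored-out loader are our assumptions for testability; `tf.lite.experimental.load_delegate` and `tf.lite.Interpreter` are the real TensorFlow Lite entry points:

```python
# Sketch of HTP-accelerated inference setup with a safe CPU fallback.
# build_delegates is a hypothetical helper; the delegate .so path comes
# from the Qualcomm QNN SDK installed on the Rubik Pi.
def build_delegates(load_delegate, path="libQnnTFLiteDelegate.so"):
    """Try to load the QNN HTP delegate; return [] so the caller
    falls back to CPU if the library is missing or fails to load."""
    try:
        return [load_delegate(path)]
    except (OSError, ValueError):
        return []

# Usage with LiteRT / TensorFlow Lite (requires the delegate .so on device):
# import tensorflow as tf
# delegates = build_delegates(tf.lite.experimental.load_delegate)
# interpreter = tf.lite.Interpreter(model_path="yolo.tflite",
#                                   experimental_delegates=delegates)
# interpreter.allocate_tensors()
```

Keeping the fallback explicit is what let us debug on CPU while holding the demo to stable NPU performance.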

Challenges we ran into

NPU integration vs. "it runs": Getting YOLO onto the HTP required correct delegate paths, export settings, and validation that outputs stayed consistent after quantization. CPU fallback is fine for debugging, but the demo needed stable NPU performance.

Custom models on NPU: We attempted to run custom models through Edge Impulse using the CLI on Linux, targeting the NPU. That path was not successful during the weekend, which taught us that deployment toolchains and model-format constraints matter as much as model accuracy.

Audio, IMU, and safety sharing the same wire: Sharing a high-baud serial link between distance telemetry, 20 Hz IMU frames, motor commands, and binary mic chunks required a disciplined parser on the Pi and predictable timing on the M5, so obstacle warnings never get blocked by a voice turn and fall detection never gets dropped behind a haptic command.
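A disciplined parser for a shared wire usually means length-prefixed frames, so binary mic chunks can never be confused with telemetry. This is a minimal sketch under assumed framing (the 0xAA sync byte and type codes are ours, not the actual protocol):

```python
# Minimal length-prefixed frame parser for a shared serial link.
# Frame layout (assumed): [0xAA sync][1-byte type][1-byte length][payload].
SYNC = 0xAA

def parse_frames(buf: bytearray):
    """Consume complete frames from buf, returning (type, payload) pairs.
    Partial frames stay in buf until more bytes arrive; stray bytes are
    dropped one at a time until the next sync marker (resync)."""
    frames = []
    while len(buf) >= 3:
        if buf[0] != SYNC:           # resync after line noise
            buf.pop(0)
            continue
        length = buf[2]
        if len(buf) < 3 + length:    # incomplete frame: wait for more bytes
            break
        frames.append((buf[1], bytes(buf[3:3 + length])))
        del buf[:3 + length]
    return frames
```

Because every frame declares its own length, a large mic chunk delays but never corrupts the IMU or distance frames queued behind it.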

Accomplishments we're proud of

Real edge acceleration end to end: YOLO runs QAI Hub → TFLite → QNN delegate → NPU, about 3x faster than CPU in our tests.

The ESP32 as a full edge client: The M5 is not a passive sensor relay. It does hardware push-to-talk, ultrasonic-tuned fall detection, daisy-chained I2C haptics, and real-time IMU streaming, all deterministically, all at the edge, all over one serial wire.

One wearable, one board, complete system: NPU vision, local LLM navigation, Whisper and Piper voice, tuned fall detection, daisy-chained haptics, and a live web surface, all running on one embedded platform with no cloud, no GPU server, and no shortcuts.

Clear target markets: a solution that maps to real constraints in accessibility, industrial safety, and aerospace operations.

What we learned

Split work by silicon and by latency budget: fixed-shape vision belongs on the NPU, orchestration and safety logic need predictable scheduling on the CPU, and the serial wire to the M5 should be treated as a hard real-time boundary.
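One way to sketch that boundary (queue names, the stand-in inference call, and the deadline are illustrative) is to let vision publish into a queue from its own thread, while the safety loop polls with a deadline so it is never blocked waiting on the NPU:

```python
import queue
import threading

# Sketch of the latency-budget split: vision runs in its own thread and
# publishes detections to a queue; the safety loop keeps its own cadence.
def vision_worker(frames, run_inference, out_q):
    for frame in frames:
        out_q.put(run_inference(frame))   # stand-in for the NPU call

def safety_tick(in_q, on_detection, deadline_s=0.05):
    """One safety-loop iteration: act on a detection if one is ready
    within the deadline, otherwise return to sensor and serial duties."""
    try:
        on_detection(in_q.get(timeout=deadline_s))
        return True
    except queue.Empty:
        return False
```

The serial wire to the M5 then lives entirely inside the safety loop's timing budget, independent of how long any single inference takes.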

What's next for FusionBelt

Product and customers: Validate the haptic language and ergonomics with blind and low-vision users and with industrial workers. Refine vibration patterns, comfort, and training time.

Aerospace potential: Edge acceleration reduces latency and dependence on connectivity, which matters in space and in remote environments. FusionBelt could support object detection and hazard awareness inside spacecraft, during EVA, and on unknown terrain where visibility and dexterity are limited and hands-free guidance is valuable.

More NPU residency: Continue pushing inference workloads onto accelerators where it makes sense, and revisit the custom-model deployment path to run specialized detectors beyond generic classes.

Metrics and benchmarking: Publish repeatable benchmarks for FPS, end-to-end latency, and power for CPU versus NPU, and measure how each subsystem affects the user experience.
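A repeatable latency harness could start as simply as the sketch below (warmup and run counts are arbitrary assumptions; power measurement would still need an external meter):

```python
import statistics
import time

def bench(fn, warmup=3, runs=20):
    """Median and p95 wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):           # discard cold-start iterations
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {"median_ms": statistics.median(samples),
            "p95_ms": samples[int(0.95 * (runs - 1))]}
```

Running this once against a CPU-only interpreter and once against the delegate-backed interpreter, with the same input tensor, gives directly comparable CPU-versus-NPU numbers.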

Built With

  • bash
  • cpp
  • esp32
  • fast-api
  • gemma
  • ollama
  • python
  • rubik-pi-3
  • tensorflow
  • yolo