Project Description

The first-ever sound application to run an audio network on Hailo-8. The application captures sound from a microphone or from a pre-recorded sound file and infers which word was spoken.

Usefulness

Spoken-word recognition already has real-world uses in Siri, Alexa, and Cortana. Because Hailo-8 can infer locally and run high-FPS streams, this capability can serve many uses and industries. Moreover, with multi-stream support, Hailo-8 can enhance our existing use cases, such as smart cameras (with sound) and automotive (with sound).

User Adoption

Users can run inference locally on the Hailo-8 chip, which can help solve the privacy issues of existing smart speakers. Once Hailo is widely available, this feature can be used to control the chip, for home automation, and more.

Industry and Market

  • Siri/Alexa-style assistants with lower power consumption and more privacy, since inference runs locally.
  • Automotive: sound and video on a single chip, with voice commands to control car systems.
  • Smart cameras that wake only on movement or on sound above a certain threshold.
  • Home security and automation.
  • Military/surveillance.
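The smart-camera case, waking only on sound over a certain threshold, can be sketched as a simple energy gate. This is a hypothetical illustration rather than part of the project: it computes each audio frame's RMS level in dBFS and wakes only when that level exceeds a threshold. The function names and the -40 dBFS default are assumptions, and samples are assumed to be floats in [-1, 1].

```python
import numpy as np


def rms_dbfs(frame: np.ndarray) -> float:
    """RMS level of a float frame in [-1, 1], expressed in dBFS."""
    rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2))
    # Guard against log10(0) on digital silence.
    return float(20.0 * np.log10(max(rms, 1e-10)))


def should_wake(frame: np.ndarray, threshold_dbfs: float = -40.0) -> bool:
    """Hypothetical gate: True when the frame is louder than the threshold."""
    return bool(rms_dbfs(frame) > threshold_dbfs)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr // 10) / sr  # one 100 ms frame
    quiet = 0.001 * np.sin(2 * np.pi * 440 * t)
    loud = 0.5 * np.sin(2 * np.pi * 440 * t)
    print(should_wake(quiet), should_wake(loud))  # False True
```

A gate like this is cheap enough to run continuously, so the heavier keyword network only needs to run on frames that pass it.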

The Hailo Difference

Using an I2S "sleep" mode keeps power consumption minimal. High throughput and a small form factor make Hailo-8 ideal for smart cameras and recorders.

How We Built It

The project was divided into several workflows:

  • Training: we trained the network and built a data-collection process so that the network would be able to detect app names.
  • Audio stream: reading audio from a microphone and streaming it to Hailo-8.
  • Visualization: displaying the speech segment, including the speech waveform, the spectrogram, and the detected keyword.
  • Remote control: controlling the demos from a remote computer (e.g. the kitchen laptop).
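For the audio-stream workflow, a microphone typically delivers small chunks of arbitrary length, while a keyword network expects fixed-length input. The sketch below shows one common way to bridge the two: a buffer that accumulates chunks and emits overlapping fixed-size windows. It is a hypothetical helper, not the project's TAPAS-based code; the class name and the 16 kHz, 1 s window, 0.5 s hop defaults are assumptions.

```python
from __future__ import annotations

import numpy as np


class SlidingWindow:
    """Hypothetical helper: accumulate audio chunks, emit overlapping windows.

    Defaults assume 16 kHz audio with 1 s windows every 0.5 s.
    """

    def __init__(self, window: int = 16000, hop: int = 8000):
        if not 0 < hop <= window:
            raise ValueError("hop must be in (0, window]")
        self.window = window
        self.hop = hop
        self._buf = np.zeros(0, dtype=np.float32)

    def push(self, chunk: np.ndarray) -> list[np.ndarray]:
        """Append a chunk and return every complete window now available."""
        self._buf = np.concatenate([self._buf, np.asarray(chunk, dtype=np.float32)])
        windows = []
        while len(self._buf) >= self.window:
            windows.append(self._buf[: self.window].copy())
            # Drop one hop so consecutive windows overlap by window - hop samples.
            self._buf = self._buf[self.hop :]
        return windows


if __name__ == "__main__":
    sw = SlidingWindow(window=16000, hop=8000)
    rng = np.random.default_rng(0)
    emitted = 0
    for _ in range(10):  # ten 0.25 s chunks = 2.5 s of audio
        for win in sw.push(rng.standard_normal(4000)):
            emitted += 1  # a real app would compute features and run inference on `win`
    print(emitted)  # 4
```

The overlap means a word that straddles one window boundary still falls entirely inside a neighboring window; the cost is running inference twice as often as with non-overlapping windows.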

Challenges We Ran Into

  • Adapting the TAPAS infrastructure, which was designed and built for video-streaming applications, to work with audio streams and non-streaming audio files.
  • New network training: an example network was not suitable for Hailo-8 because it relied on kernels Hailo-8 does not support, such as (20,40) convolutions, which forced us to redesign the network.
  • Preprocessing: a well-known audio feature is the MFCC (mel-frequency cepstral coefficients). The TensorFlow implementation was not good enough because it could not be reproduced in a non-TF environment. We therefore replaced it with a different Python MFCC implementation, retrained the network, and enabled this preprocessing in the Apps environment.
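The write-up does not name the Python MFCC implementation that replaced TensorFlow's, so the following is only a minimal NumPy-only sketch of the standard MFCC pipeline: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, log, and an orthonormal DCT-II. The parameters (16 kHz, 25 ms frames with a 10 ms hop, 512-point FFT, 40 mel bands, 13 coefficients) are assumptions. Because it depends only on NumPy, the same function can run unchanged in both the training and the deployment environments.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=40, n_mfcc=13):
    """Return an (n_frames, n_mfcc) array of MFCCs for a 1-D signal."""
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis boosts high frequencies.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Zero-pad short inputs so at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Log mel energies; the epsilon avoids log(0).
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # Orthonormal DCT-II decorrelates the log mel energies.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels)) * np.sqrt(2.0 / n_mels)
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    feats = mfcc(0.5 * np.sin(2 * np.pi * 440 * t), sr=sr)
    print(feats.shape)  # (98, 13)
```

MFCC implementations often differ in small details such as filterbank edges, the log offset, and DCT normalization. This is why the network has to be retrained on features from whichever implementation will actually run at inference time.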