Inspiration

Automatic Speech Recognition (ASR) plays a growing role in daily life. It is a multidisciplinary domain: it borrows tools from computer vision to process spectrogram images and tools from NLP to generate sequences of characters.

A speech wave file carries far more structure than what we hear when we listen to it.

What it does

The model recognizes speech from wave files. It uses a DeepSpeech network: the sound is first converted to spectrogram images, which are then decoded into sequences of characters.
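The sound-to-image step can be sketched as a short-time Fourier transform on a log scale. This is a minimal NumPy sketch, not the exact DeepSpeech front end; the 20 ms window and 10 ms hop at 16 kHz are illustrative choices of mine:

```python
import numpy as np

def log_spectrogram(signal, sample_rate=16000, win_len=320, hop=160):
    """Short-time Fourier transform magnitudes on a log scale.

    Frame sizes (20 ms window, 10 ms hop at 16 kHz) are illustrative
    values, not the settings of any particular DeepSpeech recipe.
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop: i * hop + win_len] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))   # (frames, freq bins)
    return np.log1p(mags).T                      # (freq bins, frames) "image"

# A 1-second 440 Hz tone stands in for real speech samples.
t = np.arange(16000) / 16000.0
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (161, 99)
```

The resulting 2-D array is what the convolutional layers of the network treat as an image.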

How we built it

I tried several DeepSpeech codebases: some would not compile because of breakages in the Habana framework, while others offered more advanced features. Three attempts, namely:

  1. https://github.com/SeanNaren/deepspeech.pytorch.git : based on Hydra and PyTorch Lightning
  2. https://github.com/mozilla/DeepSpeech.git : uses an older version of TensorFlow and did not compile well on the Habana platform
  3. https://github.com/jiwidi/DeepSpeech-pytorch.git : modified to run on a local machine without torchaudio dependencies, but some errors still occur on the EC2 DL1 machine (inside the Habana Docker container)

My adapted code: https://github.com/mbencherif/DeepSpeechHB.git
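All of these pipelines end the same way: the network emits per-frame character probabilities that a CTC decoder collapses into text. A minimal greedy (best-path) decoder sketch, with a hypothetical alphabet; real DeepSpeech decoders add beam search and a language model:

```python
import numpy as np

# Hypothetical toy alphabet; index 0 is the CTC blank symbol.
ALPHABET = ["<blank>", " ", "a", "b", "c", "h", "t"]

def ctc_greedy_decode(log_probs):
    """Best-path CTC decode: argmax per frame, collapse repeats, drop blanks."""
    best = np.argmax(log_probs, axis=1)
    chars, prev = [], -1
    for idx in best:
        if idx != prev and idx != 0:   # skip repeated labels and blanks
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)

# Frame-wise scores spelling "cat", with repeats and blanks in between.
T, V = 7, len(ALPHABET)
frames = np.full((T, V), -5.0)
for t, idx in enumerate([4, 4, 0, 2, 0, 6, 6]):   # c c _ a _ t t
    frames[t, idx] = 0.0
print(ctc_greedy_decode(frames))  # cat
```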

Challenges we ran into

  • Many code breakages, most of them solved.
  • No documentation available for a straightforward code-porting path.
  • Cost of the machine: $13/hour.
  • Working on two systems: the CUDA platform on my own computer (RTX 3090 + Threadripper) and the EC2 DL1 instance, a totally different world of magic.

Accomplishments that we're proud of

  • Modified torchaudio to run on the EC2 DL1 machine with the current framework, which is not possible with the stock Habana framework.
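Working around torchaudio on the DL1 machine mostly means reimplementing audio loading. A sketch using only Python's standard library and NumPy; the 16-bit mono PCM assumption is mine (torchaudio.load handles many more formats):

```python
import wave
import numpy as np

def load_wav(path):
    """Read a 16-bit PCM mono WAV file into a float array in [-1, 1].

    Assumes 16-bit mono input; this is a stand-in for torchaudio.load.
    """
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2 and wf.getnchannels() == 1
        raw = wf.readframes(wf.getnframes())
        rate = wf.getframerate()
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    return samples, rate

# Round-trip check with a generated 100 ms tone.
tone = (np.sin(2 * np.pi * 440 * np.arange(1600) / 16000) * 32767).astype("<i2")
with wave.open("tone.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(tone.tobytes())
samples, rate = load_wav("tone.wav")
print(rate, samples.shape)  # 16000 (1600,)
```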

What we learned

A ton of things, but above all how to manage stress and time. :)))

What's next for ASR on Gaudi Processors

Built With
