Blood and Concrete: A Love Story.

You mother BEEP er, come on you little BEEPBEEP with me, eh? You BEEP little BEEP, BEEP BEEP BEEP

P.S. Also dedicated to those who like voice messages a lot.

What it does

AudioCensor is a tool that automatically bleeps out obscene words in audio. As a proof of concept, we've put it inside a Telegram bot, @audiocensor_bot, that you can play with.
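
To make the idea of bleep censoring concrete, here is a minimal sketch of the final step only: overwriting flagged time spans with a sine tone. It assumes the detector has already produced the spans in seconds; the function name `bleep`, the 1 kHz tone and the list-of-floats audio representation are illustrative assumptions, not the bot's actual code.

```python
import math

def bleep(samples, sample_rate, spans, tone_hz=1000.0, amplitude=0.3):
    """Return a copy of `samples` with each (start_s, end_s) span replaced by a sine tone.

    `samples` is a list of floats in [-1.0, 1.0]; `spans` are given in seconds.
    All names and defaults here are hypothetical.
    """
    out = list(samples)
    for start_s, end_s in spans:
        # Clamp the span to the clip boundaries.
        start = max(0, int(start_s * sample_rate))
        end = min(len(out), int(end_s * sample_rate))
        for i in range(start, end):
            out[i] = amplitude * math.sin(2.0 * math.pi * tone_hz * i / sample_rate)
    return out

# One second of silence at 8 kHz, with 0.25 s to 0.5 s flagged by the (assumed) detector.
clip = [0.0] * 8000
censored = bleep(clip, 8000, [(0.25, 0.5)])
```

Everything outside the flagged span is left untouched, so the rest of the voice message stays intelligible.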

How we built it

UI: Telegram bot with python-telegram-bot, hosted on Heroku with a webhook.

NN: Deep Recurrent Convolutional NN on Keras with TensorFlow.

Data: Friends (who kindly swore at us on request); hand-picked pieces of audio from open sources.

Challenges we ran into

  • Existing speech processing tools (like Google Cloud) mainly focus on the Speech-to-Text task, which could help us but does more work than we need. So, to trigger on a very limited set of words and stems, we decided to build our own RNN.
  • We had to make our small dataset of word recordings representative and homogeneous enough (in terms of voice qualities, background noise, loudness, speed and so on).
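
One simple way to even out loudness across recordings is to scale every clip to the same RMS level. The sketch below shows that idea under stated assumptions: the target level of 0.1, the helper names and the list-of-floats representation are hypothetical, and the write-up does not say which normalization the team actually used.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def normalize_rms(samples, target_rms=0.1):
    """Scale a clip so its RMS matches `target_rms`, then clip to [-1, 1]."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # silent clip: nothing to scale
    gain = target_rms / level
    return [max(-1.0, min(1.0, x * gain)) for x in samples]

# Two recordings of the same 220 Hz tone at very different volumes.
quiet = [0.01 * math.sin(2 * math.pi * 220 * i / 8000) for i in range(8000)]
loud = [0.8 * math.sin(2 * math.pi * 220 * i / 8000) for i in range(8000)]
quiet_n, loud_n = normalize_rms(quiet), normalize_rms(loud)
```

After normalization both clips sit at the same level, so the network does not have to learn that a whispered and a shouted word are the same thing purely from volume.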

Accomplishments that we're proud of

  • We've managed to collect our own bootstrapped dataset, generated from ~50 obscene and 'normal' word samples.
  • Dared to take part in the data science track at a hackathon for the first time.
  • Solved the first audio processing task in our data science lives :)
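
A bootstrapped dataset of this kind can be sketched as follows: short word clips are overlaid on a background clip at random, non-overlapping positions, and a per-sample label marks where the obscene words landed. This is only an assumption about how such a dataset might be generated; the function `synthesize_example`, the labelling scheme and the retry limit are hypothetical and not taken from the project.

```python
import random

def synthesize_example(background, words, rng, max_tries=50):
    """Overlay word clips on a background clip at random non-overlapping offsets.

    `words` is a list of (samples, is_obscene) pairs. Returns (mix, labels), where
    labels[i] is 1 for every sample covered by an obscene word and 0 elsewhere.
    """
    mix = list(background)
    labels = [0] * len(mix)
    taken = []  # (start, end) ranges already occupied by a word
    for samples, is_obscene in words:
        if len(samples) > len(mix):
            continue  # word does not fit into this background
        for _ in range(max_tries):
            start = rng.randint(0, len(mix) - len(samples))
            end = start + len(samples)
            if all(end <= s or start >= e for s, e in taken):
                break
        else:
            continue  # no free slot found; skip this word
        taken.append((start, end))
        for i, x in enumerate(samples):
            mix[start + i] = max(-1.0, min(1.0, mix[start + i] + x))
        if is_obscene:
            labels[start:end] = [1] * (end - start)
    return mix, labels

# Toy data: a silent background, one "obscene" and one "normal" word clip.
rng = random.Random(0)
background = [0.0] * 1000
words = [([0.5] * 100, True), ([0.25] * 100, False)]
mix, labels = synthesize_example(background, words, rng)
```

Calling the function repeatedly with different seeds, backgrounds and word subsets turns a few dozen recordings into many distinct training examples.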

What we learned

  • As no team members had ever worked with audio data, we had to learn a lot, starting from the very basics of speech preprocessing and augmentation.
  • Managing a webhook on Heroku for a project with complex dependencies.
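
As an illustration of basic audio augmentation, the sketch below varies speaking speed by linear-interpolation resampling and adds low-level uniform noise. Both helpers, their names and their default parameters are assumptions made for this example; the write-up does not list the augmentations the team applied. Note that this naive resampling shifts pitch along with speed.

```python
import random

def change_speed(samples, factor):
    """Resample by linear interpolation; factor > 1 speeds the clip up (shorter output)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(len(samples) / factor))
    out = []
    for j in range(n_out):
        pos = j * factor          # fractional position in the input clip
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out

def add_noise(samples, rng, noise_level=0.01):
    """Add uniform noise in [-noise_level, noise_level], clipped to [-1, 1]."""
    return [max(-1.0, min(1.0, x + rng.uniform(-noise_level, noise_level)))
            for x in samples]

# A ramp makes the interpolation easy to inspect.
ramp = [float(i) for i in range(100)]
fast = change_speed(ramp, 2.0)   # half as long
slow = change_speed(ramp, 0.5)   # twice as long
noisy = add_noise([0.0] * 100, random.Random(1))
```

Applying a few random speed factors and noise levels to each recording is a cheap way to stretch a small set of voices toward more varied speakers and conditions.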

What's next for AudioCensor

  • Streaming usage: online telephony and other streaming services (Twitch, etc.).
  • A better dataset and more experiments with the model and augmentation.
