Inspiration
I'm incredibly fascinated by whistled languages and want to learn one, so I figured I'd build a deep learning model capable of recognizing patterns in a whistled language and transcribing the audio to text.
What it does
Given a WAV recording of the whistled language Silbo Gomero, it converts the audio into a spectrogram, runs a CNN over the spectrogram, and outputs tokens, which are then converted back into letters and strung together to form words.
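The steps above can be sketched roughly like this. This is not the actual project code; the frame sizes, alphabet, and layer sizes are placeholders I made up to show the shape of the pipeline (audio → spectrogram → CNN → tokens → letters):

```python
# Hypothetical sketch of the wav -> spectrogram -> CNN -> letters pipeline.
import numpy as np
import tensorflow as tf

ALPHABET = list("abcdefghijklmnopqrstuvwxyz ")  # assumed token set

def wav_to_spectrogram(samples, frame_length=256, frame_step=128):
    """Turn a 1-D waveform into a magnitude spectrogram."""
    stft = tf.signal.stft(samples, frame_length=frame_length, frame_step=frame_step)
    return tf.abs(stft)  # shape: (frames, frame_length // 2 + 1)

def build_model(num_freq_bins, num_tokens):
    """Small CNN that emits one token distribution per time frame."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, num_freq_bins)),
        tf.keras.layers.Conv1D(32, 5, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.Dense(num_tokens, activation="softmax"),
    ])

# One second of a fake 1 kHz "whistle" at 16 kHz, just to exercise the shapes.
t = np.linspace(0.0, 1.0, 16000, dtype=np.float32)
spec = wav_to_spectrogram(tf.constant(np.sin(2 * np.pi * 1000.0 * t)))
model = build_model(spec.shape[-1], len(ALPHABET))
probs = model(spec[tf.newaxis, ...])            # (1, frames, num_tokens)
tokens = tf.argmax(probs, axis=-1)[0].numpy()   # one token id per frame
text = "".join(ALPHABET[i] for i in tokens)     # frame-level letters
```

An untrained model will of course emit gibberish here; the point is just how the tensor shapes flow from waveform to per-frame letter predictions.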
How we built it
Slowly, painstakingly, and with the help of many, many YouTube tutorials (and ChatGPT, which explained concepts but didn't write code).
Challenges we ran into
- Converting file formats
- Padding
- Tensor indexing & reshaping (what even is a tensor, anyway?)
- Not remembering a ton about AI and having to relearn it all
- General shenanigans and hijinks that I don't recall because I've been working on this for 13 hours and it feels like a fever dream
- Getting a loss value of 8 billion, then 15 (billion), then NaN.
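The padding snag, for anyone curious: whistle clips have different lengths, so their spectrograms have different frame counts and can't be stacked into one batch as-is. A minimal NumPy sketch of the fix (the shapes here are made up):

```python
# Zero-pad variable-length spectrograms so they stack into one batch tensor.
import numpy as np

def pad_batch(specs):
    """Pad a list of (frames, bins) spectrograms to the longest one."""
    max_frames = max(s.shape[0] for s in specs)
    return np.stack([
        np.pad(s, ((0, max_frames - s.shape[0]), (0, 0)))  # pad time axis only
        for s in specs
    ])

batch = pad_batch([np.ones((10, 4)), np.ones((7, 4))])
# batch has shape (2, 10, 4); the shorter clip is zero-padded at the end
```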
Accomplishments that we're proud of
- I now know what a tensor is
- I now know so much about AI architecture (activations, layers, optimizers, loss functions, etc)
- Significant progress on a project I've been wanting to do for a while. I have plans to expand it, and this gives me a very solid base.
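To show how those pieces (layers, activations, optimizers, loss functions) fit together, here's a tiny Keras setup. The sizes are placeholders, not the real model; `clipnorm` is one common guard against the exploding/NaN losses mentioned in the challenges:

```python
# Toy example wiring together layers, activations, an optimizer, and a loss.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(129,)),              # e.g. one spectrogram frame
    tf.keras.layers.Dense(64, activation="relu"),     # layer + activation
    tf.keras.layers.Dense(27, activation="softmax"),  # one unit per letter token
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0),
    loss="sparse_categorical_crossentropy",           # loss function
    metrics=["accuracy"],
)
```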
What we learned
- See above comments
What's next for Whistle Audio Recognition
- TTS
- AI-powered feedback on quality of whistle
- Whistled conversations with AI
Built With
- chatgpt
- google-colab
- python
- tensorflow