Inspiration
I'm incredibly fascinated by whistled languages and want to learn one, so I figured I'd build a deep learning model capable of recognizing patterns in a whistled language and transcribing the audio to text.
What it does
Given a WAV recording of the whistled language Silbo Gomero, it converts the audio into a spectrogram, runs a CNN over the spectrogram, and outputs tokens, which are then converted back into letters and strung together to form words.
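The steps above can be sketched roughly like this. This is not the actual project code; the frame sizes, alphabet, and layer sizes are placeholders I made up to show the shape of the pipeline (audio → spectrogram → CNN → tokens → letters):

```python
# Hypothetical sketch of the wav -> spectrogram -> CNN -> letters pipeline.
import numpy as np
import tensorflow as tf

ALPHABET = list("abcdefghijklmnopqrstuvwxyz ")  # assumed token set

def wav_to_spectrogram(samples, frame_length=256, frame_step=128):
    """Turn a 1-D waveform into a magnitude spectrogram."""
    stft = tf.signal.stft(samples, frame_length=frame_length, frame_step=frame_step)
    return tf.abs(stft)  # shape: (frames, frame_length // 2 + 1)

def build_model(num_freq_bins, num_tokens):
    """Small CNN that emits one token distribution per time frame."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, num_freq_bins)),
        tf.keras.layers.Conv1D(32, 5, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.Dense(num_tokens, activation="softmax"),
    ])

# One second of a fake 1 kHz "whistle" at 16 kHz, just to exercise the shapes.
t = np.linspace(0.0, 1.0, 16000, dtype=np.float32)
spec = wav_to_spectrogram(tf.constant(np.sin(2 * np.pi * 1000.0 * t)))
model = build_model(spec.shape[-1], len(ALPHABET))
probs = model(spec[tf.newaxis, ...])            # (1, frames, num_tokens)
tokens = tf.argmax(probs, axis=-1)[0].numpy()   # one token id per frame
text = "".join(ALPHABET[i] for i in tokens)     # frame-level letters
```

An untrained model will of course emit gibberish here; the point is just how the tensor shapes flow from waveform to per-frame letter predictions.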
How we built it
Slowly, painstakingly, and with the help of many, many YouTube tutorials (and ChatGPT, which explained concepts but didn't write code).
Challenges we ran into
- Converting file formats
- Padding
- Tensor indexing & reshaping (what even is a tensor, anyway?)
- Not remembering a ton about AI and having to relearn it all
- General shenanigans and hijinks that I don't recall because I've been working on this for 13 hours and it feels like a fever dream
- Getting a loss value of 8 billion, then 15 (billion), then NaN.
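The padding snag, for anyone curious: whistle clips have different lengths, so their spectrograms have different frame counts and can't be stacked into one batch as-is. A minimal NumPy sketch of the fix (the shapes here are made up):

```python
# Zero-pad variable-length spectrograms so they stack into one batch tensor.
import numpy as np

def pad_batch(specs):
    """Pad a list of (frames, bins) spectrograms to the longest one."""
    max_frames = max(s.shape[0] for s in specs)
    return np.stack([
        np.pad(s, ((0, max_frames - s.shape[0]), (0, 0)))  # pad time axis only
        for s in specs
    ])

batch = pad_batch([np.ones((10, 4)), np.ones((7, 4))])
# batch has shape (2, 10, 4); the shorter clip is zero-padded at the end
```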
Accomplishments that we're proud of
- I now know what a tensor is
- I now know so much about AI architecture (activations, layers, optimizers, loss functions, etc)
- Significant progress on a project I've been wanting to do for a while. I have plans to expand it, and this gives me a very solid base.
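To show how those pieces (layers, activations, optimizers, loss functions) fit together, here's a tiny Keras setup. The sizes are placeholders, not the real model; `clipnorm` is one common guard against the exploding/NaN losses mentioned in the challenges:

```python
# Toy example wiring together layers, activations, an optimizer, and a loss.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(129,)),              # e.g. one spectrogram frame
    tf.keras.layers.Dense(64, activation="relu"),     # layer + activation
    tf.keras.layers.Dense(27, activation="softmax"),  # one unit per letter token
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0),
    loss="sparse_categorical_crossentropy",           # loss function
    metrics=["accuracy"],
)
```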
What we learned
- See above comments
What's next for Whistle Audio Recognition
- TTS
- AI-powered feedback on quality of whistle
- Whistled conversations with AI
Built With
- chatgpt
- google-colab
- python
- tensorflow