Kevin has found himself in quite an unfortunate situation: he is addicted to K-pop, to the point where he has memorized every single song created by the popular K-pop group TWICE. As a result, he has begun experiencing severe withdrawal symptoms. Since we're his best friends, we decided to fuel his addiction with computer-generated K-pop songs.
What it does
Using a trained PyTorch model, we generate unique audio sequences by feeding the model a partially random seed. The model outputs an audio waveform, which is saved to a .WAV file and served to a web app that plays our K-pop to the user.
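The "waveform to .WAV file" step can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not our actual pipeline: fake_model_output is a hypothetical stand-in for the trained model (it just produces a sine tone whose pitch depends on the seed), and save_wav and SAMPLE_RATE are names made up for this example. It assumes the model's output is a list of floats in [-1, 1].

```python
import math
import random
import struct
import wave

SAMPLE_RATE = 16_000  # assumed: 16 kHz mono, as described in this write-up


def fake_model_output(seed: int, n_samples: int) -> list[float]:
    """Hypothetical stand-in for the trained model: floats in [-1, 1]."""
    rng = random.Random(seed)
    freq = rng.uniform(220.0, 880.0)  # the seed picks the pitch of the tone
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n_samples)]


def save_wav(path: str, samples: list[float]) -> None:
    """Write float samples to disk as 16-bit PCM, mono."""
    # Clip to [-1, 1], then scale to the signed 16-bit integer range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # monaural
        f.setsampwidth(2)            # 2 bytes = 16 bits per sample
        f.setframerate(SAMPLE_RATE)
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Calling save_wav("song.wav", fake_model_output(seed=42, n_samples=SAMPLE_RATE)) would write one second of audio; a web server can then hand that file to the frontend.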
How we built it
We trained a Recurrent Neural Network (RNN) that uses Long Short-Term Memory (LSTM) cells to recreate the Discrete Fourier Transform (DFT, which maps a signal from the time domain to the frequency domain) of a waveform taken from a monaural .WAV file with a 16 kHz sample rate. At generation time, the pre-trained model outputs a unique DFT, which is inverted back into audio and served to a React frontend by a Flask app.
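To make the DFT round trip concrete, here is a small sketch of a forward and inverse DFT written directly from the textbook definition. It is for illustration only: it is a naive O(N^2) version in plain Python, whereas a real pipeline would almost certainly use a library FFT, and the function names dft and idft are made up for this example.

```python
import cmath
import math


def dft(x):
    """Naive DFT: X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/N)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part, assuming a real-valued signal."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# Eight samples holding exactly one cycle of a sine wave.
signal = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(signal)      # energy lands in bins 1 and 7 (the mirrored bin)
recovered = idft(spectrum)  # matches `signal` up to floating-point error
```

The point of the round trip is that nothing is lost going to the frequency domain and back, so a model that produces a plausible spectrum can have its output turned into audio.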
Challenges we ran into
When we began, we were faced with the task of creating a dataset from scratch. Fortunately for us, Kevin is the proud owner of every TWICE song ever made. Our biggest challenge was learning how RNNs work and writing the algorithms that batch our data for training. We made a lot of very simple mistakes that messed up our training runs, which left us with very little training time. We also had issues with the layout of the frontend and with passing files to and from the Flask server. Thankfully, all of these problems were eventually solved: we fixed the frontend layout with media breakpoints, rewrote the algorithm, and experimented with the weights of the RNN.
Accomplishments that we're proud of
- Kevin got to listen to K-pop for 36 hours.
- We got to listen to K-pop for 36 hours.
- We wrote and trained an LSTM RNN from scratch in 36 hours with little experience.
What we learned
We learned a lot about machine learning, RNNs in particular. We also learned a lot about signal processing through Discrete Cosine Transforms (DCTs) and DFTs, and we ended up using Short-Time Fourier Transforms (STFTs) as well. Additionally, we learned about creating polished web applications with proper animations, design, and features.
What's next for Twice4Life
We would like to add features such as input from user analytics to enrich the user experience. We also plan to train for longer and try different approaches to improve the model and create a better end product.