Inspiration
Because of the COVID-19 pandemic, many meetings that used to take place offline have moved online. However, we noticed that the audio quality in these meetings is often very poor, for a variety of reasons, and this seriously reduces the efficiency of communication.
What it does
Audio compression and ambient noise degrade speech quality, and these distortions usually affect the high-frequency detail components. Our system therefore uses a generative adversarial network (GAN) to generate and restore the impaired audio components, thereby improving the quality of speech communication.
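The point about high-frequency damage can be illustrated with a small numerical experiment. The sketch below is a simplification under stated assumptions: it models a bandwidth-limiting codec as a hard low-pass filter that zeroes every FFT bin above a cutoff, and the test tones, the 16 kHz sample rate and the 4 kHz cutoff are hypothetical values chosen only for illustration. Real codecs and real noise degrade audio in more complex ways.

```python
import numpy as np


def lowpass_fft(signal: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    """Crude stand-in for a bandwidth-limiting codec (assumption): zero all bins above cutoff_hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def high_band_energy(signal: np.ndarray, sample_rate: int, cutoff_hz: float) -> float:
    """Sum of squared FFT magnitudes above cutoff_hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(np.abs(spectrum[freqs > cutoff_hz]) ** 2))


sr = 16000                      # hypothetical sample rate
t = np.arange(sr) / sr          # one second of samples
# Hypothetical test signal: a 300 Hz "voiced" tone plus a weaker 6 kHz "detail" tone.
clean = np.sin(2 * np.pi * 300 * t) + 0.2 * np.sin(2 * np.pi * 6000 * t)
degraded = lowpass_fft(clean, sr, cutoff_hz=4000)

print("high-band energy, clean:   ", high_band_energy(clean, sr, 4000))
print("high-band energy, degraded:", high_band_energy(degraded, sr, 4000))
```

The low-frequency tone survives untouched while the high-band energy drops to essentially zero, which is the kind of missing detail a generative model would have to fill back in.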
How we built it
Our system first transforms a piece of audio into the frequency domain with a Fast Fourier Transform (FFT) to produce a spectrum. The neural network then analyses the speech features and produces a corresponding clean speech spectrum. Finally, an inverse transform is applied to this spectrum to obtain the enhanced speech signal. Our neural network was trained for 200 epochs on a CD-quality HD speech dataset with a total duration of 330 hours, enabling high-quality speech enhancement.
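The three-step pipeline above (transform, predict a clean spectrum, inverse transform) can be sketched as follows. This is a minimal illustration under assumptions that go beyond the description: the FFT is applied frame by frame (a short-time FFT with a square-root periodic Hann window, 512-sample frames and 50% overlap), the network predicts only the magnitude spectrum while the noisy phase is reused, and `enhance_magnitude` is a hypothetical placeholder that returns its input unchanged where the trained GAN generator would go.

```python
import numpy as np

FRAME = 512  # assumed frame length in samples
HOP = 256    # assumed hop size (50% overlap)
# Square root of a periodic Hann window, used for both analysis and synthesis.
WINDOW = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME))


def stft(x: np.ndarray) -> np.ndarray:
    """Split x into overlapping windowed frames and FFT each one -> shape (frames, bins)."""
    padded = np.pad(x, (FRAME, FRAME))
    n_frames = 1 + (len(padded) - FRAME) // HOP
    frames = np.stack([padded[i * HOP:i * HOP + FRAME] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray, length: int) -> np.ndarray:
    """Inverse FFT each frame, overlap-add, and normalise by the summed squared window."""
    frames = np.fft.irfft(spec, n=FRAME, axis=1) * WINDOW
    total = (len(frames) - 1) * HOP + FRAME
    out = np.zeros(total)
    norm = np.zeros(total)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
        norm[i * HOP:i * HOP + FRAME] += WINDOW ** 2
    out /= np.maximum(norm, 1e-8)  # guard against division by zero in the padding
    return out[FRAME:FRAME + length]


def enhance_magnitude(magnitude: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for the trained GAN generator.

    A real generator would map a degraded magnitude spectrum to a clean one;
    this stub returns it unchanged so the pipeline runs end to end.
    """
    return magnitude


def enhance(x: np.ndarray) -> np.ndarray:
    spec = stft(x)                                  # 1. time domain -> frequency domain
    magnitude, phase = np.abs(spec), np.angle(spec)
    clean_mag = enhance_magnitude(magnitude)        # 2. network predicts a clean spectrum
    clean_spec = clean_mag * np.exp(1j * phase)     # reuse the noisy phase (assumption)
    return istft(clean_spec, len(x))                # 3. inverse transform -> enhanced speech


noisy = np.random.default_rng(0).standard_normal(16000)  # stand-in for 1 s of 16 kHz audio
enhanced = enhance(noisy)
```

With the identity placeholder the output matches the input to floating-point precision, which is a quick way to check that the analysis and synthesis stages are consistent before a real model is plugged in.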
Challenges we ran into
We spent a lot of time debugging the structure and hyperparameters of the neural network. GAN models are notoriously unstable to train, and it took considerable debugging before our model converged. We also put a lot of effort into tuning the training strategy to achieve satisfactory results.
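As an illustration of the kind of objective that gets tuned when training a GAN, the sketch below computes least-squares GAN (LSGAN) losses from discriminator scores, plus an L1 term that pulls the generated spectrum towards the clean one. LSGAN is only one common choice, often picked because it tends to train more stably than the original cross-entropy objective; treating it as the loss used here is an assumption, and `l1_weight` is a hypothetical hyperparameter. Only loss values are computed; actual training needs an autodiff framework to obtain gradients.

```python
import numpy as np


def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """LSGAN discriminator loss: push scores for clean spectra to 1, generated ones to 0."""
    return float(0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2))


def generator_loss(d_fake: np.ndarray, fake_spec: np.ndarray,
                   clean_spec: np.ndarray, l1_weight: float = 100.0) -> float:
    """LSGAN generator loss plus a weighted L1 distance to the clean spectrum.

    l1_weight is a hypothetical value balancing realism against fidelity.
    """
    adversarial = 0.5 * np.mean((d_fake - 1.0) ** 2)
    reconstruction = np.mean(np.abs(fake_spec - clean_spec))
    return float(adversarial + l1_weight * reconstruction)


# A perfect discriminator (real -> 1, fake -> 0) has zero loss,
# while the generator is penalised for being detected.
scores_real = np.ones(8)
scores_fake = np.zeros(8)
spectrum = np.ones(257)
d_loss = discriminator_loss(scores_real, scores_fake)
g_loss = generator_loss(scores_fake, spectrum, spectrum)
```

The weight on the reconstruction term is one of the knobs that typically has to be balanced: too small and the output drifts from the input speech, too large and the adversarial signal that adds realistic high-frequency detail is drowned out.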
Accomplishments that we're proud of
Our program works well: it can record speech, analyse the characteristics of the damaged speech and visualise them. The enhanced speech sounds natural, is rich in high-frequency detail and is consistent with how the human ear perceives speech.
What we learned
In this hackathon we learned how to build, train and apply neural networks. We believe AI has great potential to improve people's experience in a variety of ways.
What's next for Speech Enhancement system using GAN
We will try to implement real-time audio processing. We also plan to reduce the size of the model so that it can run on devices such as mobile phones or even embedded hardware.
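One common way to shrink a model for phones or embedded devices is post-training quantization. The sketch below assumes float32 weights and uses a symmetric per-tensor int8 scheme (an illustrative choice, not a committed plan); each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step.

```python
import numpy as np


def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights for inference."""
    return q.astype(np.float32) * scale


# Hypothetical layer: a 256x256 float32 weight matrix.
w = np.random.default_rng(0).standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print("float32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)
print("max abs error:", np.max(np.abs(dequantize(q, scale) - w)))
```

Whether 8-bit weights keep the enhanced speech's high-frequency detail intact would have to be checked by listening tests and objective metrics; finer schemes such as per-channel scales are a common next step if per-tensor quantization loses too much quality.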