This project was developed for the Audio and Acoustic Signal Processing course at EPFL. The general problem is Blind Source Separation: given a mixture of several people speaking at the same time, recorded by several microphones, demix the signal to recover clean, isolated speech from each speaker. We restricted our study to the determined case with 2 sources and 2 mics.
The project consists of a Python implementation of SparseAuxIVA, a method proposed in J. Janský, Z. Koldovský, N. Ono, "A computationally cheaper method for blind speech separation based on AuxIVA and incomplete demixing transform", Liberec, Czech Republic, 2016.
The main idea is that, since speech is sparse in the frequency domain, AuxIVA can be run on only the subset of frequency bins that are the most significant. This yields an incomplete demixing transform, which can then be completed by interpolation using LASSO on the sparse relative transfer function between the two mics.
The first part of the work consisted of finding the right frequency bins to work with and tweaking the existing AuxIVA implementation from pyroomacoustics so that it runs on only this subset of frequencies.
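As an illustration of the bin selection step, here is a minimal sketch that keeps the k frequency bins carrying the most energy. The energy criterion, the function name select_active_bins and the toy data are assumptions made for this example, not necessarily what the project uses. The array layout (frames, frequencies, mics) is chosen to match what the pyroomacoustics AuxIVA routine takes as input.

```python
import numpy as np

def select_active_bins(X, k):
    """Return the sorted indices of the k frequency bins with the highest energy.

    X : complex STFT array, assumed shape (n_frames, n_freq, n_mics)
    k : number of frequency bins to keep
    """
    # One energy value per bin: sum the power over frames and microphones.
    energy = np.sum(np.abs(X) ** 2, axis=(0, 2))
    # argpartition finds the k largest entries without a full sort.
    top = np.argpartition(energy, -k)[-k:]
    return np.sort(top)

# Toy data (hypothetical): a random STFT with three artificially loud bins.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 129, 2)) + 1j * rng.standard_normal((50, 129, 2))
X[:, [10, 40, 77], :] *= 10.0

bins = select_active_bins(X, 3)
# Only this sub-array would be handed to the separation algorithm.
X_sub = X[:, bins, :]
```

The demixing matrices estimated on X_sub are then only known at the indices in bins, which is what makes the demixing incomplete.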
Then we had to tackle our main task: reconstructing the full demixing. After a lot of trial and error, we realized that the problem could not be solved efficiently with traditional ADMM solvers for LASSO, because the objective function involves a conversion from the time domain to the frequency domain. A much more efficient algorithm that achieves good results is SpaRIR, which is used to find the impulse response from the sparse relative transfer function between two mics.
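To make the structure of this problem concrete, here is a toy sketch that recovers a sparse real impulse response from a subset of its DFT coefficients by minimizing 0.5 * ||A h - g||^2 + lam * ||h||_1, where A is a partial DFT matrix. It uses plain ISTA (proximal gradient descent), which is neither SpaRIR nor the code from this project. The function names, the value of lam, the iteration count and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_partial_dft(g, bins, N, lam, n_iter=2000):
    """Estimate a sparse real vector h of length N from DFT samples g at `bins`.

    Minimizes 0.5 * ||A h - g||^2 + lam * ||h||_1 with plain ISTA,
    where A holds the rows of the N-point DFT matrix indexed by `bins`.
    """
    n = np.arange(N)
    # Partial DFT matrix: this is the time-to-frequency conversion
    # sitting inside the objective function.
    A = np.exp(-2j * np.pi * np.outer(bins, n) / N)
    # Distinct DFT rows are orthogonal with squared norm N, so ||A||^2 <= N
    # and 1/N is a safe step size.
    step = 1.0 / N
    h = np.zeros(N)
    for _ in range(n_iter):
        residual = A @ h - g
        # Gradient of the data term with respect to the real vector h.
        grad = np.real(A.conj().T @ residual)
        h = soft_threshold(h - step * grad, step * lam)
    return h

# Toy data (hypothetical): a 3-tap impulse response observed on 40 of 65 bins.
N = 128
h_true = np.zeros(N)
h_true[[3, 17, 40]] = [1.0, -0.6, 0.4]

rng = np.random.default_rng(0)
bins = np.sort(rng.choice(N // 2 + 1, size=40, replace=False))
g = np.fft.rfft(h_true)[bins]  # the known, incomplete frequency response

h_est = ista_partial_dft(g, bins, N, lam=0.5)
# Taking the DFT of the estimate fills in the bins that were not observed.
g_full = np.fft.rfft(h_est)
```

The last line shows the interpolation idea: once a sparse time-domain response is found, its transform gives values at every frequency bin, including those left out of the separation step.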
The implementation can be found under the link to the GitHub repository. Check out the README for instructions on finding the source code and test files.
A compiled version of the demo notebook is available under the second link.