Music Matrix was envisioned by the Musing Live team during the previous Wallifornia hackathon NextStageChallenge, when we wanted to create an interactive live music environment that was truly at the intersection of immersive live music and gaming.
For this we started building a simple rhythm game with Unity and the state-of-the-art music information retrieval library Essentia as part of our hackathon solution. We ended up not using it at the time, as we wanted to focus on our existing concert production collaborations with several established live music arrangers in Sweden.
Inspired by the neuroscience concept of “embodied cognition” and by the potential of machine learning for interactive, personalised user experiences, we wanted to create an immersive music world generated purely from perceptual musical features of live music audio.
At a live concert, your experience is often shaped by the way you interact with the people, the music and the space it's in. The energy and movement of the music in the moment, shared with others in a socially safe space, is often the core value of the live music experience.
We wanted to build a virtual interface for a live music experience that needs only an audio signal as input in order to generate a whole virtual world created entirely out of musical features.
This should be available as a standalone experience across all common devices, directly in the browser as a progressive web app, with a further scope to integrate the same environment/system into other VR environments, e.g. for virtual concerts in games such as Fortnite.
What it does
As a musician or artist, you can plug the output from any live performance into the Music Matrix app, which automatically generates a virtual environment purely from the live audio signal, where your online audience can interact with the music. The graphic style of the generative environment follows the musical style of the live audio, by matching retrieved style/genre metadata to extensive image data sets and using style transfer learning to generate the graphics.
For online audiences of the live-streamed show, an interactive immersive environment is generated in real time purely from the audio signal of the live music performance. The environment reacts to the music and evolves with the live audio, and it also evolves with the movements of the online audience. This is done by capturing each user's movements via the web camera while they listen to the music, so that their dance style and rhythm are embodied in their own avatar interacting in the virtual environment.
How I built it
This is achieved by using state-of-the-art music information retrieval (MIR) technology to retrieve perceptual musical features from the live music audio signal, e.g. the beat and the rhythms of the music. The users' movements are captured by existing motion tracking/pose estimation software, and the rhythm of the movements is matched with the music features, so that the movements of each user's avatar are seamlessly and interactively embodied in the evolving immersive environment.
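As a rough illustration of the matching step described above, the sketch below scores how closely a user's movement accents line up with detected beats. It is not the actual Music Matrix pipeline: the function names, the 100 ms tolerance, and the assumption that beat times and movement-peak times already exist as lists of seconds (from a beat tracker and a pose estimator respectively) are all hypothetical.

```python
from bisect import bisect_left
from typing import List


def nearest_beat_offset(t: float, beats: List[float]) -> float:
    """Signed distance in seconds from time t to the closest beat.

    `beats` must be sorted and non-empty.
    """
    i = bisect_left(beats, t)
    # Only the beat just before and the beat at/after t can be the closest.
    candidates = beats[max(0, i - 1): i + 1]
    return min((t - b for b in candidates), key=abs)


def rhythm_match_score(
    movement_times: List[float],
    beat_times: List[float],
    tolerance: float = 0.1,  # hypothetical: 100 ms counts as "on the beat"
) -> float:
    """Fraction of movement accents that land within `tolerance` of a beat."""
    if not movement_times or not beat_times:
        return 0.0
    beats = sorted(beat_times)
    hits = sum(
        1 for t in movement_times
        if abs(nearest_beat_offset(t, beats)) <= tolerance
    )
    return hits / len(movement_times)
```

A score near 1.0 would mean the user is moving tightly with the beat, and a value like this could drive how strongly the avatar or environment responds. A real system would likely also need to handle tempo changes, camera latency and off-beat (syncopated) movement, which this sketch ignores.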
We then make extensive use of machine learning (ML), in particular deep transfer learning with sequential models and GANs, to learn the users' movements to the music and feed the output of this ML model back to the user as interactive dance/rhythm games. In this way the user actively interacts with the music and the evolving environment, in a kind of perceptual feedback loop that enhances the immersive live music experience.
Generative graphics are built in Unity and WebGL. Sound analysis, MIR and ML use Essentia and madmom, with custom extensions built on TensorFlow and/or PyTorch, and Hadoop infrastructure for data capture and handling. With state-of-the-art transfer learning, the online computation of live features can be made very efficient and feel “real-time” to the user, provided that a stacked hierarchy of deep learning models shares parameters and a very “shallow” neural net at the top actively generates the content the user interacts with.
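To make the shared-parameter idea more concrete, here is a minimal sketch in plain Python, under our own simplifying assumptions. One frozen "backbone" computes an embedding once per audio frame, and several very small heads reuse that embedding. The class names, the tanh layer and the single-layer heads are hypothetical stand-ins, not the models we would actually deploy in TensorFlow or PyTorch.

```python
import math
from typing import Dict, List

Vector = List[float]


class SharedBackbone:
    """Stand-in for a frozen, pretrained feature extractor (hypothetical)."""

    def __init__(self, weights: List[Vector]) -> None:
        self.weights = weights  # shared by every head, never updated online
        self.calls = 0          # counts how often the expensive part runs

    def embed(self, frame: Vector) -> Vector:
        self.calls += 1
        return [
            math.tanh(sum(w * x for w, x in zip(row, frame)))
            for row in self.weights
        ]


class ShallowHead:
    """A single linear layer on top of the shared embedding."""

    def __init__(self, weights: Vector, bias: float) -> None:
        self.weights = weights
        self.bias = bias

    def predict(self, embedding: Vector) -> float:
        return sum(w * e for w, e in zip(self.weights, embedding)) + self.bias


def run_frame(
    backbone: SharedBackbone,
    heads: Dict[str, ShallowHead],
    frame: Vector,
) -> Dict[str, float]:
    """Run the backbone once, then every cheap head on the same embedding."""
    embedding = backbone.embed(frame)
    return {name: head.predict(embedding) for name, head in heads.items()}
```

The point of this layout is that adding another head (for example one for visuals and one for the rhythm game) adds only a few multiplications per frame, because the costly backbone pass is shared. Only the small heads would need to adapt to an individual user.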
Challenges I ran into
For the hackathon, we quickly realised that it was infeasible to build an ML system implementing the functionality described above within a few days. What we did instead was analyse the requirements and feasibility of building such a system in the long term, and this seems very achievable to us, given time and resources for our in-house machine learning and MIR team.
Syncing streaming data across users' devices to provide a good user experience is an engineering challenge, and it will take more R&D to settle on a feasible scope. For this we are actively collaborating with the expert live streaming consultancy firm Eyevinn Technology, located in Stockholm.
Accomplishments that I'm proud of
We set out to do this hackathon project in order to develop and validate our idea/concept through a design process with our team and interactions with the hackathon mentors and coaches. During this process we received lots of good feedback on our concept, validating that there is industry interest in our envisioned product.
We took a quite technical, machine-learning-based initial idea and managed to develop a concept around it, which many industry experts seem to agree will provide great value to live music audiences and music content providers if it is fully implemented.
This certainly motivates us to continue developing and scaling up this project after the hackathon, especially since the concept sits right at the heart of music and technology, which is something we are deeply passionate about. We would love to see it realised.
What I learned
We participated in the Hackathon since we were excited about the idea and wanted to take the opportunity to develop it into a coherent concept and start validating it by interacting with mentors and coaches.
We have received a lot of very nice feedback on the concept so far, which is highly encouraging for us to develop and scale this project up now.
The Essentia team of MIR researchers and developers advised us directly during the hackathon and was also very encouraging of our project, in particular of our aim to use machine learning and predictive analysis to improve their existing benchmark algorithms for live music feature retrieval.
In these interactions we learnt a lot from them about how to achieve high performance when running these algorithms in the web browser.
What's next for Music Matrix
As a next milestone we aim to create a unique interactive app/interface for the live music experience as an MVP, as described above, to be used within our live concert activities with our partners, which are professional live music production agencies (in particular Abundolive.se, MTA production, Jubel AB). These activities will then also generate specialised training data for our system, which can incrementally improve the system's performance over the longer term.
If such a system is employed within live music events with major artists (e.g. in Fortnite), this will result in a highly specialised dataset of perceptual music features, live music interactions and other music metadata, which we believe would be a unique and highly valuable asset for the music industry.
For example, this dataset could be used in music recommendation systems, and potentially also for generating new music content with machine learning. Such content could be of much higher quality, as its features are significantly closer to human cognition and perception.