Inspiration

Our project is inspired by the growing demand for creative, personalized background music for short-form content. In today's competitive content-creation market, the ability to generate royalty-free music that aligns with a creator's personal aesthetic and visual theme can greatly elevate their appeal. Beyond benefiting content creators, we also want to empower artists who treat the concepts and emotions present in images as rich sources of musical inspiration. We believe AI-generated music has reached new levels of sophistication, as demonstrated by the viral "Fancy Pants Rich Mcgee" AI-generated song. For these reasons, this project explores both technology and art, using innovative technology to make personalized music that reflects one's emotions and individuality accessible to everyone.

What it does

Audiolux.AI uses generative AI to pair images or snapshots from short-form content with a matching musical composition. First, it takes an image of the user's choice and uses a generative AI model to produce a descriptive text of the image, capturing the overall atmosphere, theme, and emotions present. This description is then fed as input to a second generative model, which composes a musical piece based on it. The artist can optionally include additional details that further customize the generated piece, and can then either download the result or rerun the model to produce an alternative sample. Overall, this project is a tool for content creators and artists looking to build a connection between visuals and personalized music.

How we built it

We used React and Tailwind CSS to build the website and UI, then used the Azure Computer Vision API to process each image and generate descriptive text from it. For storage, we used the Azure Blob Storage API to set up a cloud backend that holds the user's input. Finally, the generated text was passed to the MusicGen API, driven by a Python backend, to generate the music.
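The middle step of this pipeline, turning the Computer Vision output into a MusicGen text prompt, can be sketched as below. This is a simplified illustration, not our exact code: the function name and prompt format are our own, and in the real app the caption and tags come from Azure Computer Vision while the returned string is what gets sent on to MusicGen.

```python
def build_music_prompt(caption: str, tags: list[str], user_details: str = "") -> str:
    """Combine the Computer Vision caption, its tags, and any extra
    details the artist supplied into a single text prompt for MusicGen."""
    parts = [f"background music for a scene of {caption}"]
    # Drop generic object labels that carry no mood information.
    mood_tags = [t for t in tags if t not in {"person", "indoor", "outdoor"}]
    if mood_tags:
        parts.append("mood: " + ", ".join(mood_tags))
    if user_details:
        parts.append(user_details)
    return "; ".join(parts)
```

For example, a caption of "a sunset over the ocean" with the user detail "lo-fi" would yield a prompt like `background music for a scene of a sunset over the ocean; mood: sky, sunset; lo-fi`.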

Challenges we ran into

One challenge we ran into was that Azure's Computer Vision API tended to describe images very literally, rather than capturing the tone or overall mood we wanted. For example, when we fed in an image of a smiling person, the API returned something like "person" instead of "happy," which was not ideal when the output was meant to drive music generation. We looked into training our own emotion-detection model, but due to time constraints we decided to continue using the pre-trained one. We also faced challenges in finding, accessing, and learning to use generative AI APIs that served our goals.
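One lightweight workaround for the literal-caption problem (a sketch of an idea, not something we shipped) would be to map the literal labels the vision API returns onto mood words that make better music prompts. The mapping table here is purely illustrative:

```python
# Illustrative mapping from literal vision labels to mood words.
LITERAL_TO_MOOD = {
    "smile": "happy, upbeat",
    "person": "warm, human",
    "beach": "relaxed, breezy",
}

def enrich_tags(tags: list[str]) -> list[str]:
    """Swap literal vision tags for mood words where a mapping exists;
    fall back to the original tags if nothing matches."""
    moods = [LITERAL_TO_MOOD[t] for t in tags if t in LITERAL_TO_MOOD]
    return moods if moods else list(tags)
```

A fuller solution would still need a model that understands emotion, but even a small lookup like this nudges the prompts away from plain object labels.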

Accomplishments that we're proud of

We encountered many challenges during this project, and we are proud of everything we achieved along the way. In particular, we are proud of the progress we made with the generative AI APIs and of tailoring them to our needs; we gained a solid understanding of these resources and incorporated them into our project. We were all fairly new to full-stack development, so setting up Azure Blob Storage, integrating Azure's Computer Vision API, and linking everything to the Python backend were all key learning points for us. We also ran into difficulties ranging from completely switching our choice of APIs to expiring access tokens and connecting the frontend with the backend, but we supported each other and collaborated to solve the problems together.

What we learned

We learned how to use React to create the frontend, as well as how to work with APIs using documentation. We also learned more about generative AI models, such as how they are utilized and how they can be customized.

What's next

To improve this project, we would add the ability to upload a video and select snapshots from it to generate music. We would also like to add the option of generating music with lyrics, as there are models capable of doing so.
