MadHacks 2025 - Sheet Diffusion

Charles Ding, Marko Kupresanin, Gannon Mefford, Victoria Yang

Inspiration

We were inspired by the lack of free software that transcribes audio files into proper sheet music. This was important to us because musicians early in their education or training may have trouble finding, curating, or creating sheet music for their favorite songs. By building this app on Fish Diffusion, we help young musicians stay better connected with music.

What it does

Our React.js frontend lets users upload .mp3 files directly to our backend server, which runs the Fish Diffusion model to extract the pitches present in the recording. We map these pitch frequencies onto musical notes and use OpenCV to measure the spacing between notes. From there, we combine the rest and pitch data and feed it into MuseScore's Python SDK, which generates high-quality sheet music with the necessary time signatures, clefs, and accidentals.
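As a rough illustration of the frequency-to-note step, the standard equal-temperament mapping (A4 = 440 Hz = MIDI note 69) can be sketched like this (a minimal sketch, not our actual pipeline code; the helper name is hypothetical):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz: float) -> str:
    """Map a frequency in Hz to the nearest equal-tempered note name.

    Uses the MIDI convention: A4 = 440 Hz = MIDI note 69.
    """
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    octave = midi // 12 - 1          # MIDI octave numbering: C4 = 60
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(freq_to_note(440.0))    # A4
print(freq_to_note(261.63))   # C4 (middle C)
```

Rounding to the nearest semitone is what lets slightly off-pitch recordings still land on the intended note.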

Finally, we use a React.js library to play back the MIDI file generated from the transcribed audio, so the end user can hear what the sheet music sounds like and confirm the key is correct!

How we built it

We used the Fish Diffusion (https://github.com/fishaudio/fish-diffusion/) framework developed by fish.audio researchers to detect pitch changes and isolate voices and musical instruments as needed for sheet music.

We also used Flask for our locally hosted backend, which allowed us to keep using the Python implementations of Fish Diffusion and the MuseScore SDK to crunch the numbers and generate sheet music.
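A minimal sketch of what such a Flask upload endpoint could look like (the route and form-field names are hypothetical, not our actual code; the transcription step is stubbed out):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Expect an .mp3 file in the multipart form field "audio".
    uploaded = request.files.get("audio")
    if uploaded is None or not uploaded.filename.lower().endswith(".mp3"):
        return jsonify({"error": "please upload an .mp3 file"}), 400
    # In the real pipeline, this is where Fish Diffusion would extract
    # pitches and music21 would render the resulting sheet music.
    return jsonify({"filename": uploaded.filename, "status": "received"}), 200
```

Keeping the endpoint a plain JSON API is what lets the React frontend stay fully decoupled from the Python tooling.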

Our frontend was created with React.js and built with Vite; it is styled with React Bootstrap and follows WCAG AA accessibility standards.

Challenges we ran into

We ran into challenges understanding the Fish Diffusion framework, since it is not the typical TTS/STT pipeline of fish.audio's current models, but we kept iterating and read countless documentation pages from Fish and its dependencies. Eventually, we were able to make use of the spectrogram and NumPy data it returns, and came up with clever solutions like using OpenCV to detect the start and end of notes based on volume as well as pitch. We also ran into issues with MuseScore's Python SDK, music21, primarily because its wiki spreads resources across 20+ chapters, making error debugging difficult. Additionally, MuseScore's entire website became inaccessible at times, and we were met with many Cloudflare error screens. On top of that, music21 required us to specify machine-specific paths just for image generation to work, which was not documented anywhere in the wiki. This also prevented us from easily hosting the backend, since the MuseScore app has to be downloaded through a third-party marketplace.
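The volume-based note-boundary idea can be sketched without OpenCV at all: given a per-frame loudness envelope, a simple threshold scan yields each note's start and end frame (a toy stand-in for our OpenCV approach; the function name and threshold are illustrative):

```python
def find_note_segments(envelope, threshold=0.1):
    """Return (start, end) frame indices for each span where the
    loudness envelope stays above `threshold` -- one pair per note."""
    segments, start = [], None
    for i, level in enumerate(envelope):
        if level > threshold and start is None:
            start = i                     # note onset
        elif level <= threshold and start is not None:
            segments.append((start, i))   # note offset
            start = None
    if start is not None:                 # note still sounding at the end
        segments.append((start, len(envelope)))
    return segments

# Two notes separated by silence:
env = [0.0, 0.5, 0.6, 0.0, 0.0, 0.4, 0.3, 0.0]
print(find_note_segments(env))  # [(1, 3), (5, 7)]
```

The gaps between segments are the rests, which is exactly the spacing data the sheet-music step needs.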

Accomplishments that we're proud of

We're all very proud of ourselves and our team. While it was difficult brainstorming our initial hackathon ideas, and even getting started on our final choice for this submission, we pulled through and delivered a fully finished product true to our original intentions. The final product exceeded the expectations we had set in the first few hours.

What we learned

We learned many different things, from how to interpret spectrograms, to using OpenCV for edge detection, to staying resilient with sparsely documented Python packages. We also learned that creating truly perfect sheet music requires more advanced math than we could master in these 24 hours.

What's next for Sheet Diffusion

We hope to improve even further on Fish Audio's Fish Diffusion by incorporating some of our own agentic AI to contextualize the audio, so we can better capture the nuance of the music and make our sheet music more accurate. For the front end, we might look into VexFlow as an alternative to MuseScore, since its open-source nature means less reliance on a third-party marketplace's tools. As for the backend, implementing Fourier transforms to decompose chords into their individual notes would be a fun next step.
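The Fourier-transform idea can be sketched with NumPy: synthesize a C-major chord from pure tones, take the real FFT, and read off the strongest frequency bins (a toy sketch with idealized sine waves, not production chord detection):

```python
import numpy as np

sr = 8000                       # sample rate (Hz); 1 second => 1 Hz bin spacing
t = np.arange(sr) / sr
# C major chord from pure sine tones (approx. C4, E4, G4)
chord = sum(np.sin(2 * np.pi * f * t) for f in (262, 330, 392))

spectrum = np.abs(np.fft.rfft(chord))
# The three largest-magnitude bins are the chord's component frequencies.
peaks = sorted(int(i) for i in np.argsort(spectrum)[-3:])
print(peaks)  # [262, 330, 392]
```

Real recordings add harmonics and spectral leakage on top of this, which is why robust chord decomposition is considerably harder than this sketch suggests.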
