Inspiration
As musicians, we often want to perform, hear, or edit sheet music, but at best can only find low-quality sheets that someone else has scanned. Cleaning and digitizing sheet music is a time- and labor-intensive task that can be automated with Optical Music Recognition (OMR) tools such as Audiveris, which is available through sites like musescore.com/import. However, we found that OMR tools can be inaccurate, which led us to use Picsart’s high-quality image pre-processing tools to clean up scans for more accurate results.
What it does
EMD takes in a low-quality scanned sheet music pdf and returns a cleaned, high-resolution sheet music pdf. This improves readability for performers, who simply want clearer sheet music to play from, and it also benefits composers by giving them a higher-quality input for Audiveris and other OMR tools. OMR tools are often bottlenecked by the quality of the input scan, so giving the ML-based pdf to music xml converter a cleaner input reduces the load on the ML side. In other words, instead of trying to build a better algorithm, we give the algorithm higher-quality data to work with. We do this with a number of Picsart’s features, including the powerful “enhance/upscale” tool to increase image resolution and the “adjust” tool, where we tweak parameters such as clarity, contrast, and sharpening. As a short example, most music scans are improved by maxing out contrast and denoising grainy sections, which makes it easier for Audiveris to pinpoint what is “sheet” and what is “music”.
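To illustrate why raising contrast helps separate "sheet" from "music", here is a minimal local sketch of linear contrast stretching on a grid of grayscale values. It is not Picsart's implementation and EMD does not run this code; the function name `stretch_contrast` and the nested-list image format are assumptions made purely for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale values, 0 (black ink) to 255 (white paper)


def stretch_contrast(pixels: Image) -> Image:
    """Rescale values so the darkest pixel becomes 0 and the brightest 255.

    A faded scan might only use the range 90..200; after stretching, ink is
    pushed toward black and paper toward white.
    """
    flat = [value for row in pixels for value in row]
    darkest, brightest = min(flat), max(flat)
    if darkest == brightest:
        # A completely flat image has no contrast to stretch.
        return [row[:] for row in pixels]
    span = brightest - darkest
    # Integer arithmetic keeps the result deterministic.
    return [[(value - darkest) * 255 // span for value in row] for row in pixels]


if __name__ == "__main__":
    faded_scan = [
        [200, 190, 200],
        [100, 90, 100],  # a faint staff line
        [200, 195, 200],
    ]
    for row in stretch_contrast(faded_scan):
        print(row)
```

In the stretched output the staff line sits near 0 and the paper near 255, which is the kind of separation an OMR tool benefits from.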
How we built it
We built our web app with a React.js front end and a Flask/Python back end. The Python backend pipeline starts by taking an input pdf file and splitting it into a list of pngs. If the update flag is specified, we first run all of the Picsart update operations (clarity, contrast, sharpness) on each page. Then we upscale the pages. Finally, we stitch the list of cleaned and upscaled pngs back into a pdf and send it back to the front end.
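The pipeline order described above can be sketched roughly as follows. This is a minimal outline, not our actual backend code: `process_pdf` and its `split`, `adjust`, `upscale`, and `stitch` parameters are hypothetical names, and each step is passed in as a plain function so the sketch makes no claims about the real Picsart API or about any particular pdf library.

```python
from typing import Callable, List

Page = bytes  # one page of sheet music as png bytes


def process_pdf(
    pdf_bytes: bytes,
    split: Callable[[bytes], List[Page]],   # pdf -> list of pngs (hypothetical helper)
    adjust: Callable[[Page], Page],         # clarity/contrast/sharpness step (hypothetical)
    upscale: Callable[[Page], Page],        # resolution upscaling step (hypothetical)
    stitch: Callable[[List[Page]], bytes],  # list of pngs -> pdf (hypothetical helper)
    update: bool = True,
) -> bytes:
    """Split the pdf, optionally adjust each page, upscale it, then rebuild the pdf."""
    cleaned: List[Page] = []
    for page in split(pdf_bytes):
        if update:
            # The update operations run first, on the original small page.
            page = adjust(page)
        page = upscale(page)
        cleaned.append(page)
    return stitch(cleaned)


if __name__ == "__main__":
    # Stand-in steps that only tag the bytes, to show the order of operations.
    result = process_pdf(
        b"page1|page2",
        split=lambda data: data.split(b"|"),
        adjust=lambda page: page + b"+adjusted",
        upscale=lambda page: page + b"+upscaled",
        stitch=lambda pages: b"|".join(pages),
    )
    print(result)
```

Passing the steps in as functions keeps the ordering logic separate from the network calls, which makes it easy to swap the order or stub out a step while testing.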
Challenges we ran into
Originally, after running the input pdf through the Picsart pipeline, we intended to submit the resulting pdf to Audiveris, an open source pdf to music xml application, which would convert our cleaned pdf into a music xml file for input into music notation editing software such as MuseScore. Unfortunately, Audiveris turned out to be a lot less accessible than we initially thought. The core functionality is compiled into a Java-based application, and we couldn’t find a reasonable way to access it from a Python backend. Thus, for users who want to convert the cleaned sheet music into music xml format, we had to settle for doing only the pre-processing half of the intended pipeline, leaving the user to take the resulting pdf and manually submit it to MuseScore’s pdf to music xml service. However, this change shouldn’t affect users who only want a more readable pdf and have no music composition plans for the sheet music. In other words, performers won’t be affected, but composers will have one minor additional step to take.
Another place where we had to compromise quality for workability was the order of operations in our upscale + update pipeline. Originally, we found that upscaling first and then applying update (clarity, contrast, sharpen) to the upscaled intermediate image gave much better final results than applying update to the original image and then upscaling the result. However, we found that the Picsart API is flaky on input images over 5 MB in size, specifically for the update API call. Despite the Picsart API claiming not to increase file size on upscale API calls, we noticed that the 1 MB individual pages of input sheet music ended up at 12 MB or so after upscaling by a factor of 2. As a result, Picsart would fail non-deterministically when given upscaled images as input to update, so we had to switch the order around to avoid this. This compromises the resulting quality: instead of upscaling into high-res images and cleaning those, we now clean blurry images and then upscale the cleaned blurry images.
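The arithmetic behind this trade-off can be sketched as a small size check. This is a hypothetical variation, not what EMD does (EMD always runs update first): the function name `plan_steps` is made up, the 5 MB limit is the threshold we observed rather than a documented one, and the growth factor of 12 is only our rough measurement for 2x upscaling (about 1 MB in, about 12 MB out).

```python
from typing import List

MB = 1024 * 1024
UPDATE_LIMIT_BYTES = 5 * MB  # observed: update becomes flaky above roughly 5 MB
OBSERVED_GROWTH = 12.0       # observed: ~1 MB pages grew to ~12 MB after 2x upscale


def plan_steps(
    page_size_bytes: int,
    growth_factor: float = OBSERVED_GROWTH,
    limit_bytes: int = UPDATE_LIMIT_BYTES,
) -> List[str]:
    """Pick the step order for one page based on its estimated upscaled size.

    Upscaling first gives better quality, but only works if the upscaled
    image is still small enough for the update call to handle reliably.
    """
    estimated_upscaled_size = page_size_bytes * growth_factor
    if estimated_upscaled_size <= limit_bytes:
        return ["upscale", "update"]  # higher-quality order
    # Fall back to the order EMD uses: update the small original, then upscale.
    # Note: this does not help if the original page itself exceeds the limit.
    return ["update", "upscale"]


if __name__ == "__main__":
    print(plan_steps(1 * MB))      # typical page: estimated 12 MB, over the limit
    print(plan_steps(300 * 1024))  # small page: estimated ~3.5 MB, under the limit
```

For our typical 1 MB pages the estimate lands well over the limit, which is why the pipeline ended up with update before upscale.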
Accomplishments that we're proud of
Finding the best general settings for creating cleaner music sheets took hours of trial and error, and it yielded exceptional results. A script that automates these settings for a wide variety of low-quality sheet music scans will save many people many hours of tedium.
What we learned
We learned how to create API calls and responses, and how to integrate and manage interactions between our front and back ends. There are a lot of tutorials online for writing a Flask backend or a React frontend individually, but integrating the two is less well documented. We ended up stitching together tutorials that were never intended to work together in order to create a functioning end-to-end product.
We also had to deal with the technical limitations of the API itself. For example, the update api call had issues with large file size inputs, so we had to adjust our pipeline to call upscale after applying the update call to avoid calling update on enhanced images. We obviously don’t control the source code behind the picsart API, so we have to adapt by changing the source code of our application, which we do control.
Another important thing we learned was how to narrow down the list of relevant API calls for our specific task. Certain API calls, such as remove background or enhance/ultra, would be very helpful when dealing with photos of people, for example. Other calls, such as vectorize and add_effect, work nicely for artistic ventures. However, scans of sheet music are a very different domain: there’s no background to remove, and enhancing too far makes certain symbols and notes hard to distinguish. We are also interested primarily in clarity and very little in artsiness. Thus, the trial and error process of figuring out which API calls actually improve results for our application’s specific needs was a learning experience for us.
What's next for Enhanced Music Digitizer
Currently, the user can upload a sheet music pdf and receive a cleaned-up pdf that they must put into an OMR tool themselves. Eventually, we would like a full end-to-end pipeline with Audiveris that directly outputs the music xml file for the user to download, keeping the user’s experience on one site with the processing steps staying under the hood. Ideally, EMD will be hosted publicly on AWS for all to access, with a queuing system so that anyone anywhere can use it at any time. However, we did not have time to implement this, so we focused instead on making sure it was easily reproducible on a local setup.
Built With
- flask
- javascript
- picsart
- python
- react