Introduction
I've always been really into music, so for my first hackathon I wanted to build something related to it. I initially wanted to build something complex like an analog synth, but I realized that a project like that is neither particularly innovative nor viable in such a short time span.
I started this hackathon in a group with three other people, and we decided to work on a VR-related project. I was excited at first, but after actually working on VR, I realized that this wasn't where my passion lay. I decided to pivot away from the VR project and work on something I was more passionate about. I still stayed and worked beside my original team members, but while they were finalizing and debugging the VR project, I began working on my own. I only began conceiving this project at 6pm on Saturday, so time was of the essence. At the time, I had no idea what to make, but I became inspired by the Oculus Quest hand tracking and eventually came up with this idea!
What it does
If you're unfamiliar, solfège is a popular system for naming musical pitches using syllables like "do," "re," and "mi." In the choir world, each of these syllables also has a designated hand sign. Using your webcam, the Solfeger tracks your hands and interprets their gestures as solfège notes. You play the notes with your keyboard: the hand tracking determines the pitch, and a key press triggers that note to sound.
The building process
The Solfeger was built in JavaScript. I used Google's MediaPipe for the hand tracking and the Tone.js framework to play the sounds. I originally thought about doing it in Java, but after briefly looking into MediaPipe and some audio frameworks, I realized that a project like this would be much easier and faster to implement on the web. It also gave me a good chance to brush up on my HTML and JavaScript.
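To give a feel for how the two inputs fit together, here is a minimal sketch of the "hand picks the pitch, key plays it" idea. It is not the actual Solfeger code: the `SOLFEGE_TO_NOTE` table (a plain C major scale), the `createSolfeger` helper, and the fixed `"8n"` note length are all assumptions made for illustration. The `synth` argument only needs a `triggerAttackRelease(note, duration)` method, which is the shape of a Tone.js `Synth`.

```javascript
// Hypothetical mapping from solfège syllable to a note name (C major, assumed).
const SOLFEGE_TO_NOTE = new Map([
  ["do", "C4"], ["re", "D4"], ["mi", "E4"], ["fa", "F4"],
  ["sol", "G4"], ["la", "A4"], ["ti", "B4"],
]);

// Hypothetical controller: the hand tracker updates the current syllable,
// and a key press plays whatever syllable is currently detected.
function createSolfeger(synth) {
  let currentSyllable = null;
  return {
    // Call this whenever the hand tracker recognizes (or loses) a gesture.
    setSyllable(syllable) {
      currentSyllable = SOLFEGE_TO_NOTE.has(syllable) ? syllable : null;
    },
    // Call this from a keydown listener. Returns the note played, or null.
    handleKeyDown() {
      if (currentSyllable === null) return null;
      const note = SOLFEGE_TO_NOTE.get(currentSyllable);
      synth.triggerAttackRelease(note, "8n"); // short note; duration is assumed
      return note;
    },
  };
}
```

In the browser, something like `createSolfeger(new Tone.Synth().toDestination())` wired to a `keydown` listener would be one way to hook this up.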
One of the interesting challenges for this project was interpreting the hand data. All MediaPipe does is provide point data for "landmarks" on each hand. Landmarks are places like knuckles, joints, and fingertips. It doesn't interpret those points for you, though. I had difficulty figuring out a way for the system to accurately determine the hand shape. Each of the solfège hand shapes is pretty distinct from the others, so I had to use different methods for each of them.
One method I used was to group landmarks into arrays based on their position in the hand, such as all knuckles in one array and all fingertips in another. For each array, I averaged the y values and used that average in some of my calculations. This simplification proved very useful for four of the seven solfège shapes. I also calculated the distance between the groups' average heights, which was useful as well.
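The grouping-and-averaging idea can be sketched roughly like this. The landmark indices follow MediaPipe's 21-point hand layout, where each landmark has normalized `x` and `y` values and `y` grows downward on screen. Everything else here is an assumption for illustration rather than the Solfeger's real logic: which landmarks go in each group, the helper names, the `0.1` threshold, and the three generic labels.

```javascript
// Landmark indices in MediaPipe's hand model (thumb left out here by choice).
const FINGERTIPS = [8, 12, 16, 20]; // index, middle, ring, pinky tips
const KNUCKLES = [5, 9, 13, 17];    // base joints of the same four fingers

// Average the y value of a group of landmarks.
function averageY(landmarks, indices) {
  const sum = indices.reduce((total, i) => total + landmarks[i].y, 0);
  return sum / indices.length;
}

// Vertical gap between the two groups. Positive means the fingertips sit
// higher on screen than the knuckles, because y increases downward.
function tipHeightAboveKnuckles(landmarks) {
  return averageY(landmarks, KNUCKLES) - averageY(landmarks, FINGERTIPS);
}

// Hypothetical coarse classifier built on that gap.
function roughShape(landmarks, threshold = 0.1) {
  const gap = tipHeightAboveKnuckles(landmarks);
  if (gap > threshold) return "fingers-up";
  if (gap < -threshold) return "fingers-down";
  return "level-or-curled";
}
```

A single number like this can't separate every hand sign on its own, which matches the experience above: it helps with some shapes, and the rest need other checks.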
Conclusion
This project was a lot of fun to work on. I'm so happy I got to combine my passion for music and audio with learning how to use the hand tracking system. It was surprisingly intuitive. I've never used anything of the sort, and it made me realize how accessible such advanced tools are for regular people. I really hope to do more with MediaPipe in the future, and learn some other cool APIs/frameworks as well!
Built With
- javascript
- mediapipe
- tone.js
