As humans, we intuitively use our hands for a variety of things. Gestures are easy to learn, easy to recognize, and easy to remember. We built this product to simplify the user experience on a variety of electronic devices. Whether the device is a TV, a smart mirror, or a laptop, hand gesture recognition lets us interact with the devices around us in a unified manner. One of our main criteria for this product was that it should be easy to integrate into a preexisting system. Thus, we restricted ourselves to cheap webcams and mid-tier laptops.
What it does
Wave sits in the background as you use your device. It recognizes gestures to perform relevant tasks based on your activity. It runs completely locally on your device, so your information is kept private, and stores no identifying data anywhere. Although this beta version runs only on a Mac, the system can be easily expanded to support other devices as well.
How we built it
The core of this product is a pair of neural networks working together to perform fast and accurate gesture recognition. A single-frame model processes individual frames in parallel to reduce latency and power usage, while a more powerful multi-frame model takes the encoded outputs of the single-frame model and performs gesture inference.
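The data flow between the two stages can be sketched roughly as follows. This is only an illustration under stated assumptions: `encode_frame` and `classify_sequence` are hypothetical placeholders standing in for the two networks, the window size of 8 is made up, the labels are toy labels rather than real gestures, and frames are encoded one at a time here rather than in parallel.

```python
from collections import deque

WINDOW = 8  # hypothetical number of frames the multi-frame stage looks at


def encode_frame(frame):
    """Placeholder for the single-frame model: one frame -> small encoding.

    A frame is assumed to be a flat list of pixel intensities.
    """
    return (sum(frame) / len(frame), max(frame))


def classify_sequence(encodings):
    """Placeholder for the multi-frame model: window of encodings -> label.

    Toy rule: compare mean intensity of the first and last frame.
    """
    first, last = encodings[0][0], encodings[-1][0]
    if last - first > 10:
        return "rising"
    if first - last > 10:
        return "falling"
    return "none"


def recognize(frames, window=WINDOW):
    """Yield one label per incoming frame once the window is full."""
    buffer = deque(maxlen=window)  # oldest encoding is dropped automatically
    for frame in frames:
        # Each frame is encoded exactly once; only the cheap encodings
        # are revisited as the window slides forward.
        buffer.append(encode_frame(frame))
        if len(buffer) == window:
            yield classify_sequence(list(buffer))
```

The point of the split is that the per-frame work is done once per frame, and the second stage only ever sees the compact encodings instead of raw images.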
To control the system, we learned and used PyAutoGUI and AppleScript. These two tools allowed us to programmatically control various aspects of the operating system, and we used a finite state machine to interpret the model's predictions in the correct context.
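A minimal sketch of such a finite state machine is shown below. The states, gesture names, and action names are all hypothetical and chosen only for illustration; they are not Wave's actual gesture set.

```python
# (current state, gesture) -> (action name, next state)
# Every name in this table is a made-up example.
TRANSITIONS = {
    ("idle", "open_palm"): ("wake", "active"),
    ("active", "swipe_left"): ("previous_item", "active"),
    ("active", "swipe_right"): ("next_item", "active"),
    ("active", "pinch"): ("enter_volume_mode", "volume"),
    ("active", "fist"): ("sleep", "idle"),
    ("volume", "swipe_up"): ("volume_up", "volume"),
    ("volume", "swipe_down"): ("volume_down", "volume"),
    ("volume", "fist"): ("exit_volume_mode", "active"),
}


class GestureStateMachine:
    """Maps a recognized gesture to an action based on the current state."""

    def __init__(self, start="idle"):
        self.state = start

    def handle(self, gesture):
        """Return the action name for this gesture, or None if it is ignored."""
        # Unknown (state, gesture) pairs produce no action and keep the state.
        action, next_state = TRANSITIONS.get(
            (self.state, gesture), (None, self.state)
        )
        self.state = next_state
        return action
```

With a table like this, the same gesture can mean different things depending on what came before: `swipe_up` is ignored in the `idle` and `active` states but raises the volume once a `pinch` has put the machine in the `volume` state. The returned action name would then be dispatched to whatever performs the OS-level control, which is not shown here.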
Challenges we ran into
The first challenge we ran into was managing the wide range of skill sets on the team. Some members contributed ideas while others learned the technical side. All in all, this hackathon allowed us to grow both as a team and as individual programmers.
Secondly, we learned how difficult it is to integrate multiple languages into one project. We initially tried to use C++, since it is a language all the team members know, but we quickly ran into difficulties with interprocess communication between Python and C++ and with managing a cross-platform multiprocessing system. In addition, we decided that learning a new GUI framework (Qt) would severely slow down our progress. Thus, with time as a high priority, we settled on a pure Python project.
Accomplishments that we're proud of
One feature we are proud of is that Wave is context-aware. By using finite state machines, the product can contextualize gestures based on past actions.
We are also proud that the model recognizes hand gestures with fairly high accuracy. Given the wide range of skin colors, hand sizes, backgrounds, and degrees of freedom, hand gesture recognition is harder than it may seem. To humans, this skill comes naturally, as our brains are built to reason easily in 3D space. For a computer, however, the task is a difficult one.
What we learned
This hackathon taught each of us different things. Some members learned Python, Git, and various frameworks. Others learned product management and collaborative coding.
What's next for Wave
The next step for Wave would be to port it to other operating systems, especially embedded devices. For example, smart TVs could do away with remotes, and smart mirrors could become more interactive. Another area of improvement would be to prune the models to reduce their size and complexity, which would ultimately improve latency and power consumption on these low-end devices.
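As a rough illustration of what pruning means, the sketch below applies unstructured magnitude pruning to a flat list of weights. This is an assumption on our part about one common approach, not a method we have chosen or implemented for Wave, and the function name `magnitude_prune` is hypothetical.

```python
def magnitude_prune(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude share zeroed.

    `fraction` is the share of weights to set to 0.0, between 0 and 1.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = int(len(weights) * fraction)  # how many weights to zero out
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]
```

Zeroing weights on its own mainly makes a model more compressible; turning that into lower latency generally also requires sparse-aware execution or removing whole channels or layers.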