Inspiration
While working on our university coursework, after hours spent staring at broken camera feeds, we started wondering whether we could put the captured data to use in another way. The idea for AiRGestures grew out of this.
What it does
The platform consists of an iOS application and a macOS server, with a USB cable connecting the two devices. The user holds their hand over the phone and performs a set of in-air gestures to manipulate a virtual ball, which in turn triggers actions on their computer, such as navigating through slideshows or controlling music playback.
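As a rough sketch of the kind of command vocabulary this implies, the iOS app and the macOS server could share a small Codable type that the phone encodes when a gesture is recognised and the Mac decodes before acting. The names below are illustrative only, not the project's actual identifiers.

```swift
import Foundation

// Hypothetical shared command vocabulary; case names are illustrative only.
enum RemoteCommand: String, Codable {
    case nextSlide, previousSlide             // slideshow navigation
    case playPause, nextTrack, previousTrack  // music controls
}

// A small envelope so both sides agree on the wire format.
struct GestureMessage: Codable {
    let command: RemoteCommand
    let timestamp: Date
}

func encode(_ message: GestureMessage) throws -> Data {
    try JSONEncoder().encode(message)   // phone side: payload for the USB channel
}

func decode(_ payload: Data) throws -> GestureMessage {
    try JSONDecoder().decode(GestureMessage.self, from: payload)   // Mac side
}
```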
How we built it
Pretty much the entire codebase was written in Swift 4. We used Microsoft's Cognitive Services Custom Vision API to generate a CoreML model from our training data, then ran frames from the camera feed through the model to detect the gesture the user is making. We used the peertalk protocol to send data about the recognised gesture to the Mac, where it triggers the relevant OS functions. We defined a set of commands representative of a few of the many possible applications of our technology.
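A minimal sketch of the classification side of that pipeline is below, assuming the Custom Vision export is compiled into the app as a model class named GestureClassifier (the model name and confidence threshold are assumptions, not our actual values): each camera frame is run through Vision, the top label is kept if its confidence is high enough, and the result is mapped to one of the commands above before being handed to the USB channel.

```swift
import Vision
import CoreML

// Classifies camera frames against the exported Custom Vision / CoreML model.
// `GestureClassifier` is a placeholder for the Xcode-generated model class.
final class GestureRecognizer {
    private let request: VNCoreMLRequest

    init(onGesture: @escaping (String, Float) -> Void) throws {
        let model = try VNCoreMLModel(for: GestureClassifier().model)
        request = VNCoreMLRequest(model: model) { request, _ in
            // Keep only the highest-confidence label for this frame.
            guard let best = (request.results as? [VNClassificationObservation])?.first,
                  best.confidence > 0.8 else { return }
            onGesture(best.identifier, best.confidence)
        }
        request.imageCropAndScaleOption = .centerCrop
    }

    // Call this with each frame delivered by AVCaptureVideoDataOutput.
    func classify(_ pixelBuffer: CVPixelBuffer) {
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }
}
```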
Challenges we ran into
We struggled a lot with our CV model. More training data solved most of that, but we eventually hit the problem of doing object detection to track the user's hand across the screen. Unsatisfied with the libraries we found online, we decided to write our own object detection from scratch, by capturing a background image and subtracting it from an image containing the object. This actually worked, but it proved too processor-intensive to run effectively on the device.
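The idea itself is simple; a rough sketch of the technique on plain grayscale buffers is below (illustrative of the approach, not our exact code). The per-pixel scan at camera resolution, many times per second, is what made it too heavy for the phone.

```swift
// Background subtraction sketch: threshold the absolute difference between a
// captured background frame and the current frame, then take the centroid of
// the changed pixels as a rough position for the hand.
struct GrayImage {
    let width: Int
    let height: Int
    var pixels: [UInt8]   // row-major, width * height values
}

func detectObject(background: GrayImage,
                  frame: GrayImage,
                  threshold: Int = 40,
                  minChangedPixels: Int = 200) -> (x: Int, y: Int)? {
    precondition(background.width == frame.width && background.height == frame.height)
    var sumX = 0, sumY = 0, count = 0
    for y in 0..<frame.height {
        for x in 0..<frame.width {
            let i = y * frame.width + x
            if abs(Int(frame.pixels[i]) - Int(background.pixels[i])) > threshold {
                sumX += x; sumY += y; count += 1
            }
        }
    }
    guard count >= minChangedPixels else { return nil }   // not enough change to be an object
    return (sumX / count, sumY / count)
}
```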
Another struggle was getting the iOS and macOS devices to talk to each other. At first we used Apple's Multipeer Connectivity framework, but after facing connection issues we discovered peertalk, which gave us the speed and ease of use we wanted, with the added reliability of transmitting data over USB rather than wirelessly.
Accomplishments that we're proud of
- We created comprehensive machine learning models in less than 24 hours
- We built a beautiful UI that is intuitive and offers plenty of affordance.
What we learned
- Swift and iOS/macOS development (two of the team had never worked with them before)
- How to use the Cognitive Services Custom Vision API to generate ML models
- Cross-platform connectivity protocols
What's next for AiRGestures
The product could be extended to any number of applications by changing the OS functions called by the server and by adding more gestures to the app. We will be open-sourcing the project and providing an easy-to-use API so that anyone can add their own functionality.
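For example, the macOS server's command handling could be as small as the sketch below, which maps the hypothetical RemoteCommand values from earlier onto synthesised key events (one possible mapping, not the definitive implementation); supporting a new application would mean adding a case here and a gesture on the phone.

```swift
import Cocoa

// One possible server-side mapping from commands to OS actions: advance or
// rewind a slideshow by synthesising arrow-key presses.
func perform(_ command: RemoteCommand) {
    switch command {
    case .nextSlide:      postKeyPress(0x7C)   // kVK_RightArrow
    case .previousSlide:  postKeyPress(0x7B)   // kVK_LeftArrow
    case .playPause, .nextTrack, .previousTrack:
        break   // media keys need HID media-key events; omitted in this sketch
    }
}

private func postKeyPress(_ virtualKey: CGKeyCode) {
    guard let down = CGEvent(keyboardEventSource: nil, virtualKey: virtualKey, keyDown: true),
          let up = CGEvent(keyboardEventSource: nil, virtualKey: virtualKey, keyDown: false) else { return }
    down.post(tap: .cghidEventTap)
    up.post(tap: .cghidEventTap)
}
```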