Inspiration
We all wish we were better at multitasking. How are you supposed to control your device with busy hands?
What it does
Sublime gathers video and speech input to translate user actions into device actions in real time.
How we built it
We trained deep learning models to recognize similarity between hand gestures, and used speech-to-text models to identify trigger words from user voice.
Challenges we ran into
Initially, our idea was to build a desktop app with swift, but a lot of the packages we were using in python could not be moved over to swift. Additionally, it was very difficult to build a model that had an inference time low enough for real time tracking and prediction. This applied to both our gesture prediction and speech recognition models.
Accomplishments that we're proud of
We're proud of being able to integrate all of portions of our pipeline into a cohesive product with good functionality. Additionally, we're proud of training and iterating on our own model that achieved high performance.
What we learned
We learned that packages such as mediapipe couldn't be used on swift, making it difficult to build a desktop app. Additionally, we learned how to use mediapipe and integrate it with opencv, how to build a functional UI in pygame, and how to do real time TTS.
What's next for Sublime
We would like to improve the accuracy of our model and recognize gestures faster. Additionally, we hope to reduce the latency of our entire pipeline to make it easier to click different items.
Log in or sign up for Devpost to join the conversation.