Sublime

Inspiration

We all wish we were better at multitasking. How are you supposed to control your device with busy hands?

What it does

Sublime gathers video and speech input to translate user actions into device actions in real time.
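As a rough illustration of that translation step, here is a minimal sketch that turns recognized events into action names. The event names, the `ACTIONS` table, and both helper functions are hypothetical and not taken from our codebase; the recognizers are assumed to emit plain string labels.

```python
import re

# Hypothetical mapping from recognized events to device action names.
ACTIONS = {
    "gesture:pinch": "click",
    "gesture:swipe_up": "scroll_up",
    "voice:open": "open_app",
}

def voice_events(transcript, trigger_words):
    """Return a 'voice:<word>' event for each trigger word found in the transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return [f"voice:{w}" for w in words if w in trigger_words]

def to_actions(events, actions=ACTIONS):
    """Translate events into action names, skipping events with no mapping."""
    return [actions[e] for e in events if e in actions]

events = ["gesture:pinch"] + voice_events("Please OPEN the browser.", {"open", "close"})
print(to_actions(events))
```

In a setup like ours, each action name could then be bound to a pyautogui call such as `pyautogui.click()`; that binding is left out here so the sketch runs without a display.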

How we built it

We trained deep learning models to recognize similarity between hand gestures and used speech-to-text models to identify trigger words in the user's speech.
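To make the idea of gesture similarity concrete, here is a simplified sketch that compares hand landmarks against stored templates. It swaps in plain cosine similarity on normalized 2D landmarks in place of a trained model, and the function names, the template format, and the 0.9 threshold are all assumptions for illustration.

```python
import math

def normalize_landmarks(points):
    """Shift so the first point (assumed to be the wrist) is the origin, then scale to unit size."""
    ox, oy = points[0]
    shifted = [(x - ox, y - oy) for x, y in points]
    scale = max(math.hypot(x, y) for x, y in shifted) or 1.0
    return [c / scale for x, y in shifted for c in (x, y)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_gesture(points, templates, threshold=0.9):
    """Return the name of the most similar template, or None if none reaches the threshold."""
    query = normalize_landmarks(points)
    best_name, best_score = None, threshold
    for name, template_points in templates.items():
        score = cosine_similarity(query, normalize_landmarks(template_points))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Toy templates with three landmarks each; real hand tracking yields many more points.
templates = {
    "open": [(0, 0), (1, 0), (0, 1)],
    "point": [(0, 0), (1, 1), (2, 2)],
}
print(match_gesture([(5, 5), (7, 5), (5, 7)], templates))
```

Normalizing first means the same hand shape matches regardless of where it appears in the frame or how close it is to the camera.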

Challenges we ran into

Initially, our idea was to build a desktop app in Swift, but many of the Python packages we were using could not be ported to Swift. It was also very difficult to build models with inference times low enough for real-time tracking and prediction. This applied to both our gesture prediction and speech recognition models.
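One way to reason about the real-time constraint is to compare a model's average inference time with the time available per video frame. The sketch below is illustrative only: `predict` stands in for any model call, and the 30 FPS target is an assumed figure, not a measurement from our pipeline.

```python
import time

def mean_latency_ms(predict, sample, runs=20, warmup=3):
    """Average wall-clock time of predict(sample) in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        predict(sample)
    start = time.perf_counter()
    for _ in range(runs):
        predict(sample)
    return (time.perf_counter() - start) * 1000.0 / runs

def frame_budget_ms(fps):
    """Time available per frame at the given frame rate."""
    return 1000.0 / fps

def fits_budget(latency_ms, fps):
    """True if one inference completes within a single frame interval."""
    return latency_ms <= frame_budget_ms(fps)

# Placeholder "model": replace with a real gesture or speech model call.
latency = mean_latency_ms(lambda frame: sum(frame), list(range(1000)))
print(f"{latency:.3f} ms per call, fits 30 FPS budget: {fits_budget(latency, 30)}")
```

At 30 FPS the budget is roughly 33 ms per frame, so a model that takes longer has to skip frames or run asynchronously.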

Accomplishments that we're proud of

We're proud of integrating all portions of our pipeline into a cohesive, functional product. We're also proud of training and iterating on our own model, which achieved high performance.

What we learned

We learned that packages such as mediapipe can't be used from Swift, which made it difficult to build a native desktop app. We also learned how to use mediapipe and integrate it with OpenCV, how to build a functional UI in pygame, and how to do real-time speech-to-text.

What's next for Sublime

We would like to improve the accuracy of our model and recognize gestures faster. We also hope to reduce the latency of the entire pipeline so that clicking on-screen items becomes easier.

Built With

mediapipe
pyautogui
pygame
python
tensorflow
whisper

Updates

Christopher Sun started this project — Jun 23, 2024 02:36 PM EDT
