Inspiration
Our inspiration for this solution originated from a desire to make computer interaction more inclusive and user-friendly for everyone. We are motivated to empower individuals with different needs, including those with physical disabilities or challenges that make traditional keyboard and mouse interaction difficult. As aspiring data science and machine learning students with a passion for developing innovative solutions to difficult problems, we were excited to take on this idea.
What is it?
SignWaver is a background application that leverages hand gestures and voice commands to control various features on a computer. The application recognizes dynamic hand gestures to select actions, such as opening specific applications via voice commands, and lets users adjust system settings like volume and brightness. It also enables keyboard- and mouse-free web browsing, supporting gestures for opening, closing, and navigating between tabs, along with scrolling, audio muting, and text entry. Furthermore, it includes an image processing feature that allows users to take "snapshots" of their current screen, generate descriptive text from the image, and interact further based on the scanned content.
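The mapping from recognized gestures to system actions can be pictured as a simple dispatch table. The gesture and action names below are hypothetical placeholders, not SignWaver's actual command set:

```python
# Hypothetical gesture-to-action dispatch table. Gesture labels and
# action names are illustrative; SignWaver's real command set may differ.
GESTURE_ACTIONS = {
    "swipe_left": "previous_tab",
    "swipe_right": "next_tab",
    "pinch_close": "close_tab",
    "palm_up": "volume_up",
    "palm_down": "volume_down",
    "fist": "mute_audio",
    "two_fingers_up": "scroll_up",
}

def dispatch(gesture: str) -> str:
    """Map a recognized gesture label to a system action name;
    unrecognized gestures fall through to a no-op."""
    return GESTURE_ACTIONS.get(gesture, "no_op")
```

A table like this keeps the recognizer decoupled from the actions, so new gestures or commands only require a new entry rather than new control-flow code.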
How we built it
We developed our entire codebase in Python, a highly flexible language with numerous packages offering helpful functions. Our gesture-detection model is powered by Google's MediaPipe machine learning framework, which we trained to recognize our specific gestures. We also integrated several Google Cloud services for the AI and processing side of the project: Gemini-powered vision processing, Google's Text-to-Speech AI for speech output, and sentiment analysis for speech processing.
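To illustrate the gesture-detection side: MediaPipe's hand-tracking model outputs 21 (x, y, z) landmarks per hand, and a simple rule over those landmarks can already distinguish basic poses. This is only a sketch of the idea; SignWaver uses a trained recognizer rather than a hand-written heuristic like this one:

```python
# Rule-based check over MediaPipe-style hand landmarks.
# MediaPipe Hands outputs 21 landmarks per hand; the indices below
# follow its layout (fingertips: 8, 12, 16, 20; PIP joints: 6, 10, 14, 18).
# A trained gesture model replaces this heuristic in practice.

FINGER_TIPS = (8, 12, 16, 20)   # index, middle, ring, pinky tips
FINGER_PIPS = (6, 10, 14, 18)   # corresponding middle joints

def is_open_palm(landmarks):
    """Return True if every fingertip sits above (smaller y than) its
    PIP joint, i.e. all four fingers are extended. `landmarks` is a
    sequence of 21 (x, y) pairs in image coordinates (y grows downward)."""
    return all(landmarks[tip][1] < landmarks[pip][1]
               for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))
```

In the full pipeline, frames from OpenCV are fed to MediaPipe, the resulting landmarks go to the gesture classifier, and the predicted gesture triggers the corresponding command.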
Challenges
- Python package management :(
- Training the model with our own dataset (still a work in progress!)
- Debugging & code logic
What's next?
Future updates could bring a more accurate model trained on our own data. More gesture support can also be implemented, including more complex gestures (maybe even two-handed ones!). More gestures, in turn, mean more commands, more capabilities, and more AI support. Future versions may even feature an interactive GUI to customize gestures and commands on the fly!
Built With
- github
- google-cloud
- google-gemini-vision-pro
- google-mediapipe
- google-transcription
- natural-language-processing
- opencv
- python
- tensorflow