Inspiration
What it does: It recognizes the ASL letters "M", "L", and "H" and the numbers 1 to 5 from the user's webcam input.
How we built it: We trained our own convolutional neural network to recognize hand posture from a picture of a hand. We use Google MediaPipe's Hands API to locate the hand in the webcam video, then crop that area out and feed it into our recognition network, which predicts the sign. The app itself is built with React.
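The cropping step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `landmarksToCropBox` and the `padding` value are assumptions, and the only thing taken from MediaPipe Hands is that each landmark carries `x` and `y` coordinates normalized to the range 0 to 1.

```javascript
// Hypothetical helper: turn normalized hand landmarks into a pixel crop box.
// `landmarks` is an array of {x, y} points with x and y in [0, 1],
// which is the shape MediaPipe Hands reports for each detected hand.
function landmarksToCropBox(landmarks, frameWidth, frameHeight, padding = 0.125) {
  const xs = landmarks.map((p) => p.x);
  const ys = landmarks.map((p) => p.y);

  // Tight bounding box around all joints, padded on every side and
  // clamped to the frame. The padding amount is an assumption.
  const minX = Math.max(0, Math.min(...xs) - padding);
  const minY = Math.max(0, Math.min(...ys) - padding);
  const maxX = Math.min(1, Math.max(...xs) + padding);
  const maxY = Math.min(1, Math.max(...ys) + padding);

  // Convert from normalized coordinates to whole pixels.
  const x = Math.floor(minX * frameWidth);
  const y = Math.floor(minY * frameHeight);
  return {
    x,
    y,
    width: Math.ceil(maxX * frameWidth) - x,
    height: Math.ceil(maxY * frameHeight) - y,
  };
}
```

The resulting box could then be passed to the canvas 2D context's nine-argument `drawImage(video, box.x, box.y, box.width, box.height, 0, 0, size, size)` call to copy the hand region into a fixed-size canvas, where `size` stands for whatever input size the network expects.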
The numbers are recognized with a gesture recognizer called "fingerpose" that is available online. It takes the positions of the finger joints (provided by MediaPipe Hands) and matches them against a set of custom gestures that we defined.
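To illustrate the idea of matching a hand against custom-defined gestures, here is a simplified stand-in written in plain JavaScript. It does not use fingerpose's actual API, and the gesture table, function names, and threshold are all hypothetical. It also assumes an earlier step has already decided from the joint positions which fingers are extended.

```javascript
// Simplified stand-in for gesture matching (not fingerpose's API).
const FINGERS = ["thumb", "index", "middle", "ring", "pinky"];

// Hypothetical gesture table: which fingers are extended for each number.
const GESTURES = {
  one: ["index"],
  two: ["index", "middle"],
  three: ["thumb", "index", "middle"],
  four: ["index", "middle", "ring", "pinky"],
  five: ["thumb", "index", "middle", "ring", "pinky"],
};

// Score a gesture as the fraction of fingers whose observed state
// (extended or curled) agrees with the gesture description.
function scoreGesture(extendedFingers, observed) {
  const matches = FINGERS.filter(
    (f) => extendedFingers.includes(f) === observed.includes(f)
  );
  return matches.length / FINGERS.length;
}

// Return the best-scoring gesture, or null if nothing clears the threshold.
// The default threshold is an assumption chosen for this sketch.
function matchGesture(observed, threshold = 0.9) {
  let best = null;
  for (const [name, extended] of Object.entries(GESTURES)) {
    const score = scoreGesture(extended, observed);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { name, score };
    }
  }
  return best;
}
```

For example, `matchGesture(["index", "middle"])` selects the `two` entry, while a closed fist (`matchGesture([])`) matches nothing and returns `null`. A real recognizer would also weigh how far each finger is curled and which way it points rather than using a simple extended-or-not flag.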
Challenges we ran into: Training a neural network, including deciding how many layers to use, what types of layers, and how many nodes each layer should have. Integrating different posture-related APIs and processing images with the JavaScript canvas element.
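As one example of the kind of canvas image processing involved, the sketch below converts the RGBA pixel data returned by a canvas `getImageData` call into normalized grayscale values. Whether the project's network takes grayscale input is an assumption, as are the function name and the choice of the common 0.299 / 0.587 / 0.114 luminance weights.

```javascript
// Hypothetical preprocessing step: RGBA bytes -> grayscale floats in [0, 1].
// `rgba` is laid out like ImageData.data: four bytes per pixel (R, G, B, A).
function rgbaToGrayscale(rgba) {
  const pixelCount = rgba.length / 4;
  const gray = new Float32Array(pixelCount);
  for (let i = 0; i < pixelCount; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    // Weighted sum of the color channels; the alpha channel is ignored.
    gray[i] = (0.299 * r + 0.587 * g + 0.114 * b) / 255;
  }
  return gray;
}
```

In the browser, the input would typically come from `ctx.getImageData(0, 0, width, height).data` on the canvas holding the cropped hand image.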
Accomplishments that we're proud of: We are able to distinguish numbers pretty cleanly.
What we learned: It is good to compare different APIs that accomplish the same goal and see which one offers the best performance and clarity.
What's next for ASL sign recognizer with computer vision: We could possibly improve accuracy and incorporate more letters and numbers than just "MLH12345".
Built With
- javascript
- node.js
- python
- react
- tensorflow
- yarn