As students new to online learning, we realized that many of our professors have a difficult time using the trackpad on their laptops or the mouse on their computer to draw diagrams and text on the screen. Smart tablets that make this task easier can be prohibitively expensive, so we decided to come up with a solution that incorporates something many teachers/professors use for classes every day: their camera.

What it does

TracyZ takes in real-time video from the user's camera and applies TensorFlow's hand pose library to track the movement of the user's pointer finger in front of the camera. With this movement, the web app draws on top of PDF files, making teaching with diagrams more natural and convenient. Users can cycle through the pages of a PDF and draw as needed. To make the experience more immersive, we included voice and gesture controls: saying "go" starts drawing, and saying "stop" or clenching your fist stops it.
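The clenched-fist stop gesture can be reduced to a simple distance check. Here is a minimal sketch, assuming handpose-style landmarks (an array of 21 [x, y, z] points where index 0 is the wrist and indices 8, 12, 16, 20 are the four fingertips); the function name and pixel threshold are illustrative, not our exact code:

```javascript
// A clenched fist pulls every fingertip in close to the wrist, so we
// compare each fingertip's distance to the wrist against a threshold.
function isFistClenched(landmarks, threshold = 60) {
  const [wx, wy] = landmarks[0]; // wrist landmark
  return [8, 12, 16, 20].every((i) => {
    const [x, y] = landmarks[i]; // fingertip landmark
    return Math.hypot(x - wx, y - wy) < threshold;
  });
}
```

In a real prediction loop this would run on each frame's landmark array; the threshold would need tuning for how far the hand sits from the camera.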

How we built it

The main hand tracking functionality is handled by the TensorFlow.js Hand Pose library. After identifying the user's hand in the webcam view, we highlighted the key points of the pointer finger and drew them in red on the screen. The x and y coordinates of the pointer fingertip were then used to draw on the HTML canvas overlaying the PDF the user previously uploaded. PDF uploading and management are handled with PDF.js, and the frontend of the web app is built with HTML, CSS, and React.js. TensorFlow's Speech Command Recognition model lets users start and stop drawing with voice commands (alternatively, clenching a fist stops TracyZ from drawing on the screen).
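Because the webcam frame and the PDF canvas are different sizes (and the webcam preview is mirrored for the user), the fingertip coordinates have to be remapped before drawing. A minimal sketch of that transform, with illustrative names and dimensions:

```javascript
// Map a fingertip point from webcam-frame coordinates to canvas
// coordinates, flipping the x axis because the preview is mirrored.
function videoToCanvas(point, video, canvas) {
  const scaleX = canvas.width / video.width;
  const scaleY = canvas.height / video.height;
  return {
    x: (video.width - point.x) * scaleX, // mirror horizontally, then scale
    y: point.y * scaleY,
  };
}

// Example: a fingertip at (160, 120) in a 640x480 webcam frame,
// mapped onto an 800x600 canvas.
const mapped = videoToCanvas(
  { x: 160, y: 120 },
  { width: 640, height: 480 },
  { width: 800, height: 600 }
);
console.log(mapped); // { x: 600, y: 150 }
```

The mapped point would then feed a canvas `lineTo` call while drawing is active.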

Challenges we ran into

While the entire group worked with the same stack (based on React.js), we still had significant difficulty merging the different sections of code together. More specifically, the TensorFlow and webcam components of the web app had many build conflicts with the PDF component. To overcome this challenge, team members spent significant time refactoring the code so everything integrated smoothly. After tracing various build errors through the code, we were able to accomplish the main goals of our app.

Accomplishments that we are proud of

Most of our group was completely new to machine learning, so developing an ML-heavy web application was a very exciting accomplishment. Each major milestone, from seeing the hand tracking visualized on screen to watching lines being drawn according to our hand movements, made us all excited! We are also very proud of the various features we were able to add to the app. While the drawing functionality isn't perfect, we felt it was a great demonstration of our idea. Integrating all the components of the app into a pleasing website was also a fun challenge.

What we learned

Working with TensorFlow and figuring out how ML models are trained was a great learning experience! Along the way, we learned more about standard practices for ML models and how best to incorporate them into our projects. We also learned more about React and JavaScript through our struggles to integrate the many different components of our app. We quickly realized that React makes it easy to get a web app started and offers a lot of preexisting features (e.g. webcam access, buttons, and nice layouts), but can be challenging to work with when combining machine learning models with other custom components.

What's next for TracyZ

While our basic line tracing and voice recognition capabilities work as desired, we would like to improve the accuracy of both systems to make the user experience smoother. Reducing the system's overall lag would also make it much easier to draw exactly what was intended, without lines showing up late. Additionally, the speech recognition can easily be thrown off by background noise, an issue we would like to address in future versions of the app (e.g. by expanding our training data). Given time, we also plan to incorporate more gestures (such as raising a hand, undoing actions, and saving) to make the experience even more immersive.
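One smoothing direction we could try for the line quality is exponential smoothing of the fingertip positions: jittery per-frame detections get averaged toward the recent trajectory. This is a sketch of the idea, not something in the current app; the alpha value is illustrative (closer to 1 follows the finger tightly, lower values draw smoother lines at the cost of a little extra lag):

```javascript
// Returns a stateful smoother: each call blends the new point with the
// previous smoothed point using exponential smoothing.
function makeSmoother(alpha = 0.5) {
  let prev = null;
  return (point) => {
    prev = prev === null
      ? { ...point } // first point passes through unchanged
      : {
          x: alpha * point.x + (1 - alpha) * prev.x,
          y: alpha * point.y + (1 - alpha) * prev.y,
        };
    return prev;
  };
}

const smooth = makeSmoother(0.5);
smooth({ x: 0, y: 0 });             // first point: (0, 0)
const p = smooth({ x: 10, y: 20 }); // blended halfway: (5, 10)
console.log(p); // { x: 5, y: 10 }
```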
