Inspiration
With the sudden move to online videoconferencing, presenters and audiences have faced a number of challenges. Foremost among these is a lack of engagement between presenters and their audience, exacerbated by the loss of gestures and body language. As first-year students, we have seen this negatively impact our learning throughout both high school and our first year of university. In fact, many studies emphasize the direct link between gestures and audience engagement. As such, we wanted to give presenters a way to increase audience engagement by bringing natural presentation techniques to videoconferencing.
What it does
PGTCV is a Python program that lets users step back from their camera and incorporate body language into their presentations without losing fundamental control. In its current state, the Python script uses camera input to determine whether the user wants their slides moved forwards or backwards. To trigger these actions, the user raises their left fist, which tells the program to listen for instructions; they can then swipe with their palm out to the left or to the right to trigger a forwards or backwards slide change. This two-step process lets users use common body language and hand gestures without accidentally triggering the controls.
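The two-step trigger above can be sketched as a small state machine. This is an illustrative simplification, not the actual PGTCV code: the `GestureController` class, the threshold value, and the next/prev direction mapping are all assumptions.

```python
class GestureController:
    """Hypothetical sketch of PGTCV's two-step trigger: a left fist arms the
    controller, and only then can a right-hand swipe change the slide."""

    SWIPE_THRESHOLD = 0.15  # assumed minimum horizontal travel (normalized 0-1)

    def __init__(self):
        self.armed = False   # becomes True once a left fist is seen
        self.start_x = None  # right palm x-position when tracking began

    def update(self, left_fist, right_palm_x):
        """Feed one frame of gesture info; return 'next', 'prev', or None."""
        if not self.armed:
            if left_fist:
                self.armed = True  # start listening for a swipe
            return None
        if right_palm_x is None:
            return None
        if self.start_x is None:
            self.start_x = right_palm_x  # first palm reading sets the origin
            return None
        delta = right_palm_x - self.start_x
        if delta > self.SWIPE_THRESHOLD:   # swipe right (assumed: next slide)
            self._reset()
            return "next"
        if delta < -self.SWIPE_THRESHOLD:  # swipe left (assumed: previous slide)
            self._reset()
            return "prev"
        return None

    def _reset(self):
        # Disarm after firing so casual hand movement can't change slides
        self.armed = False
        self.start_x = None
```

Disarming after each swipe is what prevents ordinary gesturing from flipping slides: every slide change requires a fresh left fist first.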
How we built it
After fetching webcam frames through OpenCV, we use Google's MediaPipe library to obtain a coordinate representation of any hands on-screen. These coordinates are fed through a pre-trained model that listens for left-hand control gestures. Once a control gesture is detected, we track right-hand motion gestures and simulate the corresponding keyboard input with pynput in whichever application the user has focused. Because Windows only allows one application to use a given camera device, the program also creates a virtual camera on the host Windows machine using pyvirtualcam and Unity Capture; this virtual camera can then be used by any videoconferencing application.
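MediaPipe reports 21 hand landmarks per hand in normalized image coordinates. PGTCV's actual classifier is pre-trained and not shown here, but a fist can be approximated with a simple heuristic on those landmarks; the sketch below is that swapped-in heuristic, assuming an upright hand (y grows downward, so a curled finger's tip sits below its PIP joint).

```python
# MediaPipe hand landmark indices for the four finger tips and the
# corresponding PIP (middle) joints of the index, middle, ring, and pinky.
FINGER_TIPS = (8, 12, 16, 20)
FINGER_PIPS = (6, 10, 14, 18)

def is_fist(landmarks):
    """Heuristic fist check on 21 (x, y) landmarks in normalized image
    coordinates: every fingertip must sit below its PIP joint, which is
    what a curled finger looks like on an upright hand."""
    return all(landmarks[tip][1] > landmarks[pip][1]
               for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))
```

In the real pipeline a check like this would run on `hand_landmarks.landmark` from MediaPipe's Hands solution each frame, and a positive result would arm the controller before pynput presses the arrow key.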
Challenges we ran into
- Getting our IDEs working.
- The Mac M1 chip not supporting TensorFlow.
- Windows only allowing one application to use the webcam at a time.
- Tuning right-hand gesture recognition to realistic thresholds.
Accomplishments that we're proud of
- Successfully implementing our idea at our first hackathon.
- Getting a functional, relatively bug-free version of the program running with time to spare.
- Learning to work with a number of technologies we had no prior experience with (everything other than Python).
What we learned
- A number of relevant technologies.
- Implementing simple computer-vision algorithms.
- Taking code from idea to functional prototype in a limited amount of time.
What's next for Presentation Gestures Through Computer Vision (PGTCV)
- A better name.
- Implementation of a wider range of gestures.
- Optimization of our algorithms.
- Increased accuracy in gesture detection.
- Integration into existing videoconferencing applications.