There were moments when we wished we had recorded our gameplay but had forgotten to start recording. We made this to make sure that never happens again. Also, we thought it would be funny if it were triggered by a "pog" reaction.
Starting the program opens a GUI that lets the user start and stop reaction detection. Once the start button is pressed, the program keeps track of the last 10 seconds of footage from the webcam and the screen. When enough frames have been collected, the user can "pog" to save that footage plus the next 5 seconds. Music will be added in the final product.
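Conceptually, the 10-second window is a fixed-size rolling buffer. The sketch below shows the idea with a `deque`, which behaves like the frame list described in the next section; the `FPS` value here is an assumption, not our measured capture rate:

```python
from collections import deque

FPS = 20                         # assumed capture rate
buffer = deque(maxlen=FPS * 10)  # holds at most 10 seconds of frames

def on_frame(frame):
    # Appending to a full deque drops the oldest frame automatically,
    # so the buffer always contains the most recent 10 seconds.
    buffer.append(frame)
```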
The GUI was created using tkinter. The stream captured from the webcam had to be converted from an OpenCV image to a PIL image before it could be shown on the tkinter image panel. Two buttons, start and stop, control when the user's face is detected.

Once the start button on the GUI is pressed, the program continuously captures the screen (using pyautogui) and the webcam footage. Each webcam frame is pasted on top of the corresponding screenshot to create a webcam overlay, and the result is temporarily saved to a list. Once the list holds 10 seconds of footage, the oldest frame is deleted whenever a new one is added. Each webcam frame is also passed to OpenCV: Haar cascades detect the user's face, and blob detection then determines whether the mouth is open, i.e., a "pog". If a "pog" is detected, the next 5 seconds are recorded, and the resulting 15 seconds are combined into a video. The video is trimmed so that the "pog" lands exactly 10 seconds after the start and 5 seconds before the end. Finally, audio is added and the video is saved, using moviepy.
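A rough sketch of the capture-and-detect step is below. It uses OpenCV's bundled frontal-face Haar cascade; the blob detector thresholds and the overlay placement are illustrative, not our final tuned values:

```python
import cv2
import numpy as np
import pyautogui

# OpenCV's bundled frontal-face Haar cascade.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Blob detector aimed at the dark oval an open mouth makes.
# SimpleBlobDetector looks for dark blobs by default; these
# thresholds are illustrative.
params = cv2.SimpleBlobDetector_Params()
params.filterByArea = True
params.minArea = 300
params.filterByCircularity = True
params.minCircularity = 0.3
detector = cv2.SimpleBlobDetector_create(params)

def composite_frame(cam_frame):
    """Paste the webcam frame into the bottom-right corner of a screenshot."""
    screen = cv2.cvtColor(np.array(pyautogui.screenshot()), cv2.COLOR_RGB2BGR)
    h, w = screen.shape[:2]
    overlay = cv2.resize(cam_frame, (w // 4, h // 4))
    screen[h - h // 4:, w - w // 4:] = overlay
    return screen

def pog_detected(cam_frame):
    """True when a face is found and its lower half contains a mouth-like blob."""
    gray = cv2.cvtColor(cam_frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, fw, fh) in faces:
        mouth_roi = gray[y + fh // 2:y + fh, x:x + fw]  # lower half of the face
        if detector.detect(mouth_roi):
            return True
    return False
```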
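The final assembly looks roughly like this (moviepy 1.x API; the file names and the pog-timestamp bookkeeping are simplified stand-ins):

```python
from moviepy.editor import ImageSequenceClip, AudioFileClip
import cv2

def save_pog_video(frames, fps, pog_time, music_path="music.mp3"):
    """Assemble the buffered frames into a trimmed clip with music."""
    # moviepy expects RGB frames; OpenCV frames are BGR.
    rgb = [cv2.cvtColor(f, cv2.COLOR_BGR2RGB) for f in frames]
    clip = ImageSequenceClip(rgb, fps=fps)
    # Trim so the "pog" sits exactly 10 s from the start and 5 s from the end.
    clip = clip.subclip(max(pog_time - 10, 0), min(pog_time + 5, clip.duration))
    audio = AudioFileClip(music_path).subclip(0, clip.duration)
    clip.set_audio(audio).write_videofile("pog.mp4", audio_codec="aac")
```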
Originally, to detect when the user's mouth was open, we counted the number of dark pixels in the mouth area, cropped from the original image, and compared it to the average count over the past several frames. While this worked for a time, it was very inconsistent: the brightness of the background and the amount of shadow on the user's face affected the results. We switched to blob detection to detect the oval that the user's open mouth creates. After tuning the parameters, we found this approach much more reliable and chose to implement it instead. Additionally, the frame rates of the screen capture and the webcam were mismatched. In order to merge the two for the final video, we had to find a way to calculate the resulting frame rate so the video would play back at the correct speed.
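One way to derive that rate is to time the combined capture loop and divide frames by elapsed wall-clock seconds, so the saved video plays back in real time. The helper below is a hypothetical sketch, with `capture_one_frame` standing in for whatever grabs and composites a single frame:

```python
import time

def measure_fps(capture_one_frame, seconds=5.0):
    """Run the capture step repeatedly and return the achieved frame rate."""
    count, start = 0, time.time()
    while time.time() - start < seconds:
        capture_one_frame()
        count += 1
    return count / (time.time() - start)
```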
The fact that we were able to output a video at all surprised all of us. For three of us, this was our second hackathon; for one, it was the first. Each person worked on a different portion of the project independently of the other components, and we merged everything into one file at the end. We correctly implemented the features we set out to build from the beginning, and more.
None of us had ever used facial detection or created a GUI prior to this weekend. We learned how to crop images, detect certain shapes, process videos, and create user controls that tie each program component together. None of us had worked with so many libraries in one project before; integrating them all was a new and intriguing challenge that we enjoyed.
While the GUI serves its purpose, it has trouble recording multiple files in one run, and sometimes the "pog" isn't synced up with the drop in the music. Ideally, we would like to fix these issues while adding more GUI features for a more curated user experience: giving the user control over the placement of the webcam footage, letting them include music or record audio from a mic, and switching the GUI's theme to "dark mode" are a few examples of what we would like to include.