We are submitting our project to the education track.


Inspiration

Due to the COVID-19 pandemic, most K-12 schools and universities have had to move to virtual learning. Many students, especially those with ADHD and learning disabilities, struggle in this more independent, more challenging environment: 31% of parents of kids with ADHD described remote learning as “very challenging” and struggled to support their children at home. Long online lectures demand sustained focus, and without their usual visual cues, teachers can’t support students as effectively. Because many K-12 schools have also reduced class time, even small distractions can cause students to miss important topics.

What it does

Keep-Focus starts with OpenCV reading frames from a webcam feed. Each frame is passed to Google’s Vision AI, which returns a list of detected facial features with associated likelihoods. Our pipeline then tracks user engagement through a custom scoring function that combines emotion and head-pose detection. For accessibility, Keep-Focus also transcribes full desktop audio with Google Cloud’s Speech-to-Text API and extracts keywords with Google Cloud’s Natural Language API. At the end of a study session or lecture, user engagement is plotted over time and can be saved as an image, letting users track engagement by topic.
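To make the scoring idea concrete, here is a minimal sketch of what such a custom engagement function could look like. The inputs mirror Vision AI's face-annotation outputs (likelihood enums on a 0-5 scale and head-pose angles in degrees), but the weights, the 40/60 blend, and the function itself are our own illustration, not the exact formula used in Keep-Focus:

```python
import math

# Hypothetical mapping from Vision AI's likelihood enum
# (0 = UNKNOWN .. 5 = VERY_LIKELY) onto a 0-1 weight.
LIKELIHOOD_WEIGHT = {0: 0.5, 1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}

def engagement_score(joy, sorrow, pan_angle, tilt_angle):
    """Return a 0-1 engagement estimate (weights are illustrative)."""
    # Emotion term: reward joy, penalize sorrow.
    emotion = 0.5 * LIKELIHOOD_WEIGHT[joy] + 0.5 * (1.0 - LIKELIHOOD_WEIGHT[sorrow])
    # Head-pose term: attenuate as the face turns away from the screen.
    pose = math.cos(math.radians(pan_angle)) * math.cos(math.radians(tilt_angle))
    pose = max(0.0, pose)
    return round(0.4 * emotion + 0.6 * pose, 3)
```

A face looking straight at the camera with a joyful expression scores near 1.0, while a sorrowful face turned well away from the screen scores near 0.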

How we built it

Keep-Focus was created using Python, OpenCV, Google Cloud, Google’s Vision API, Google’s Speech-to-Text API, and Google’s Natural Language API.

Challenges we ran into

Coming up with a scoring system that incorporates head pose was challenging. Though some of us were familiar with 3-space, none of us had worked with axial coordinate transformations, which made a head-pose scoring metric difficult to conceptualize. It was also nearly every team member’s first interaction with Google Cloud services, and there was a steep learning curve in understanding what to do and why. Additionally, we initially planned to build a Zoom extension but could not obtain Zoom API keys in time. We struggled at first to interact with these APIs, but they proved exceptionally useful, and we learned a lot about cloud computing and 3-space geometry along the way.
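The coordinate-transformation difficulty above can be sketched with a few lines of rotation math. Vision AI reports head pose as pan, tilt, and roll Euler angles (in degrees); one way to turn them into a single number is to rotate the face's forward normal by those angles and measure how much of it still points at the camera. The composition order and the function itself are a simplified illustration of the kind of metric we had to reason about, not our exact implementation:

```python
import numpy as np

def facing_camera_fraction(pan, tilt, roll):
    """Cosine of the angle between the rotated face normal and the camera axis."""
    p, t, r = np.radians([pan, tilt, roll])
    # Elementary rotations about the y (pan), x (tilt), and z (roll) axes.
    Ry = np.array([[np.cos(p), 0, np.sin(p)],
                   [0, 1, 0],
                   [-np.sin(p), 0, np.cos(p)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(t), -np.sin(t)],
                   [0, np.sin(t), np.cos(t)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r), np.cos(r), 0],
                   [0, 0, 1]])
    # Face normal starts along +z (toward the camera); apply pan, tilt, roll.
    normal = Rz @ Rx @ Ry @ np.array([0.0, 0.0, 1.0])
    return float(np.clip(normal @ np.array([0.0, 0.0, 1.0]), -1.0, 1.0))
```

Note that roll (tilting the head sideways) leaves the facing direction unchanged, which matches the intuition that a sideways head tilt is not the same as looking away.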

Creating a site with Flask also proved difficult, since none of us had used Flask before. In the end, we used MLH’s Flask starter code to get at least a basic website working, but we ran out of time to integrate it with the rest of the pipeline.
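The kind of basic site we got working can be sketched in a few lines of Flask. The routes and the hard-coded summary values below are hypothetical placeholders, standing in for the real session data the integration would have served:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    # Landing page; a full integration would render the engagement plot here.
    return "<h1>Keep-Focus</h1><p>Latest session summary below.</p>"

@app.route("/api/summary")
def summary():
    # Placeholder values; a full integration would read real session data.
    return jsonify({"session_minutes": 45, "mean_engagement": 0.72})

if __name__ == "__main__":
    app.run(debug=True)
```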


Accomplishments that we're proud of

We came up with a new, plausible solution to a problem that many students face in online settings. To do so, we implemented Google Cloud Vision AI with a custom scoring function for real-time engagement detection, and we added Speech-to-Text transcription with keyword extraction on desktop audio for accessibility, which works with any audio source on the desktop rather than a single video platform.

What we learned

Most members of our team had their first experience with cloud infrastructure working on this project. We all learned about the Google Cloud Platform and how to use multiple Google APIs together. Flask was also a new framework for us, so we learned a lot about it as well. In addition, some of us had never worked on a project this collaborative, so we built up our collaboration and communication skills.

What's next for Keep-Focus

  • Seamless integration with Zoom through an app on the Zoom App Marketplace.
  • Cross-correlation of key features from transcribed text with engagement scores, for topic-retention tracking.
  • Full international cloud deployment using Firestore and Google Kubernetes Engine.
  • Database management for teacher integration and class-wide deployments.