The need for driver monitoring in autonomous vehicle research has driven major advances in computer vision and Human Activity Recognition (HAR). We realized that there was a huge opportunity for computer vision in another area of life where focus and concentration are the primary concern: work productivity.
What it does
Tiger Mom uses computer vision to monitor both your screen and your behavior. You leave it on while you study, and it tracks your screen activity, your physical behavior, and even your ambient surroundings. Its revolutionary approach to sensing lets it quantitatively learn and suggest actionable insights such as optimal work intervals, exact breakdowns of how time is spent on different distractions, and how your productivity responds to the ambient volume and brightness of your surroundings. It can even catch and interrupt you if it notices you dozing off or getting distracted for too long.
How I built it
Tiger Mom's backend is built entirely with Python, with all computation taking place locally.
The computer vision uses DLib to identify facial landmarks, then solves the Perspective-n-Point (PnP) problem to compute the pose of your head (the direction your head is facing). It also tracks the aspect ratio of your eyes to detect how open or closed they are. These two measurements are used to detect whether you are looking away (implying distraction) or are drowsy. OpenCV is used to parse video input from the webcam, process images, and display them with visuals overlaid. NumPy and SciPy were used for all mathematical computation and analysis.
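As an illustration of the eye-openness measurement, here is a minimal sketch of the widely used eye aspect ratio (EAR), assuming six landmarks per eye ordered as outer corner, two upper-lid points, inner corner, then two lower-lid points. The write-up does not say which formula Tiger Mom uses, so the formula choice, the function names, and the `EAR_THRESHOLD` value are assumptions, and the sketch uses only the standard library rather than NumPy/SciPy.

```python
import math

# Hypothetical cutoff: below this ratio the eye is treated as closed.
# A real system would tune this per user and camera.
EAR_THRESHOLD = 0.2


def eye_aspect_ratio(eye):
    """Return the eye aspect ratio for six (x, y) landmarks.

    Assumed order: p1 outer corner, p2 and p3 upper lid,
    p4 inner corner, p5 and p6 lower lid.
    """
    p1, p2, p3, p4, p5, p6 = eye
    # Sum of the two vertical lid-to-lid distances.
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    # Horizontal corner-to-corner distance.
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def is_eye_closed(eye, threshold=EAR_THRESHOLD):
    """True when the aspect ratio falls below the (assumed) threshold."""
    return eye_aspect_ratio(eye) < threshold
```

In practice a drowsiness check would likely require the ratio to stay below the threshold for several consecutive frames, so that ordinary blinks are not flagged.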
Screen-based application tracking is done by parsing the title of your active window and cross-checking it against known applications (and, in the case of the web browser, known websites too). The software keeps a dictionary that maps each application to a timer, recording the total time you spend on each one.
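The title-matching and timer dictionary described above could look roughly like the sketch below. It takes the window title as a plain string, since reading the active window is platform-specific. The `KNOWN_APPS` keywords, the `"Other"` fallback, and the `AppTimer` class are hypothetical names chosen for illustration, not Tiger Mom's actual code.

```python
# Hypothetical keyword -> label table; the real list of known
# applications and websites is not given in the write-up.
KNOWN_APPS = {
    "youtube": "YouTube",
    "visual studio code": "VS Code",
    "stack overflow": "Stack Overflow",
}


def classify_title(title, known=KNOWN_APPS):
    """Map a window title to a known application label."""
    lowered = title.lower()
    for keyword, label in known.items():
        if keyword in lowered:
            return label
    return "Other"


class AppTimer:
    """Accumulates seconds spent in each application."""

    def __init__(self):
        self.totals = {}        # label -> total seconds
        self._current = None    # label of the window seen last
        self._last_seen = None  # timestamp of the last update

    def update(self, title, now):
        """Credit elapsed time to the previous window, then switch."""
        if self._current is not None:
            elapsed = now - self._last_seen
            self.totals[self._current] = (
                self.totals.get(self._current, 0.0) + elapsed
            )
        self._current = classify_title(title)
        self._last_seen = now
```

Calling `update` on a fixed polling interval with the current title and a timestamp such as `time.monotonic()` keeps the totals current without needing window-change events.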
Ambient noise and ambient light are derived by applying mathematical transforms to input gathered periodically from the microphone and webcam.
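The write-up does not specify which transforms are applied, so the following is only one plausible sketch: root-mean-square loudness in decibels relative to full scale for a chunk of 16-bit audio samples, and mean luma for a frame given as RGB pixels. The function names, the 16-bit full-scale value, and the luma weights are assumptions.

```python
import math


def rms_dbfs(samples, full_scale=32768.0):
    """Loudness of an audio chunk in dB relative to full scale.

    Assumes signed 16-bit samples; 0 dBFS is the loudest possible
    level and silence returns negative infinity.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)


def mean_brightness(pixels):
    """Average brightness of (r, g, b) pixels, scaled to 0..1.

    Uses the common Rec. 601 luma weights as an assumed definition
    of perceived brightness.
    """
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("pixels must not be empty")
    return total / (255.0 * count)
```

Sampling these two values every few seconds and storing them alongside the distraction data would be enough to correlate productivity with ambient volume and brightness.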
Every 10 seconds, the application tracker sends its values to the front-end in JSON format.
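A sketch of what such a periodic JSON report might contain is shown below. Only the 10-second interval comes from the description above; the payload field names and the `build_report` helper are hypothetical, and the transport to the front-end is left out because it is not described.

```python
import json

REPORT_INTERVAL_SECONDS = 10  # interval stated in the write-up


def build_report(app_totals, timestamp):
    """Serialize per-application totals as a JSON string.

    The field names here are illustrative assumptions, not the
    actual schema used by Tiger Mom.
    """
    payload = {
        "timestamp": timestamp,
        "interval_seconds": REPORT_INTERVAL_SECONDS,
        "applications": {
            name: round(seconds, 1)
            for name, seconds in app_totals.items()
        },
    }
    return json.dumps(payload, sort_keys=True)
```

Sending cumulative totals rather than deltas means a dropped report costs nothing: the next one carries the full picture.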
Challenges I ran into
For Human Activity Recognition, I originally used a Haar cascade with Keras/TensorFlow to detect distraction. However, the neural network I found online had been trained on a dataset that I suspect did not include many Asian subjects, so it was not very accurate at detecting my eyes. I thought this was hilarious. This, and the fact that Haar cascades also tend to perform more poorly on subjects with darker skin, led me to pursue another solution, which wound up being DLib.
Accomplishments that I'm proud of
- Running an accurate facial pose estimator with excellent visualizations.
- Demonstrating an original use of computer vision beyond driver monitoring.
- Developing a tool that genuinely creates value for you, and helps you understand and reduce bad study habits.
What I learned
What's next for Tiger Mom
The immediate next step I wanted to tackle was key logging! Analyzing words per minute would have been an excellent additional data point. Following that, I would have loved to incorporate some sentiment analysis into the computer vision to track your mood throughout your study session. One fun idea for combining these two things, suggested by a mentor, Andreas Putz, was to analyze the sound of your typing with the microphone. For software engineers especially, panic and emotion translate very distinctively to the sound of their typing.
But what makes Tiger Mom special (but also a pain) is the sheer breadth of possible insights that can be derived from the data it is capable of sensing. For example, if users were to tag what subjects they were studying, the data could be used to analyze and suggest what sort of work they are passionate about or skilled in. Or, if location data were considered, Tiger Mom could recommend your best places to study based on the ambient noise and light data from previous visits.
These personalized insights could be produced with some clever machine learning on data aggregated over time. Tiger Mom is capable of quantitatively analyzing things like what times of day you specifically are productive, to exact percentages and times. I would have loved to dive into the ML algorithms and set up some learning mechanisms but I did not have enough time to even build a proof of concept.