Problem Statement

Since the emergence of doom scrolling, many young people's attention spans have suffered lasting damage. Traditional productivity tools such as timers, trackers, and to-do lists often lack the immediacy and engagement needed to interrupt these habits in real time.

Solution

Our solution addresses this problem directly, helping users retain focus through a combination of AI, behavioral tracking, and humor.

There are two ways our app tracks focus:

  1. Window Monitoring: The desktop app tracks the most recently active window and feeds its title to an AI model. If the window title does not sound study-related, the AI categorizes the activity as unfocused.
  2. Camera-Based Detection: The desktop app captures a live video stream from the laptop camera and analyzes each frame with three machine learning models to detect signs of distraction: for example phone use, sleeping, or crying.
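The two signals above feed one combined focus decision. A minimal sketch of that combination logic (the field names `has_phone`, `emotion`, and `eye_closed` come from our ML server's API; the keyword heuristic, function names, and emotion set are illustrative, not the project's actual code):

```python
# Sketch: combine window-title and camera signals into one verdict.
# has_phone / emotion / eye_closed match the ML server's response fields;
# everything else here is a simplified stand-in.

DISTRACTED_EMOTIONS = {"sad", "angry"}  # e.g. crying surfaces as "sad"

def title_looks_studious(window_title: str) -> bool:
    """Placeholder for the AI title check: a simple keyword heuristic."""
    keywords = ("docs", "lecture", "notes", "pdf", "study", "homework")
    return any(k in window_title.lower() for k in keywords)

def is_focused(window_title: str, frame_report: dict) -> bool:
    """Return True only if neither signal flags distraction."""
    if frame_report.get("has_phone") or frame_report.get("eye_closed"):
        return False
    if frame_report.get("emotion") in DISTRACTED_EMOTIONS:
        return False
    return title_looks_studious(window_title)
```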

When the system flags unfocused behavior, it calls the Gemini API to generate roast prompts. These are delivered both as pop-ups and as audio feedback, designed to entertain while snapping the user back into focus.
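Before the Gemini call, the flagged behavior is shaped into an instruction. A hedged sketch of that prompt assembly (the wording, reason labels, and tone cutoffs are illustrative; the actual Gemini request and the pop-up/audio delivery are omitted):

```python
def build_roast_prompt(reason: str, window_title: str, meanness: int) -> str:
    """Assemble the instruction sent to Gemini when distraction is flagged.

    reason: short label from the detectors, e.g. "phone" or "sleeping".
    meanness: 1 (gentle) to 10 (brutal), taken from the UI slider.
    """
    tone = ("gently tease" if meanness <= 3
            else "firmly roast" if meanness <= 7
            else "mercilessly roast")
    return (
        f"You are a strict but funny study coach. The user was caught "
        f"{reason} while the window '{window_title}' was open. "
        f"In one or two sentences, {tone} them back into studying."
    )
```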

Inspiration

Our idea was born after watching a 4-hour YouTube video of a man dressed as an Asian mom yelling at viewers to study. Surprisingly, it worked. It made us reflect on how much attention spans have declined, including our own. We've observed a steady erosion of critical thinking and focus in our generation and felt compelled to respond. Using humor, tough love, and AI, this project is our way of breaking those habits and helping students rebuild their discipline, one roast at a time.

How we built it

Machine Learning Server: We built a Flask-based backend that integrates YOLO for phone detection, DeepFace for emotion analysis, and GazeTracking to monitor eye closure and gaze direction. These models return key behavioral indicators—such as has_phone, emotion, and eye_closed—through a structured API. Annotated frames are also generated to support visual debugging and verification.
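The server's job reduces to mapping raw model outputs onto that structured response. A simplified sketch of the response assembly (the detector stubs stand in for YOLO, DeepFace, and GazeTracking inference; the real service wraps this in a Flask route and also returns annotated frames):

```python
# Stubbed per-frame analysis mirroring the ML server's response shape.
# Each detector below is a placeholder for the real model call.

def detect_phone(frame) -> bool:       # stands in for YOLO inference
    return False

def detect_emotion(frame) -> str:      # stands in for DeepFace analysis
    return "neutral"

def detect_eye_closed(frame) -> bool:  # stands in for GazeTracking
    return False

def analyze_frame(frame) -> dict:
    """Return the structured indicators the desktop app consumes."""
    return {
        "has_phone": detect_phone(frame),
        "emotion": detect_emotion(frame),
        "eye_closed": detect_eye_closed(frame),
    }
```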

Real-Time Monitoring with Electron + Node.js: We used Electron to develop a desktop application capable of screen activity monitoring and connecting to the ML server. Built-in logic determines when to trigger alerts like screen flashes or voice prompts based on model outputs. The app sets the foundation for live camera control and real-time user feedback.
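The alert-trigger logic mentioned above needs debouncing so a single noisy detection doesn't fire a flash. The actual implementation lives in the Electron/Node layer; this Python sketch shows the idea under an illustrative threshold:

```python
class DistractionMonitor:
    """Fire an alert only after N consecutive unfocused frames.

    A sketch of the app's trigger logic; the real version runs in
    Electron/Node, and the threshold of 3 is an assumption.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.streak = 0

    def update(self, focused: bool) -> bool:
        """Feed one frame's verdict; return True when an alert should fire."""
        self.streak = 0 if focused else self.streak + 1
        # Fire exactly once per distraction streak, at the threshold.
        return self.streak == self.threshold
```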

Frontend & UI Logic: We designed a responsive chatbot interface using HTML, CSS, and JavaScript, incorporating show/hide textbox behavior and animated effects. We connected the UI to Google’s Gemini API to generate context-aware responses and routed them to the browser’s speech synthesis engine for audio output. A user-facing slider lets users adjust the “meanness” of the feedback, controlling the tone of the assistant.

System Integration & Collaboration: Seamless coordination between backend and frontend ensured live communication between the ML server and the desktop app. We used modular development and real-time testing to verify that each component responded accurately to user behavior.

Voice Output Expansion: As the final layer, we began integrating dynamic voice responses based on prompt content and user-selected tone levels. This feature enhanced user interactivity and made the assistant more responsive and lifelike.

Challenges we ran into

One of the biggest challenges we faced was making tradeoffs between ambition and feasibility. Early on, we had plans to incorporate hardware components and explore a broader range of algorithms and features. However, given the limited timeframe, we made the difficult decision to focus primarily on software. This meant scaling back on hardware integration and narrowing our scope to ensure we could deliver a polished, functioning product. While it was tempting to keep adding new models and capabilities, we had to prioritize stability, user experience, and seamless system integration over feature expansion. Balancing innovation with practicality was a constant consideration throughout the development process.

What we learned

This project taught us how to design and build a truly interactive AI system that doesn’t just respond to users, but actively pushes back. Unlike typical productivity tools that quietly log behavior or send passive notifications, our app uses real-time machine learning and humor-driven feedback to actively intervene when users are distracted. We learned how to balance technical complexity with usability, integrating tools like YOLO, DeepFace, and gaze tracking into a workflow that feels seamless to the user.

We also discovered how powerful humor can be as a behavioral tool. By feeding distraction data into the Gemini API and shaping it into creative, roast-style feedback, we learned to blur the line between utility and entertainment, making productivity feel less like punishment and more like play.

Technically, we sharpened our skills in real-time screen and webcam monitoring, Flask API design, asynchronous desktop–server communication, and speech synthesis. But perhaps the most important thing we learned was how to build something personal. Every roast, every flash, every popup reflects a deeper understanding of how easily focus slips away and how it sometimes takes a little tough love (from an AI that sounds like your mom) to bring it back.
