Inspiration

The modern developer's greatest enemy isn't a complex bug; it's the infinite scroll. We've all set out to look up a single piece of documentation, only to find ourselves deep in a YouTube rabbit hole twenty minutes later. Existing "site blockers" are blunt instruments: they block either everything or nothing, and they're far too easy to disable.

We wanted to create Mojo: an accountability partner that doesn't just block URLs, but actually understands what you are doing. We were inspired by the idea of a "Proof of Work" system for human attention, where an AI acts as a guardian of your deep work state.

What it does

Mojo is a vision-aware productivity monitor that sits as a transparent "bubble" on your desktop.

  • Visual Monitoring: It captures screenshots of your active window and uses Gemini 2.0 Flash to analyze the content in real time.
  • Contextual Awareness: Unlike basic blockers, Mojo can tell the difference between a YouTube video that is a "Python Tutorial" (Productive) and one that is "Gaming Highlights" (Distracted).
  • The Interrogation: When it detects a distraction, it triggers a modal "Interrogation." You must justify your actions to the AI.
  • Enforcement: If your reason is rejected, Mojo uses the Windows API to force-close the distracting application or browser tab immediately.

How we built it

The project is built on a Python backbone with a focus on real-time asynchronous execution:

  • The Brain: We integrated the google-genai SDK to process multimodal prompts, allowing the AI to "see" the screen.
  • The Eyes: We used mss for high-performance screen grabbing and pytesseract for local OCR to provide the AI with extra text metadata.
  • The Muscle: We utilized pywin32 (win32gui/win32process) and psutil to interact with the OS layer, allowing us to identify and terminate specific PIDs.
  • The UI: A custom-styled tkinter interface provides a non-intrusive, "topmost" floating HUD that updates its state based on your focus levels.
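As a rough illustration of "The Muscle," the sketch below shows one way to map the foreground window to its owning process and terminate it with pywin32 and psutil. It is not the project's actual code: the PROTECTED set, should_terminate, and terminate_foreground_process are hypothetical names introduced here, and this version ends the whole process rather than a single browser tab.

```python
import sys
from typing import Optional

# Hypothetical safety list (an assumption, not from the project):
# process names this sketch refuses to terminate.
PROTECTED = {"explorer.exe", "python.exe", "pythonw.exe"}


def should_terminate(process_name: str, protected=PROTECTED) -> bool:
    """Return True unless the process name is on the protected list."""
    return process_name.lower() not in protected


def terminate_foreground_process() -> Optional[str]:
    """Terminate the process that owns the foreground window (Windows only).

    Returns the terminated process name, or None if nothing was terminated.
    """
    if sys.platform != "win32":
        raise OSError("this sketch relies on the Windows API")

    # Imported lazily so the module still loads on other platforms.
    import psutil
    import win32gui
    import win32process

    hwnd = win32gui.GetForegroundWindow()
    if not hwnd:
        return None  # no foreground window at the moment

    # GetWindowThreadProcessId returns a (thread_id, process_id) pair.
    _thread_id, pid = win32process.GetWindowThreadProcessId(hwnd)
    proc = psutil.Process(pid)
    name = proc.name()

    if not should_terminate(name):
        return None
    proc.terminate()
    return name
```

Keeping the allow/deny decision in a small pure function makes it easy to check on any machine, while the Windows-specific calls stay isolated in one place.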

Challenges we ran into

The primary challenge was managing API latency and quotas. Sending high-resolution screenshots continuously would exhaust a free-tier API key in minutes. To solve this, we implemented a "Heuristic Fallback" system. If we hit a 429 (Resource Exhausted) error, the system calculates a productivity score based on local keyword matching:

$$P_{score} = \sum (\text{Keywords}_{productive}) - \sum (\text{Keywords}_{distracted})$$

If $P_{score} < 0$, the system defaults to a "Distracted" state until the API cooldown expires. We also had to handle the transparency of the UI across different Windows themes, which required deep dives into tkinter attribute configurations.
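The fallback score can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the keyword sets are made up for the example (the project's real lists are not shown), each keyword occurrence counts as one hit, and a non-negative score is treated as "Productive," which the description above does not specify.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
PRODUCTIVE = {"python", "tutorial", "documentation", "github", "stackoverflow"}
DISTRACTED = {"gaming", "highlights", "meme", "shorts", "netflix"}


def productivity_score(text: str) -> int:
    """Return productive keyword hits minus distracted keyword hits."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    productive_hits = sum(1 for word in words if word in PRODUCTIVE)
    distracted_hits = sum(1 for word in words if word in DISTRACTED)
    return productive_hits - distracted_hits


def fallback_state(text: str) -> str:
    """Map the score to a state: a negative score means 'Distracted'.

    Treating zero or positive scores as 'Productive' is an assumption.
    """
    return "Distracted" if productivity_score(text) < 0 else "Productive"
```

The input text could be a window title or the OCR output from pytesseract; for example, fallback_state("Gaming Highlights") evaluates to "Distracted" because both words are on the distracted list.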

Accomplishments that we're proud of

  • Zero-Bypass Interrogation: We successfully implemented a system that "grabs" window focus, making it very difficult for the user to simply ignore the AI's warning.
  • Multimodal Accuracy: We refined our system prompt so that the AI effectively ignores "system noise" (like the Windows Taskbar) and focuses strictly on the active workspace.
  • Portability: The app is designed to be "plug-and-play," automatically locating Tesseract binaries and managing config files via a GUI popup for easy setup.

What we learned

We learned that multimodal AI is a game-changer for desktop automation. Traditional productivity tools rely on rigid "if-this-then-that" logic, but with Gemini, we could implement "fuzzy" logic that understands intent. We also gained significant experience in multi-threaded GUI design, ensuring that the vision loop—which can be resource-heavy—never freezes the user interface.
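One common way to keep a heavy loop from freezing a GUI is to run it on a worker thread and pass results to the UI thread through a queue. The sketch below shows that pattern in isolation; it is an assumption about the general approach, not the project's actual threading code, and vision_worker, drain, and the analyze callback are hypothetical names.

```python
import queue
import threading


def vision_worker(results: queue.Queue, stop: threading.Event,
                  analyze, interval: float = 0.01) -> None:
    """Run the slow analysis off the UI thread and queue each result."""
    while not stop.is_set():
        results.put(analyze())
        # Sleep between captures, but wake immediately if asked to stop.
        stop.wait(interval)


def drain(results: queue.Queue) -> list:
    """Collect everything currently queued without blocking the caller."""
    items = []
    while True:
        try:
            items.append(results.get_nowait())
        except queue.Empty:
            return items
```

In a tkinter app, a polling function scheduled with root.after could call drain and update the widgets, so only the main thread ever touches the UI while the worker handles capture and analysis.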

What's next for JollyRanchers (Team name)

The JollyRanchers team is looking to expand Mojo into a full-scale "Focus OS." Our roadmap includes:

  1. Local LLM Integration: Using Ollama to run the vision analysis locally for 100% privacy and zero API costs.
  2. Focus Gamification: Earning "Mojo Points" for long stretches of productive work, which can be used to "buy" timed breaks or customize the UI.
  3. Advanced Analytics: A dashboard showing the exact moment your focus dipped during the day, categorized by the specific type of distraction, helping you identify your own patterns.

Built With

  • google-gemini-2.0-flash
  • google-gen-ai-sdk
  • mss
  • ollama
  • pillow
  • psutil
  • pyautogui
  • pytesseract
  • python
  • pywin32
  • tkinter