Inspiration
We wanted a supervisor that doesn't just block websites but catches you slacking off, physically and verbally, in real time. Existing tools rely on passive blocking that's trivially bypassed with a single click. FocusOS was built to bridge the gap between digital intent and physical accountability by combining computer vision, generative AI, and cloud analytics into one seamless feedback loop.
What It Does
FocusOS monitors the user through their webcam using facial landmark tracking to detect smartphone-related distraction, specifically head pitch deviation and eye coordination patterns. The moment distraction is detected, it triggers a personality-driven AI voice alert via ElevenLabs in real time. All sessions are logged to MongoDB Atlas, enabling long-term habit trend analysis and productivity history tracking. The interface runs as a custom pixel-art HUD built with OpenCV, displaying live biometric data in a retro-styled dashboard.
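As a rough illustration of the detection step, the sketch below estimates head pitch from two 3D landmarks and fires an alert only after the pitch stays past a threshold for several consecutive frames. It is not the project's actual code: the function and class names, the 20-degree threshold, and the frame count are all assumptions made for the example. It assumes MediaPipe-style coordinates (y grows downward in the image, z grows away from the camera) and a forehead/chin pair such as Face Mesh indices 10 and 152, which are commonly used for those points.

```python
import math
from dataclasses import dataclass, field


def pitch_degrees(forehead, chin):
    """Approximate head pitch from two (x, y, z) landmarks.

    Assumed convention: y increases downward, z increases away from the
    camera, so a positive result means the head is tilted down.
    """
    dy = chin[1] - forehead[1]  # vertical extent of the face in the image
    dz = chin[2] - forehead[2]  # how much farther back the chin sits
    return math.degrees(math.atan2(dz, dy))


@dataclass
class DistractionDetector:
    """Hypothetical debounce: alert once per sustained look-down episode."""

    threshold_deg: float = 20.0  # illustrative value, not from the project
    hold_frames: int = 15        # roughly half a second at 30 fps
    _streak: int = field(default=0, init=False, repr=False)

    def update(self, pitch_deg: float) -> bool:
        # Count consecutive frames past the threshold; reset on recovery.
        if pitch_deg > self.threshold_deg:
            self._streak += 1
        else:
            self._streak = 0
        # True only on the frame where the streak first reaches hold_frames.
        return self._streak == self.hold_frames
```

Debouncing this way keeps a single glance down from triggering the voice alert, and returning `True` only once per episode avoids repeating the alert on every subsequent frame.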
How It's Built
The system is built entirely in Python and orchestrates three core components: a vision engine using MediaPipe Face Mesh for dense 3D facial landmark tracking (468 points per face), an audio engine powered by ElevenLabs Generative AI for real-time voice intervention, and a data layer using MongoDB Atlas for persistent cloud storage. We implemented a dynamic biometric calibration system that adapts to each user's specific posture geometry on startup, and built a local-first failover to handle unreliable network conditions during live use.
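One way a local-first failover like this could work is sketched below: every event is appended to a local file first, and a separate flush step tries to push the backlog to the cloud, leaving the file untouched if the upload fails. The `LocalFirstLog` class and its `upload` callback are hypothetical names for this example. In a real setup the callback might wrap something like pymongo's `collection.insert_many`, but that wiring is an assumption and is not shown here.

```python
import json
from pathlib import Path
from typing import Callable, List


class LocalFirstLog:
    """Hypothetical local-first queue: write to disk, sync when possible."""

    def __init__(self, path, upload: Callable[[List[dict]], None]):
        self.path = Path(path)
        self.upload = upload  # assumed to raise an exception when offline

    def record(self, event: dict) -> None:
        # Always persist locally first, so no event depends on the network.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def pending(self) -> List[dict]:
        # Events written locally that have not been uploaded yet.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def flush(self) -> int:
        """Try to upload the backlog; return how many events were synced."""
        events = self.pending()
        if not events:
            return 0
        try:
            self.upload(events)
        except Exception:
            # Broad catch for the sketch; real code would likely catch the
            # driver's specific connection errors. Keep the file and retry.
            return 0
        self.path.unlink()  # clear the backlog only after a successful upload
        return len(events)
```

Calling `flush()` on a timer or at session end would let the backlog drain whenever the connection comes back, at the cost of possible duplicate uploads if the process dies between the upload and the file removal.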
Challenges
Three major obstacles hit us during the build. First, connecting to MongoDB Atlas on unstable public hackathon Wi-Fi forced us to design a robust local-first failover system so data integrity wouldn't break under poor connectivity. Second, ElevenLabs deprecated the specific AI model our audio engine relied on mid-hackathon; we resolved this by reverse-engineering direct API calls and bypassing the broken SDK entirely. Third, standard head-pitch math failed across users with different heights and postures, pushing us to engineer a real-time dynamic calibration system that normalizes biometric data to each user's specific geometry within seconds of launch.
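The calibration idea can be illustrated with a minimal sketch: collect pitch readings for a short warm-up window while the user sits normally, take their median as a personal baseline, and afterwards report each reading as a deviation from that baseline. The `PitchCalibrator` name, the 60-frame window, and the choice of the median are assumptions for this example, not details confirmed by the project.

```python
from statistics import median


class PitchCalibrator:
    """Hypothetical per-user baseline for head pitch, learned at startup."""

    def __init__(self, warmup_frames: int = 60):
        self.warmup_frames = warmup_frames  # about 2 seconds at 30 fps
        self._samples = []
        self.baseline = None

    def update(self, pitch_deg: float):
        """Return deviation from the baseline, or None while calibrating."""
        if self.baseline is None:
            self._samples.append(pitch_deg)
            if len(self._samples) >= self.warmup_frames:
                # The median resists a few outlier frames during warm-up.
                self.baseline = median(self._samples)
            return None
        return pitch_deg - self.baseline
```

Thresholding the deviation rather than the raw angle means a tall user looking slightly down at a laptop and a shorter user looking straight ahead can share the same distraction threshold.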
Accomplishments & What We Learned
We successfully synchronized three complex APIs (computer vision, generative AI audio, and cloud storage) into a near-instantaneous distraction-to-intervention pipeline with no perceptible lag. On the technical side, we deepened our understanding of translating raw 3D facial landmarks into actionable posture signals, architecting resilient cloud integrations, and building production-grade API failover logic under time pressure. We also showed that productivity tooling doesn't have to be sterile: the pixel-art aesthetic demonstrates that serious software can have a distinct visual identity.
What's Next?
FocusOS 2.0 targets three major expansions: gamified leaderboards where users compete on an "Integrity Score," deep workspace integration to auto-pause Spotify and mute Slack notifications when distraction is detected, and physical deterrents via an Arduino module with haptic buzzers and desk vibration alerts for users who ignore voice prompts. Long-term, FocusOS moves toward a fully adaptive focus OS that learns individual distraction patterns and adjusts intervention intensity accordingly.
Built With
- elevenlabsapi
- mediapipe
- mongodbatlas
- python