Inspiration
As CS majors, we spend a lot of time in front of our computer screens. Spending long hours at our desks got us thinking about ways to protect our health in front of the screen. Desk Potato is our way of monitoring posture and eye strain to make sure the time we spend in front of the computer is spent well, not cultivating injuries.
What it does
Our app continuously monitors your posture through the camera and alerts you when it is poor, as determined by the positions and rotations of different body parts and the accumulated strain of holding those positions over time. We display metrics like eye strain, neck strain, and face angle so you can actively see and correct your posture. We also track these metrics locally over time, so you can see where your posture has improved and where it still needs work.
How we built it
Building an app that tracks posture using nothing more than a webcam was a real challenge. We use a state-of-the-art monocular depth estimation model, Depth Anything V2, a large vision transformer trained on over 62 million images. We then use facial segmentation models to identify different parts of the face, chest, and other body regions, before mapping each part to a plane of closest fit and calculating the roll and pitch of that plane.
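To make the pipeline concrete, here is a minimal sketch of the depth-plus-plane-fitting step. The Hugging Face checkpoint name, the input frame, and the helper function are illustrative assumptions, not our exact code:

```python
import numpy as np
from PIL import Image
from transformers import pipeline

# Depth Anything V2 via the Hugging Face depth-estimation pipeline.
# The checkpoint id below is an assumption; check the Hub for the
# variant that fits your hardware.
depth_estimator = pipeline(
    "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
)
result = depth_estimator(Image.open("webcam_frame.jpg"))
depth_map = np.array(result["depth"])  # per-pixel relative depth

def fit_plane_orientation(points: np.ndarray) -> tuple[float, float]:
    """Fit a least-squares plane to an (N, 3) array of (x, y, depth)
    points and return the plane's (roll, pitch) in degrees.

    The plane normal is the right singular vector with the smallest
    singular value of the centered point cloud.
    """
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:          # orient the normal toward the camera
        normal = -normal
    nx, ny, nz = normal
    roll = float(np.degrees(np.arctan2(nx, nz)))   # left/right tilt
    pitch = float(np.degrees(np.arctan2(ny, nz)))  # up/down tilt
    return roll, pitch
```

In practice, the segmentation mask selects which (x, y, depth) pixels belong to each body part before the plane is fit.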
Using the roll and pitch of those reference planes, we calculate eye strain based on the distance of your eyes from the screen, and neck strain based on the pitch of your face. We then combine chest roll, eye strain, and neck strain to decide whether your posture is acceptable for long-term sessions in front of the computer or not. Finally, we use text-to-speech AI to convert the notification message to audio and alert the user.
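A minimal sketch of how these signals might be combined into a verdict; every threshold below is an illustrative placeholder rather than the value we actually ship:

```python
def posture_ok(eye_dist_cm: float, face_pitch_deg: float,
               chest_roll_deg: float) -> tuple[bool, str]:
    """Return (acceptable, message) from three posture signals.

    All thresholds are hypothetical placeholders for illustration.
    """
    issues = []
    if eye_dist_cm < 40:                 # eyes too close to the screen
        issues.append("move back from the screen")
    if face_pitch_deg < -15:             # head tilted down -> neck strain
        issues.append("raise your head")
    if abs(chest_roll_deg) > 10:         # leaning to one side
        issues.append("straighten your shoulders")
    if issues:
        return False, "Posture check: please " + " and ".join(issues) + "."
    return True, "Posture looks good."
```

The message string is what gets handed to the text-to-speech step described above.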
Challenges we ran into
During development, we faced a few integration challenges connecting our real-time dashboard API with the front-end. Initially, our attempts to pass posture data as JSON to the dashboard failed, so we experimented with several data-transfer approaches. Eventually, we implemented a temporary dummy pipeline to simulate the data flow and successfully got the system running.
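As an illustration of that stopgap (the endpoint URL and payload fields here are hypothetical), the dummy pipeline simply pushed synthetic posture JSON at the dashboard:

```python
import random
import time

import requests  # assumed HTTP transport; the endpoint below is a placeholder

DASHBOARD_URL = "http://localhost:8000/api/posture"

while True:
    payload = {
        "eye_strain": round(random.uniform(0, 1), 2),
        "neck_strain": round(random.uniform(0, 1), 2),
        "face_pitch_deg": round(random.uniform(-30, 10), 1),
        "timestamp": time.time(),
    }
    requests.post(DASHBOARD_URL, json=payload, timeout=2)
    time.sleep(1)  # one synthetic sample per second
```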
We also ran into CORS (Cross-Origin Resource Sharing) issues: the browser's cross-origin restrictions prevented our internal services from communicating properly. To overcome this, we unified all services under the same domain and explicitly configured CORS access permissions.
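For example, if the dashboard API were served with FastAPI (an assumption; the same idea applies to any framework), the configuration looks like this:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the front-end origin to call this API (the origin is a placeholder).
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
```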
Accomplishments that we're proud of
One of our biggest accomplishments was getting real-time posture analysis running smoothly using only a standard webcam—no depth camera or additional sensors required. We’re especially proud of how seamlessly LiveKit handled live video streaming and how well Depth Anything v2 integrated with our segmentation and strain detection pipeline.
We also successfully built an AI-driven feedback loop that feels personal and responsive. With Letta AI generating natural-sounding notifications and ElevenLabs delivering them through expressive voice alerts, we turned static posture feedback into a more engaging, interactive experience.
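As a rough sketch of the voice-alert step (the API key and voice ID are placeholders; consult ElevenLabs' docs for current parameters), the notification text becomes audio with one REST call:

```python
import requests

ELEVEN_API_KEY = "..."          # your ElevenLabs API key
VOICE_ID = "your-voice-id"      # placeholder voice ID

def speak(text: str) -> bytes:
    """Convert a notification message to MP3 audio via ElevenLabs."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVEN_API_KEY},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content  # MP3 bytes ready to play back

audio = speak("Heads up: you've been slouching for five minutes.")
```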
Lastly, we’re proud of our teamwork and perseverance—especially troubleshooting data pipeline issues and optimizing our models under tight hackathon time constraints.
What we learned
Throughout the development process, we learned a lot about the complexity of combining computer vision, real-time communication, and AI-driven feedback into a seamless user experience. Integrating LiveKit taught us how to optimize for low-latency streaming while maintaining high-quality video for accurate posture detection. Working with Depth Anything v2 gave us insight into how powerful modern vision transformers can be, but also how important proper preprocessing and calibration are when estimating depth from regular RGB cameras.
We also discovered the value of context-aware AI assistants through Letta AI, which allowed us to generate dynamic, personalized notifications that adapt to the user’s behavior instead of relying on static messages. Combining that with ElevenLabs’ voice synthesis helped us understand how multimodal feedback—visual, textual, and auditory—can significantly improve user engagement and response times.
Most importantly, we learned that even small adjustments in feedback design and posture metric thresholds can make a big difference in user comfort and usability. Building this system reinforced the importance of balancing technical precision with a smooth, human-centered experience.
What's next for Desk Potato
Next, we plan to extend Desk Potato beyond simple posture monitoring into a comprehensive workspace health assistant. Our immediate focus is on improving the accuracy of our depth segmentation models through fine-tuning and better calibration across different lighting conditions and camera setups. We also want to expand the range of health metrics we track—such as shoulder symmetry, sitting duration, and micro-break reminders—to give users a more complete understanding of their ergonomics and habits.
In addition, we’re exploring the development of a cross-platform companion app that syncs posture data and analytics across desktop and mobile, allowing users to view their progress and receive reminders wherever they are. To make the experience more engaging, we plan to incorporate gamification elements, rewarding users for maintaining good posture streaks and achieving wellness goals. Finally, we’re looking into browser-based implementations using WebAssembly for seamless, real-time posture tracking directly in the browser without requiring heavy local installations.
Ultimately, our goal is to make Desk Potato not just a posture correction tool, but a holistic platform that helps people build sustainable, healthy screen habits for the long term.
