Inspiration

Many people want to exercise, but the hardest part is not the workout itself — it’s starting. Time, motivation, self-consciousness, and habit formation all make the first step surprisingly difficult. The initial spark for this project came from an old variety show called “Here Comes the Wall”, where contestants had to fit their bodies into approaching wall shapes. It made us realize that this kind of full-body interaction could be reimagined and brought online with the help of modern computer vision models. Instead of asking users to step outside or go to a gym, we meet them where they already are — at home, in front of a screen — and make full-body movement intuitive, accessible, and fun.

What it does

The player stands in view of the camera while a virtual wall approaches. To pass, the player must fit their body into the hole in the wall: the YOLO26-pose model detects the player's keypoints and checks that every detected keypoint lies inside the hole.

How we built it

Walls Incoming is a full-stack, camera-based body-interaction game inspired by the classic "Here Comes the Wall" concept, rebuilt with modern computer vision and cross-platform technologies. On the frontend, we used Flutter to build a cross-platform game interface that runs on both Web (Chrome) and Windows desktop. The game page integrates the device's front-facing camera, renders real-time video, overlays UI elements (floating islands, level info, start/result panels), and animates an incoming wall that gradually scales toward the player over time to simulate depth and pressure. On the backend, we built a FastAPI service in Python to handle:

  • Game metadata (levels, difficulty, wall geometry)
  • Score submission and storage
  • Real-time human pose inference

We used MySQL to persist level definitions and game results, allowing the system to be easily extended to multi-level progression and leaderboards. For body pose detection, we integrated YOLO26-pose using the Ultralytics ecosystem and OpenCV. When the wall reaches the screen plane, the backend performs pose inference on the camera frame, extracts human keypoints, and checks whether all detected joints fall within the wall's hollow region. The pass/fail result is then returned to the frontend and visualized instantly. The entire pipeline (camera → pose detection → geometric validation → game feedback) runs in real time, creating a seamless, interactive experience.
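
As a minimal sketch of this judgment step: the `/api/check` route, the semicircular hole geometry, and the `yolo26n-pose.pt` checkpoint name below are illustrative assumptions, not our exact code, but the shape of the logic matches what we built.

```python
# Sketch of the backend judgment step. Endpoint path, hole geometry, and the
# checkpoint name are illustrative assumptions, not the project's exact code.
import cv2
import numpy as np
from fastapi import FastAPI, File, UploadFile
from ultralytics import YOLO

app = FastAPI()
_model = None  # lazy-loaded so startup stays fast; the first request pays the cost


def get_model() -> YOLO:
    global _model
    if _model is None:
        _model = YOLO("yolo26n-pose.pt")  # assumed checkpoint name
    return _model


def hole_polygon(frame_w: int, frame_h: int) -> np.ndarray:
    """Illustrative hole: a semicircle resting on the bottom edge of the frame."""
    cx, r = frame_w // 2, int(frame_h * 0.45)
    angles = np.linspace(np.pi, 2 * np.pi, 32)
    arc = np.stack([cx + r * np.cos(angles), frame_h + r * np.sin(angles)], axis=1)
    return arc.astype(np.int32)


@app.post("/api/check")
async def check_pose(frame: UploadFile = File(...)):
    img = cv2.imdecode(np.frombuffer(await frame.read(), np.uint8), cv2.IMREAD_COLOR)
    result = get_model()(img, verbose=False)[0]
    kpts = result.keypoints
    if kpts is None or kpts.xy.shape[0] == 0:
        return {"passed": False, "reason": "no person detected"}
    hole = hole_polygon(img.shape[1], img.shape[0])
    person = kpts.xy[0].cpu().numpy()  # (num_keypoints, 2) for the first person
    # Pass only if every detected keypoint lies inside the hole outline;
    # keypoints reported at (0, 0) are treated as undetected and skipped.
    inside = all(
        cv2.pointPolygonTest(hole, (float(x), float(y)), False) >= 0
        for x, y in person
        if x > 0 or y > 0
    )
    return {"passed": bool(inside)}
```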

Challenges we ran into

  • Aligning pose keypoints with the wall — YOLO outputs image coordinates that must be in the same coordinate system as the wall hole for “inside/outside” checks; we defined a mapping between camera frame and game overlay (resolution, aspect ratio, normalization) so keypoints and hole boundaries are comparable (see the mapping sketch after this list)
  • Integrating and running YOLO on the backend — Large model, heavy dependencies (ultralytics, OpenCV), slow first load and high resource use; we used lazy loading, configurable model path, and clear dependency/error handling to balance first‑request latency and stability
  • Timing the pass/fail check — The check must run once when the wall “fully meets” the screen, not continuously; the frontend triggers a single check when the wall animation ends or reaches a threshold, and the backend runs pose detection and hole‑containment only for that frame
  • API design and frontend–backend alignment — Level structure, wall shape, and score submission had to match on both sides; we agreed on REST endpoints and payloads (level list/detail, score submit) and handled CORS and error responses for direct Web→backend calls
  • Level and wall shape data — Hole shapes (e.g. semicircle) need to be configurable and extensible; we store level parameters (hole type, size, difficulty) in the database, the backend returns them, and the frontend renders the wall and hole from that data (an example payload follows this list)
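
The essential fix for the alignment challenge was agreeing on one coordinate convention. A minimal sketch of such a camera-to-overlay mapping, assuming the overlay letterboxes the camera feed to preserve aspect ratio (the function name and convention are illustrative):

```python
import numpy as np


def camera_to_overlay(kpts_px: np.ndarray,
                      cam_size: tuple[int, int],
                      overlay_size: tuple[int, int]) -> np.ndarray:
    """Map (N, 2) keypoints from camera pixels into overlay coordinates.

    Uses a uniform scale (no stretching) plus centering offsets, i.e. the
    camera feed is letterboxed inside the overlay; the wall hole must be
    defined under the same convention for containment checks to be valid.
    """
    cam_w, cam_h = cam_size
    ov_w, ov_h = overlay_size
    scale = min(ov_w / cam_w, ov_h / cam_h)
    offset = np.array([(ov_w - cam_w * scale) / 2, (ov_h - cam_h * scale) / 2])
    return kpts_px * scale + offset
```

For example, a 640×480 camera frame shown in a 1280×720 overlay gets scale 1.5 and a 160 px horizontal offset, so a keypoint at (320, 240) maps to (640, 360), the overlay center.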
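
For the level data, here is a sketch of the kind of payload the backend returns and the frontend renders from (field names are illustrative, not our exact schema):

```python
from pydantic import BaseModel


class WallHole(BaseModel):
    shape: str           # e.g. "semicircle"
    width_frac: float    # hole width as a fraction of the wall's width
    height_frac: float   # hole height as a fraction of the wall's height


class Level(BaseModel):
    id: int
    name: str
    difficulty: int          # higher -> faster wall, tighter hole
    approach_seconds: float  # how long the wall takes to reach the screen plane
    hole: WallHole


# Example of what a level-detail endpoint might serialize:
warmup = Level(
    id=1, name="Warm-up", difficulty=1, approach_seconds=5.0,
    hole=WallHole(shape="semicircle", width_frac=0.5, height_frac=0.8),
)
```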

Accomplishments that we're proud of

  • Built a real-time body-controlled game from scratch using computer vision, not controllers or wearables
  • Cross-platform support (Web + Windows) with a single Flutter codebase
  • Successfully integrated YOLO26-pose for live human keypoint detection and visualization
  • Designed a geometric pose-validation algorithm to judge whether the player fits the wall’s shape
  • Achieved low-latency interaction between frontend camera input and backend AI inference
  • Built a scalable architecture that cleanly separates UI, game logic, AI inference, and data storage

What we learned

  • How to integrate real-time computer vision models into an interactive application pipeline
  • Practical challenges of camera access and permissions across web and desktop platforms
  • Trade-offs between model accuracy and inference speed in live gameplay scenarios
  • How to synchronize frontend animations with backend AI decisions for a satisfying user experience
  • Designing systems where AI outputs directly affect gameplay logic, not just visualization

What's next for Walls Incoming

  • More levels & shapes: irregular holes, asymmetric poses, dynamic obstacles
  • Multi-player & co-op modes, including competitive and party gameplay
  • Difficulty scaling based on player performance and pose stability
  • Leaderboards & profiles to track progress and high scores
  • Mobile support (tablet / phone) for casual and social play

Built With

  • Flutter (Web + Windows frontend)
  • Python + FastAPI (backend)
  • MySQL (levels and scores)
  • Ultralytics YOLO26-pose + OpenCV (pose detection)
