Inspiration
We wanted to create something unconventional that pushed the boundaries of what's possible with web technologies. People don't generally play 3D games in the browser, and they certainly don't control them with their tongues. The idea of using your tongue as a game controller came from a meme, which led us to explore computer vision and real-time face detection and to put this playful twist on the classic tug-of-war game.
The hackathon environment was perfect for experimenting with technologies we hadn't used before, such as MediaPipe and Babylon.js, and for combining computer vision with 3D graphics to create an immersive, hands-free gaming experience.
What it does
Tuggy Arena is a hands-free, tongue-controlled tug-of-war game that turns your webcam into a game controller. Players control their character in a 3D arena by sticking their tongue out to the left or right and moving their heads. The game features two modes: solo play against an adaptive computer opponent, or face-to-face multiplayer where two players compete in a single camera feed using dual face detection. Real-time computer vision tracks tongue movements and translates them into game actions in a 3D environment rendered with Babylon.js.
How we built it
Face Detection Layer: We integrated MediaPipe's Face Mesh model to detect 468 facial landmarks in real-time from webcam input, processing video frames at 30fps.
Tongue Detection Engine: Since MediaPipe doesn't directly detect tongue position, we developed a custom TongueDetector class that analyzes mouth region landmarks. It infers tongue position (left, center, right) by calculating lip displacement and mouth opening ratios relative to the face center.
Movement Tracking System: We built a TongueTracker that applies smoothing algorithms and confidence thresholds to filter out noise and false positives. A MovementCounter tracks directional movements and converts them into game actions.
Dual Player Detection: For multiplayer mode, we implemented simultaneous detection of two faces in a single camera feed. We developed position-based face assignment logic (leftmost face = Player 1, rightmost face = Player 2) to ensure consistent player tracking even when players move positions.
Game Logic & State Management: React components manage game state, scoring, and integrate with the adaptive computer opponent. We created custom hooks (useTongueDetection, useDualTongueDetection) that abstract complex detection logic for better code organization.
3D Visualization: Babylon.js renders the tug-of-war scene with stylized 3D characters. Character positions dynamically update based on score differences, creating an immersive visual experience.
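To make the movement tracking step more concrete, here is a minimal sketch of how smoothing and counting could work. It is not the project's actual TongueTracker or MovementCounter: the factory names, the majority-vote window, and the default values for windowSize and minConfidence are all assumptions chosen for illustration.

```javascript
// Illustrative sketch only; names and default values are hypothetical.
// Smooths raw per-frame readings with a majority vote over a short window.
function createTongueTracker({ windowSize = 5, minConfidence = 0.6 } = {}) {
  const history = [];
  return {
    // direction is "left", "center", or "right"; confidence is in [0, 1].
    update(direction, confidence) {
      // Low-confidence readings count as "center" so noise cannot
      // accumulate votes for a direction.
      history.push(confidence >= minConfidence ? direction : "center");
      if (history.length > windowSize) history.shift();

      const counts = { left: 0, center: 0, right: 0 };
      for (const d of history) counts[d] += 1;

      // A side wins only with a strict majority of the full window.
      if (counts.left > windowSize / 2) return "left";
      if (counts.right > windowSize / 2) return "right";
      return "center";
    },
  };
}

// Counts one movement each time the smoothed direction changes
// to "left" or "right".
function createMovementCounter() {
  let last = "center";
  let count = 0;
  return {
    update(stableDirection) {
      if (stableDirection !== last && stableDirection !== "center") {
        count += 1;
      }
      last = stableDirection;
      return count;
    },
  };
}
```

Requiring a strict majority of the full window means a single stray frame cannot trigger a pull, at the cost of a few frames of latency before a real movement registers.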
Challenges we ran into
Tongue Detection Accuracy: MediaPipe only provides facial landmarks, not direct tongue detection. We had to develop a heuristic algorithm that infers tongue position from mouth geometry. Finding the right thresholds for left/right detection required extensive testing and calibration across different lighting conditions and face shapes.
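As a rough illustration of this kind of heuristic, the sketch below classifies a single frame from four mouth points and the horizontal face center. It is an assumption-laden example, not our TongueDetector: the function name, the shape of the mouth object, and both threshold values are hypothetical, and choosing which of the 468 landmarks to pass in is left to the caller.

```javascript
// Illustrative sketch only; thresholds are hypothetical and would need
// calibration across lighting conditions and face shapes.
const DEFAULT_THRESHOLDS = { minOpenRatio: 0.15, sideThreshold: 0.12 };

// mouth: { leftCorner, rightCorner, upperLip, lowerLip }, each { x, y }
// in normalized image coordinates. faceCenterX uses the same coordinates.
// "left" here means toward smaller x in the image; swap the labels if the
// video is mirrored for display.
function inferTongueDirection(mouth, faceCenterX, options = {}) {
  const { minOpenRatio, sideThreshold } = { ...DEFAULT_THRESHOLDS, ...options };
  const { leftCorner, rightCorner, upperLip, lowerLip } = mouth;

  const mouthWidth = Math.abs(rightCorner.x - leftCorner.x);
  if (mouthWidth === 0) {
    return { direction: "center", openRatio: 0, offset: 0 };
  }

  // Mouth opening relative to mouth width, so the value does not depend
  // on how close the player sits to the camera.
  const openRatio = Math.abs(lowerLip.y - upperLip.y) / mouthWidth;

  // Horizontal displacement of the lip midpoint from the face center,
  // also normalized by mouth width.
  const lipCenterX = (upperLip.x + lowerLip.x) / 2;
  const offset = (lipCenterX - faceCenterX) / mouthWidth;

  let direction = "center";
  if (openRatio >= minOpenRatio) {
    if (offset <= -sideThreshold) direction = "left";
    else if (offset >= sideThreshold) direction = "right";
  }
  return { direction, openRatio, offset };
}
```

Normalizing both measurements by mouth width is one way to keep a single pair of thresholds usable at different distances from the camera.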
Dual Face Detection Complexity: Detecting and tracking two faces simultaneously introduced several issues:
- Face assignment could swap if players moved positions
- Cross-detection between players caused false positives
- Performance degradation when processing two face detections per frame
We solved this by implementing position-based face assignment (always leftmost = Player 1) and increasing confidence thresholds in dual mode to reduce false positives.
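The position-based assignment can be sketched roughly as follows. This is a simplified illustration rather than our exact code: assignPlayers is a hypothetical name, each face is assumed to be an array of landmarks with a normalized x coordinate, and returning null for both players when fewer than two faces are visible is an assumption about how a caller might want to pause the round.

```javascript
// Illustrative sketch only; function name and return shape are hypothetical.
// faces: array of faces, each an array of landmarks with a normalized x.
function assignPlayers(faces) {
  if (!Array.isArray(faces) || faces.length < 2) {
    // Assumption: with fewer than two faces, let the caller pause the game.
    return { player1: null, player2: null };
  }

  // Use the mean x of all landmarks as the face's horizontal position.
  const positioned = faces.map((landmarks) => {
    const sumX = landmarks.reduce((sum, point) => sum + point.x, 0);
    return { landmarks, centerX: sumX / landmarks.length };
  });

  // Sort by position instead of trusting detector order, which can change
  // from frame to frame.
  positioned.sort((a, b) => a.centerX - b.centerX);

  return {
    player1: positioned[0].landmarks, // leftmost in image coordinates
    player2: positioned[positioned.length - 1].landmarks, // rightmost
  };
}
```

Because the assignment is recomputed from positions on every frame, it does not matter in which order the detector returns the two faces.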
Real-time Performance: Processing video frames at 30fps while running face detection, tongue analysis, and 3D rendering was computationally intensive. We optimized by using requestAnimationFrame for efficient frame processing, implementing frame skipping when detection wasn't active, applying smoothing algorithms to reduce overhead, and separating detection logic into custom hooks for better React performance.
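A frame loop with skipping along these lines might look like the sketch below. It is a minimal example under stated assumptions: createDetectionLoop, detect, isActive, and minIntervalMs are hypothetical names, and the scheduler is injectable so the loop can be exercised outside a browser. By default it uses requestAnimationFrame, which passes a timestamp to its callback.

```javascript
// Illustrative sketch only; names and the 33 ms default are hypothetical.
function createDetectionLoop({
  detect, // called with the frame timestamp when a frame is processed
  isActive, // returns false while detection should be skipped
  minIntervalMs = 33, // roughly 30 detections per second
  schedule = (callback) => requestAnimationFrame(callback),
}) {
  let lastRun = -Infinity;
  let running = false;

  function tick(now) {
    if (!running) return;
    // Skip the expensive work when detection is inactive or when the
    // previous detection ran too recently.
    if (isActive() && now - lastRun >= minIntervalMs) {
      lastRun = now;
      detect(now);
    }
    schedule(tick);
  }

  return {
    start() {
      if (running) return;
      running = true;
      schedule(tick);
    },
    stop() {
      running = false;
    },
  };
}
```

On a 60 Hz display this processes about every second animation frame, which keeps detection near 30fps while rendering can still run on every frame.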
Browser Compatibility: Different browsers handle camera access and MediaPipe differently, especially Safari. We focused on Chrome/Edge as primary targets and added clear error messaging for unsupported browsers.
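A basic capability check before starting the camera could look like the sketch below. The function name and message text are hypothetical; the check itself relies on navigator.mediaDevices.getUserMedia, which browsers only expose in secure contexts such as HTTPS or localhost. It does not detect MediaPipe-specific problems, only missing camera APIs.

```javascript
// Illustrative sketch only; function name and message are hypothetical.
// Pass in the global navigator object (or a stand-in when testing).
function describeCameraSupport(nav) {
  const hasGetUserMedia =
    Boolean(nav) &&
    Boolean(nav.mediaDevices) &&
    typeof nav.mediaDevices.getUserMedia === "function";

  if (!hasGetUserMedia) {
    return {
      supported: false,
      message:
        "Camera access is not available. Try a recent version of Chrome " +
        "or Edge, and make sure the page is served over HTTPS.",
    };
  }
  return { supported: true, message: "" };
}
```

In the browser this would be called as describeCameraSupport(navigator) before requesting the webcam stream, showing the message to the player when supported is false.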
Accomplishments that we're proud of
- Innovative Control Scheme: Successfully implementing a tongue-based game controller that actually works reliably, creating a truly hands-free gaming experience
- Dual Face Detection: Building a robust system that can simultaneously track two players in a single camera feed with consistent player assignment
- Real-time Performance: Achieving smooth 30fps gameplay despite the computational intensity of face detection, tongue analysis, and 3D rendering running simultaneously
What we learned
Building Tuggy Arena was a deep dive into several new technologies and concepts:
Computer Vision: We learned how MediaPipe Face Mesh works, understanding facial landmark detection and how to extract meaningful data from 468 facial landmarks in real-time. We discovered the importance of confidence thresholds and smoothing algorithms for reliable detection.
3D Graphics: Working with Babylon.js taught us about 3D scene management, camera controls, lighting, and creating dynamic animations based on game state. We learned how to efficiently update 3D objects in response to real-time data.
Real-time Processing: Optimizing performance for 30fps video processing while maintaining smooth gameplay required careful attention to frame processing, smoothing algorithms, and state management. We learned the importance of requestAnimationFrame and efficient rendering loops.
Dual Face Detection: Implementing simultaneous detection of two faces challenged us to develop robust face assignment logic and prevent cross-detection between players. We learned about spatial reasoning in computer vision applications.
React Hooks Architecture: Building custom hooks (useTongueDetection, useDualTongueDetection) taught us how to abstract complex detection logic, making the codebase more maintainable and reusable. We learned best practices for managing side effects and state in React.
What's next for Tuggy Arena
- Improved Tongue Detection: Explore using a dedicated machine learning model for more accurate tongue detection, potentially training a custom model or fine-tuning existing ones
- More Game Modes: Add tournament mode, time-based challenges, or cooperative gameplay where players work together, and possibly other sports such as running games
- Enhanced 3D Graphics: Add particle effects, better character animations, and more dynamic visual feedback
- Mobile Support: Optimize for mobile devices with front-facing cameras, adapting the detection algorithms for different camera angles
- Multiplayer Over Network: Enable remote multiplayer where players can compete from different locations
Built With
- babylon.js
- javascript
- mediapipe
- react
- tailwind