Real-Time Object Detection and Distance Estimation Using WebRTC and AI
Overview
This project enables real-time object detection and distance estimation from a live video stream using WebRTC and AI models such as YOLOv8 and MediaPipe Pose. The system processes video frames, detects objects and humans, estimates their distances, and transmits the results via a Socket.io server.
Key Features
- WebRTC-Based Video Streaming: Uses
aiortcto stream video in real-time. - YOLOv8 Object Detection: Detects common objects in video frames.
- Distance Estimation: Estimates distances of detected objects using bounding box sizes and reference heights.
- MediaPipe Pose Tracking: Identifies human poses and estimates distances based on shoulder width.
- Asynchronous Socket Communication: Uses
socketiofor real-time transmission of detected objects and their distances.
Technologies Used
- Python & Asyncio: For non-blocking, real-time processing.
- OpenCV: For image processing.
- YOLOv8 (Ultralytics): Object detection model.
- MediaPipe Pose: Human pose estimation.
- WebRTC (aiortc): Video streaming framework.
- Socket.io: Real-time client-server communication.
- Aiohttp: Asynchronous web framework.
How It Works
Client Connection:
- The client sends a WebRTC offer to establish a peer-to-peer connection.
- The server responds with an answer to complete the WebRTC handshake.
Video Processing:
- The
VideoTransformTrackclass processes incoming video frames. - YOLOv8 detects objects and calculates their distances using a predefined size reference.
- MediaPipe Pose detects human poses and estimates distances based on shoulder width.
- The
Real-Time Data Transmission:
- Detection results, including object labels, confidence scores, and estimated distances, are sent to the client via Socket.io.
- FPS (frames per second) is calculated and included in the response.
Connection Management:
- The server manages multiple peer connections and cleans up stale connections periodically.
Distance Estimation Approach
For objects: [ \text{distance} = \frac{\text{real height} \times \text{focal length}}{\text{bounding box height}} ]
- Uses a reference height table for common objects.
For humans: [ \text{distance} = \frac{\text{shoulder width (40 cm)} \times \text{focal length}}{\text{shoulder width (pixels)}} ]
- Adjusted with a calibration factor for accuracy.
Usage
- Install Dependencies: ```bash pip install opencv-python numpy aiortc socketio ultralytics mediapipe aiohttp
Log in or sign up for Devpost to join the conversation.