About the Project
Smart Trash Detector was inspired by the need for a practical tool that helps people recycle correctly. This is my first computer vision project, and I wanted to see what I could build with my current knowledge and the help of AI.
The project combines:
YOLO (You Only Look Once) – a state-of-the-art object detection model – to detect trash items in uploaded images or a live webcam feed.
GPT-OSS (Open Source GPT) – a large language model – to generate concise, human-readable disposal instructions, such as whether an item should be recycled, composted, or sent to landfill.
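The handoff between the two models can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `yolov8n.pt` weights file and the `openai/gpt-oss-20b` model id are assumptions, and `summarize_detections` is a hypothetical helper that turns raw class names into prompt text.

```python
from collections import Counter

def summarize_detections(labels):
    """Collapse raw YOLO class names into a short text summary
    suitable for prompting the language model."""
    counts = Counter(labels)
    return ", ".join(f"{n}x {name}" for name, n in sorted(counts.items()))

def detect_and_explain(image_path):
    # Heavyweight imports are kept local so the helper above stays importable.
    from ultralytics import YOLO                 # assumes the ultralytics package
    from huggingface_hub import InferenceClient  # hosted inference for GPT-OSS

    # Run detection and collect the predicted class names.
    results = YOLO("yolov8n.pt")(image_path)     # weights file is an assumption
    labels = [results[0].names[int(c)] for c in results[0].boxes.cls]

    # Ask the hosted LLM for disposal guidance on the detected items.
    client = InferenceClient(model="openai/gpt-oss-20b")  # model id is an assumption
    reply = client.chat_completion(messages=[{
        "role": "user",
        "content": f"Detected trash items: {summarize_detections(labels)}. "
                   "For each, say in one line whether to recycle, compost, "
                   "or landfill it.",
    }])
    return reply.choices[0].message.content
```

Keeping the text-summary step separate from the model calls makes the prompt-building logic easy to test without a GPU or network access.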
The system works in two modes:
Upload Image – Users can upload a photo, and the model will annotate items and provide instructions.
Webcam Mode – Real-time detection, showing bounding boxes for trash items. Users can then summarize the detected items with GPT-OSS for guidance.
Challenges
Integrating YOLO and GPT-OSS into a single pipeline: YOLO runs detection quickly, but calling GPT-OSS via the Hugging Face Inference API adds noticeable latency.
Handling real-world variations in trash appearance. Lighting, angles, and occlusions affected detection accuracy.
Understanding how to use GPT-OSS correctly, since running it locally requires significant computational resources. We opted for hosted inference via Hugging Face and the Together API.
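One way to hide the API latency mentioned above is to fire the LLM request on a background thread while detection keeps running. The sketch below simulates the slow hosted call with `time.sleep`; `slow_llm_call` is a hypothetical stand-in for the real request, not the project's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_llm_call(prompt):
    """Stand-in for the hosted GPT-OSS request (latency simulated)."""
    time.sleep(0.2)  # pretend network round-trip
    return f"instructions for: {prompt}"

# Submit the LLM request in the background so the detection loop
# (and the UI) stays responsive while the API call is in flight.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_llm_call, "2x bottle, 1x can")
    # ... detection / UI work continues here ...
    result = future.result()  # block only when the reply is actually needed
```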
What I Learned
Practical deployment of AI often requires combining multiple models (vision + language) and handling asynchronous or slow API calls.
Object detection models are highly sensitive to dataset quality and labeling. Fine-tuning YOLO improved accuracy on trash categories.
Prompt engineering for GPT-OSS is critical: concise and structured prompts yield better disposal instructions.
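As one illustration of "concise and structured", a prompt builder might list the detected items and pin down the exact output format. The function name and wording below are hypothetical, not taken from the project:

```python
def disposal_prompt(items):
    """Build a concise, structured prompt: enumerate the items and
    constrain the reply format so answers stay short and parseable."""
    lines = "\n".join(f"- {item}" for item in items)
    return (
        "You are a recycling assistant. For each item below, answer with "
        "exactly one line in the form "
        "'item: recycle | compost | landfill - short reason'.\n"
        f"Items:\n{lines}"
    )
```

Fixing the output format up front tends to produce shorter, more consistent replies than an open-ended "how do I dispose of this?" question.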
Built With
- gpt-oss
- gradio
- python
- yolo