About the Project

Smart Trash Detector was inspired by the need for a practical tool to help people recycle correctly. This is my first computer vision project; I wanted to see what I could build with my current knowledge and with the help of AI.

The project combines:

YOLO (You Only Look Once) – a state-of-the-art object detection model – to detect trash items in uploaded images or a live webcam feed.

GPT-OSS (Open Source GPT) – a large language model – to generate concise, human-readable disposal instructions, such as whether an item should be recycled, composted, or sent to landfill.

The system works in two modes:

Upload Image – Users can upload a photo, and the model will annotate items and provide instructions.

Webcam Mode – Real-time detection, showing bounding boxes for trash items. Users can then summarize the detected items with GPT-OSS for guidance.
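In both modes the hand-off between the two models is the same: YOLO returns class labels with confidence scores, and those labels are condensed into a short text summary that becomes the GPT-OSS prompt. A minimal sketch of that step (the label names and confidence threshold are illustrative, not the project's actual classes):

```python
from collections import Counter

def summarize_detections(detections, min_conf=0.5):
    """Condense raw YOLO detections into a short text summary for the LLM.

    `detections` is a list of (label, confidence) pairs, e.g. as read from
    an ultralytics `Results` object. Low-confidence hits are dropped so the
    prompt only mentions items we are reasonably sure about.
    """
    counts = Counter(label for label, conf in detections if conf >= min_conf)
    if not counts:
        return "No trash items detected."
    parts = [f"{n} x {label}" for label, n in sorted(counts.items())]
    return "Detected items: " + ", ".join(parts)

# Example: two bottles and one can, plus a low-confidence false positive.
dets = [("plastic_bottle", 0.91), ("plastic_bottle", 0.78),
        ("aluminum_can", 0.84), ("banana_peel", 0.31)]
print(summarize_detections(dets))
# -> Detected items: 1 x aluminum_can, 2 x plastic_bottle
```

Collapsing duplicates into counts keeps the prompt short, which matters once each call to the hosted LLM costs real latency.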

Challenges

Integrating YOLO and GPT-OSS into a single pipeline. YOLO is fast for detection, but calling GPT-OSS via the Hugging Face Inference API added latency.
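One way to keep that latency off the detection path is to fire the LLM call on a background thread while the webcam loop keeps rendering boxes. A sketch using only the standard library; `ask_gpt_oss` here is a placeholder standing in for the real hosted call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def ask_gpt_oss(summary):
    """Placeholder for the hosted GPT-OSS call, which can take seconds."""
    time.sleep(0.1)  # simulate network latency
    return f"Disposal advice for: {summary}"

executor = ThreadPoolExecutor(max_workers=1)

# Inside the webcam loop: submit and move on instead of blocking the frame.
future = executor.submit(ask_gpt_oss, "2 x plastic_bottle")

# ... keep grabbing frames and drawing YOLO boxes here ...

# When the user asks for guidance, collect the result (blocking only then).
print(future.result())
# -> Disposal advice for: 2 x plastic_bottle
```

This way YOLO's frame rate is unaffected by the API round trip; only the moment the user asks for a summary waits on the network.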

Handling real-world variations in trash appearance. Lighting, angles, and occlusions affected detection accuracy.

Understanding how to use GPT-OSS correctly, since running it locally requires significant computational resources. I opted for hosted inference via the Hugging Face and Together APIs.
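With hosted inference, the client side reduces to building a chat-style request and posting it. The sketch below only constructs the message payload (pure Python, so it runs anywhere); the commented lines show roughly how it would be sent with `huggingface_hub.InferenceClient` — the model id and parameters there are assumptions, not the project's exact configuration:

```python
def build_chat_messages(summary):
    """Build a chat-style payload for a hosted GPT-OSS endpoint."""
    return [
        {"role": "system",
         "content": "You are a recycling assistant. Answer briefly."},
        {"role": "user", "content": summary},
    ]

messages = build_chat_messages("Detected items: 2 x plastic_bottle, 1 x aluminum_can")

# Sending it through the Hugging Face Inference API might look like this
# (model id is an assumption):
# from huggingface_hub import InferenceClient
# client = InferenceClient(model="openai/gpt-oss-20b")
# reply = client.chat_completion(messages=messages, max_tokens=200)
```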

What I Learned

Practical deployment of AI often requires combining multiple models (vision + language) and handling asynchronous or slow API calls.

Object detection models are highly sensitive to dataset quality and labeling. Fine-tuning YOLO improved accuracy on trash categories.
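Fine-tuning with ultralytics-style tooling revolves around a small dataset file that maps class ids to names. A hypothetical `trash.yaml` for a few trash categories might look like the fragment below (paths and class names are placeholders, not the project's actual dataset), after which training is a one-liner such as `yolo train model=yolov8n.pt data=trash.yaml epochs=50`:

```yaml
# trash.yaml — hypothetical dataset config for fine-tuning YOLO
path: datasets/trash        # dataset root
train: images/train         # training images, relative to path
val: images/val             # validation images
names:
  0: plastic_bottle
  1: aluminum_can
  2: cardboard
  3: food_waste
```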

Prompt engineering for GPT-OSS is critical: concise and structured prompts yield better disposal instructions.
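Concretely, "structured" here means telling the model exactly what shape the answer must take instead of asking an open question. A sketch of the kind of prompt template that tends to work better (the wording is illustrative):

```python
PROMPT_TEMPLATE = (
    "You are a waste-sorting assistant.\n"
    "Items: {items}\n"
    "For each item, answer on its own line as:\n"
    "item -> bin (recycling | compost | landfill) - one-sentence reason\n"
    "Do not add anything else."
)

def build_prompt(items):
    """Fill the template with a comma-separated item list."""
    return PROMPT_TEMPLATE.format(items=", ".join(items))

print(build_prompt(["plastic_bottle", "banana_peel"]))
```

Pinning down the output format line by line makes the reply easy to parse and display next to the bounding boxes.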

Built With

YOLO, GPT-OSS, Hugging Face Inference API, Together API