Butter - Our Community Assisted Trash Bot

Inspiration

Walk across any college campus on a Monday morning and you'll find yesterday's bottles, cans, and wrappers trailing the sidewalks. Rutgers is no exception. Custodial staff can't be everywhere, and the students who do care about a cleaner campus don't always have a trash can nearby when they see litter.

We wanted a solution that turned "I wish someone would pick that up" into one tap. Instead of asking humans to carry trash, we asked: what if you could just take a photo, and a robot would show up and get it?

What it does

Campus Cleanup Router is an outdoor autonomous robot that patrols campus and picks up reported litter. The workflow is dead simple:

You spot trash. Open our companion app, tap the Reporter tab, snap a photo.
The app tags it with GPS and uploads it to our backend.
The robot gets dispatched. It pulls up walking directions from Apple Maps and drives itself to the location.
It autonomously finds the exact piece of trash using the photo you took as a visual reference, even among leaves, rocks, and other clutter.
It scoops the item up by driving forward into it with our intake system.
It verifies the pickup and reports back. Ready for the next job.

From a user's perspective, you take a picture of a bottle, and a few minutes later, the bottle is gone.

How we built it

The robot is a split-brain design. Heavy AI compute lives on a remote desktop with an RTX 4080 GPU; the robot itself carries a Raspberry Pi that just drives motors and streams webcam video. They talk over WiFi.
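As a rough illustration of how little the Pi side has to do, here is a minimal sketch of turning a drive command into per-motor outputs. It assumes a differential (tank-style) drive, which the two-motor layout suggests but this writeup does not state. `MotorCommand` and `mix_differential` are hypothetical names, and the actual GPIO/PWM calls are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotorCommand:
    """Direction and PWM duty cycle for one side of the drivetrain."""
    forward: bool
    duty: float  # 0.0 (stopped) to 1.0 (full power)


def mix_differential(throttle: float, turn: float) -> tuple[MotorCommand, MotorCommand]:
    """Mix throttle and turn (each in [-1, 1]) into (left, right) commands."""
    left = throttle + turn
    right = throttle - turn
    # Scale both sides down together so neither exceeds full power,
    # which keeps the requested left/right ratio intact.
    scale = max(1.0, abs(left), abs(right))
    left, right = left / scale, right / scale
    return (
        MotorCommand(forward=left >= 0, duty=abs(left)),
        MotorCommand(forward=right >= 0, duty=abs(right)),
    )
```

With this mixing, a positive `turn` speeds up the left side and slows the right, so the robot veers right; the brain only needs to send two numbers over WiFi per update.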

The mobile app (Expo / React Native) has two tabs — one for reporters submitting trash photos, and one that runs on the robot's mounted iPhone to handle GPS and street-level navigation via Apple Maps.

The backend (Node / Express) is a lightweight relay. It stores reports, hands out the next job, and proxies Apple Maps for walking routes.
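The "store reports, hand out the next job" part of the relay boils down to a queue. The sketch below shows that logic in Python purely for illustration (the real backend is Node / Express). The first-in, first-out ordering, the `Report` fields, and the `ReportQueue` name are all assumptions, not the project's actual schema.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    report_id: int
    latitude: float
    longitude: float
    photo_path: str  # hypothetical: where the uploaded photo is kept


class ReportQueue:
    """Stores reports in arrival order and hands out one job at a time."""

    def __init__(self) -> None:
        self._pending: deque[Report] = deque()
        self._next_id = 1

    def submit(self, latitude: float, longitude: float, photo_path: str) -> Report:
        """Record a new report and return it with its assigned id."""
        report = Report(self._next_id, latitude, longitude, photo_path)
        self._next_id += 1
        self._pending.append(report)
        return report

    def next_job(self) -> Optional[Report]:
        """Return the oldest pending report, or None when nothing is waiting."""
        return self._pending.popleft() if self._pending else None
```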

The brain desktop runs three AI models:

A custom-trained YOLO detector (trained on the TACO and Drink Waste datasets) to spot trash and obstacles like people and bikes
A vision-language model (Qwen3-VL) as a high-level scout when the robot loses sight of the target

The robot hardware is a cardboard chassis driven by two NeveRest gear motors through an L298N H-bridge motor driver, powered by a Ryobi 18V battery pack. A Logitech C270 webcam feeds the brain over MJPEG. The whole thing is coordinated by a state machine that smoothly hands control between the phone (for street navigation) and the brain (for the final visual approach).
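A coordinating state machine of this kind can be sketched as a small transition table. The state names, event names, and the retry path after a failed pickup are hypothetical; only the phone-to-brain hand-off between street navigation and the visual approach comes from the description above. Treating the brain as the controller during pickup and verification is also an assumption.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    STREET_NAV = auto()       # phone in control: GPS plus walking route
    VISUAL_APPROACH = auto()  # brain in control: camera-guided final approach
    PICKUP = auto()
    VERIFY = auto()


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "job_assigned"): State.STREET_NAV,
    (State.STREET_NAV, "arrived_near_report"): State.VISUAL_APPROACH,
    (State.VISUAL_APPROACH, "target_in_reach"): State.PICKUP,
    (State.PICKUP, "intake_done"): State.VERIFY,
    (State.VERIFY, "pickup_confirmed"): State.IDLE,
    (State.VERIFY, "pickup_failed"): State.VISUAL_APPROACH,  # assumed retry path
}


def step(state: State, event: str) -> State:
    """Return the next state; events not valid in the current state are ignored."""
    return TRANSITIONS.get((state, event), state)


def controller_for(state: State) -> str:
    """Name which device is steering the robot in a given state."""
    if state is State.IDLE:
        return "none"
    if state is State.STREET_NAV:
        return "phone"
    return "brain"  # assumption: the brain also runs pickup and verification
```

Keeping the table explicit means an unexpected event leaves the state unchanged, so a stray message from the phone cannot pull control away from the brain mid-approach.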

Challenges we ran into

Building a hybrid ML system that could actually make decisions on its own. A single model was never going to work. We needed the robot to constantly see the world (fast), understand what it's looking at (smart), and decide what to do next (responsive) — all at the same time. No off-the-shelf model does all three well.
GPS alone can't find a bottle. GPS accuracy drops to 10–30 meters near buildings — fine for "drive to this block," useless for "pick up that bottle." We let the phone handle coarse street-level navigation with Apple Maps, then hand off to the brain's vision system for the final approach. GPS gets the robot close; the camera takes it the rest of the way.
Powering a computer and two motors off one battery, fast. The electrical system had to be compact and robust, and it had to share a single Ryobi 18V pack between the Pi and the drive motors without browning out when the motors' current draw spiked. Designing, fabricating, and debugging it on a hackathon timeline, with limited tools and no second chances, was its own engineering problem on top of the software.
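For the GPS challenge above, the hand-off decision can be as simple as a distance check against the reported location. This is a minimal sketch using the standard haversine great-circle formula. The `should_hand_off` name and the 30 m default radius are assumptions, with the radius picked to match the worst-case GPS error quoted above rather than a value the team reported using.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def should_hand_off(robot: tuple[float, float], target: tuple[float, float],
                    radius_m: float = 30.0) -> bool:
    """True once the robot is close enough that vision should take over."""
    return haversine_m(robot[0], robot[1], target[0], target[1]) <= radius_m
```

Because the GPS fix itself can be off by tens of meters, the radius only needs to get the target into the camera's likely field of view, not to be precise.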

Accomplishments that we're proud of

We built a working full-stack product in 24 hours: mobile app, backend, computer vision pipeline, state machine, and hardware robot.
The state machine actually works. Watching the robot smoothly hand off control from the phone's GPS nav to the brain's visual approach is a cool moment. Two independent systems coordinating without a hitch.
Getting the models to actually talk to each other. YOLO speaks bounding boxes, Qwen3-VL speaks English, and the control loop speaks motor PWM. We built a translation layer so YOLO's detections become Qwen3-VL's questions, and its prose answers become signals the state machine can act on.
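To make the translation-layer idea concrete, here is a minimal sketch of both directions: detections in, a plain-English question out, and a prose answer back in as a one-word signal. Everything here is hypothetical, including the `Detection` fields, the prompt wording, and the keyword-based parsing. It makes no model calls and is not the team's actual prompt or parser.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def build_question(detections: list[Detection], frame_width: float) -> str:
    """Turn bounding boxes into a question a vision-language model can answer."""
    seen = []
    for d in detections:
        center_x = (d.box[0] + d.box[2]) / 2
        if center_x < frame_width / 3:
            side = "left"
        elif center_x > 2 * frame_width / 3:
            side = "right"
        else:
            side = "center"
        seen.append(f"a {d.label} at the {side} of the frame")
    summary = "; ".join(seen) if seen else "nothing"
    return (f"The detector sees: {summary}. Which way should the robot move "
            "to reach the reported item: left, right, or forward? "
            "Answer with one word.")


def parse_answer(answer: str) -> Optional[str]:
    """Map a prose answer to 'left', 'right', or 'forward'; None if unclear."""
    match = re.search(r"\b(left|right|forward)\b", answer.lower())
    return match.group(1) if match else None
```

Returning `None` for an unclear answer lets the state machine decide what to do (for example, ask again or hold position) instead of acting on a guess.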

What we learned

Small specialized models + one big generalist is a killer combo. Our tiny custom YOLO handles the high-frequency work; Qwen3-VL steps in for the hard judgment calls. Neither model alone could run this robot, but together they cover each other's weaknesses — fast where it needs to be, smart where it matters.
Hardware and software timelines never agree. Mechanical fabrication, wiring, and software all had different bottlenecks. Parallelizing was essential.
Distributed systems are hard, even for two devices. Most of our bugs weren't in any single component. They were in the spaces between them. We got religious about logging every state transition.
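Logging every state transition can be done with a thin wrapper around the current state. This sketch uses Python's standard `logging` module; the `LoggedState` class, the logger name, and the message format are assumptions for illustration.

```python
import logging
import time

logger = logging.getLogger("robot.state")  # hypothetical logger name


class LoggedState:
    """Holds the current state and logs every change with its dwell time."""

    def __init__(self, initial: str) -> None:
        self.state = initial
        self._entered = time.monotonic()
        self.history: list[tuple[str, str, str]] = []

    def transition(self, new_state: str, reason: str) -> None:
        """Switch state, recording where we came from, how long we stayed, and why."""
        now = time.monotonic()
        dwell = now - self._entered
        logger.info("state %s -> %s after %.2fs (%s)",
                    self.state, new_state, dwell, reason)
        self.history.append((self.state, new_state, reason))
        self.state = new_state
        self._entered = now
```

Recording the reason alongside each transition is what makes bugs "in the spaces between" components traceable: the log shows which device triggered the change and how long the previous state lasted.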

What's next for Butter - Our Community Assisted Trash Bot

More field testing on campus. We want to run Butters on real Rutgers sidewalks across different weather, lighting, and foot traffic to find where it struggles.
A bigger, cleaner custom dataset. More labeled photos from around campus will make the detector more reliable across the full range of litter Butters actually encounters.
Making Butters metal. The cardboard chassis got us through the hackathon, but a metal frame is the next step for durability, weatherproofing, and actually holding up to daily outdoor use.