GPT Rover - Wuff

Top view
Hardwiring
UI-Bottom
UI-Top

Inspiration

We were inspired by Mars and Moon Rovers. What if you could not visit some place personally, what if this place is dark or dangerous?

What it does

GPT Rover is our analogue of Perseverance or any other space exploration vehicle but adapted for Earth. It could explore your basement or living room (remotely), beep if necessary, and tell you what surroundings GPT Rover met. Since whole GPT-OSS model runs locally and controlled via WIFI/radiowaves - GPT Rover could run even on different planet.

How we built it

So basically, we built OpenWuff by connecting three main parts: your phone as a wireless camera, a computer doing the backend and AI thinking (because we connect to our gpt-oss model), and a small robot car. We created a web app that turns any phone into a streaming camera, then we built Python software on a computer that uses AI vision (YOLOv8) to detect objects and a language model (GPT-OSS-20B) to chat about what it sees. Finally, we programmed an ESP32s microcontroller to control motors and a buzzer on the physical rover.

In conclusion, the phone sends video to the computer, the computer analyzes it with AI and sends movement commands to the rover, creating an intelligent exploration robot that can see, think, move, and chat with you about what it discovers.

Challenges we ran into

Hardware constraints: Running a large model like GPT-OSS-20B on local hardware required careful optimization of GPU memory and parallelization.
Camera: choosing a board with integrated camera (esp32-cam) - we faced issues with the camera resolution that's what we decided to move to a PWA app to get video from a phone.
Object Detection: we decided to try it out Yolo8n and Yolov8s due to the low resources it consumes but at the cost of accuracy. We would love to improve this area
Function Calling: in our server we didn't set up this feature, so we decided to move with json extraction from the GPT chat we got
Team coordination: With different backgrounds and multinational, aligning hardware, software, and presentation and description parts took significant effort.

Accomplishments that we're proud of

Successfully integrated YOLO vision with GPT-OSS inference to create a working and open-source prototype of GPT Rover.
Reduced latency by multi-threading inference and optimizing GPU utilization.
Gave a second life and even brains!! to old hardware (camera lenses, RC car), showing how creativity and resourcefulness can build a working rover.

What we learned

How to deploy LLMs locally and optimize them for hardware and real-time applications.
Practical experience in combining vision models with language models, including synchronization of outputs.
Importance of team collaboration (from 3 different countries!!), balancing innovation with organization and presentation.
That even with limited resources, it is possible to create something that feels like a small step toward real AI-powered exploration vehicles.

What's next for GPT Rover - Wuff

Voice interaction: Adding speech-to-text and text-to-speech so GPT-Rover can “talk back.”
Autonomous navigation: Integrating reinforcement learning so the rover can plan its movements instead of only remote control.
Extended range: Exploring control via satellite or mesh networks, so GPT-Rover can be deployed in hard-to-reach places.
Companion mode: Transforming GPT-Rover into not only an explorer but also a “pet-like” companion (hence the “Wuff”) with playful behaviors.
Modular design: Allowing other sensors (temperature, gas detection, night vision) to be plugged in for specialized missions

Acknowledgement

Carlos had necessary equipment (a car kit, a phone, and an ESP32Sl), so Carlos connected the hardware part and interconnected the main system.
Victor had sufficient and strong computer with 12 GB VRAM, so he deployed GPT-OSS-20B locally, significantly improving inference quality (reduce latency, inference was multi-threaded, effective utilisation of computing power).
Arnaud helped us to get the endpoint from llama server and designing the slides.
Bakyt organised brainstorms, helped with organisational part and submission of project, presentation.

Built With

api
flask
gpt-oss
socket.io
yolo

Submitted to

OpenAI Open Model Hackathon

Created by

I spent time interconnecting the system of Wuff. Starting from the esp32s to the main server with the UI interface.

Carlos L
Hey, there!
Presentation part
Organizing calls

Bakyt Naurzalinov
I worked on the backend, focusing mainly on the server-side inference pipeline.
My responsibilities included:
-Reducing latency and ensuring end-to-end inference efficiency.
-Managing GPU memory loading for optimal utilization.
-Implementing multi-threaded parallel inference, making it both efficient and scalable.
-Compiling source code specifically for the target NVIDIA GPU hardware.
-Ensuring all CPU threads were utilized effectively during inference.

As a result, we managed to fully leverage almost all 20 CPU threads of the i5 processor, and maximize GPU compute utilization across ~55 out of 60 layers.

Victor Pimshin
Helped brainstorm and shape the project idea.
Contributed to key technical decisions, such as using YOLO for perception and GPT-OSS for reasoning.
Designed and prepared the project presentation to clearly communicate our work.

Arnaud Durand