Inspiration

We were inspired by the Mars and Moon rovers. What if you cannot visit a place in person because it is dark or dangerous?

What it does

GPT Rover is our analogue of Perseverance or any other space exploration vehicle, adapted for Earth. It can explore your basement or living room remotely, beep when necessary, and tell you what it encountered in its surroundings. Since the whole GPT-OSS model runs locally and the rover is controlled over Wi-Fi (radio waves), GPT Rover could in principle run even on a different planet.

How we built it

We built OpenWuff by connecting three main parts: a phone acting as a wireless camera, a computer running the backend and the AI (it hosts our GPT-OSS model), and a small robot car. We created a web app that turns any phone into a streaming camera, then wrote Python software for the computer that uses a vision model (YOLOv8) to detect objects and a language model (GPT-OSS-20B) to chat about what it sees. Finally, we programmed an ESP32s microcontroller to control the motors and a buzzer on the physical rover.

In short, the phone sends video to the computer, and the computer analyzes it with AI and sends movement commands to the rover. The result is an intelligent exploration robot that can see, think, move, and chat with you about what it discovers.
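As a rough illustration of the computer's role in this loop, here is a minimal sketch of how detector output could be summarised for the language model and how a chosen command could be validated before it is sent to the rover. The names `Detection`, `describe_scene`, `to_rover_message`, the command vocabulary, and the newline-terminated wire format are all assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # class name reported by the object detector
    confidence: float  # detector score in [0, 1]


# Hypothetical command vocabulary; the real firmware may use other names.
ALLOWED_COMMANDS = {"forward", "backward", "left", "right", "stop", "beep"}


def describe_scene(detections, min_confidence=0.5):
    """Turn detector output into a short text summary for the language model."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    if not kept:
        return "No objects detected."
    counts = {}
    for d in kept:
        counts[d.label] = counts.get(d.label, 0) + 1
    parts = [f"{n} x {label}" for label, n in sorted(counts.items())]
    return "Objects in view: " + ", ".join(parts) + "."


def to_rover_message(command):
    """Encode a command as one ASCII line; unknown commands fall back to stop."""
    if command not in ALLOWED_COMMANDS:
        command = "stop"
    return (command + "\n").encode("ascii")


if __name__ == "__main__":
    frame_detections = [
        Detection("chair", 0.91),
        Detection("chair", 0.77),
        Detection("cat", 0.30),  # below threshold, ignored
    ]
    print(describe_scene(frame_detections))
    print(to_rover_message("forward"))
```

Falling back to `stop` for anything outside the allowed set is a deliberate choice in this sketch: a language model can produce unexpected text, and a stationary rover is the safest default.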

Challenges we ran into

  • Hardware constraints: Running a large model like GPT-OSS-20B on local hardware required careful optimization of GPU memory and parallelization.
  • Camera: we first chose a board with an integrated camera (ESP32-CAM), but ran into issues with its resolution, which is why we moved to a PWA that streams video from a phone.
  • Object detection: we tried YOLOv8n and YOLOv8s because of their low resource usage, which comes at the cost of accuracy. We would love to improve this area.
  • Function calling: we did not set this feature up on our server, so we instead extract JSON from the model's chat replies.
  • Team coordination: with a multinational team from different backgrounds, aligning the hardware, software, presentation, and project description took significant effort.
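The JSON-extraction workaround mentioned under function calling can be sketched as follows. This is one possible approach using only the Python standard library, not necessarily the one we shipped; the function name `extract_json_object` and the `action` key in the example reply are hypothetical.

```python
import json


def extract_json_object(reply):
    """Return the first JSON object embedded in free-form text, or None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON at this brace; try the next one.
        start = reply.find("{", start + 1)
    return None


if __name__ == "__main__":
    reply = 'I see {a chair}. {"action": "stop"} Stopping now.'
    print(extract_json_object(reply))
```

Scanning from each opening brace lets the parser skip braces that appear in ordinary prose, which a single regular expression tends to handle poorly when objects are nested.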

Accomplishments that we're proud of

  • Successfully integrated YOLO vision with GPT-OSS inference to create a working and open-source prototype of GPT Rover.
  • Reduced latency by multi-threading inference and optimizing GPU utilization.
  • Gave a second life (and even brains!) to old hardware (camera lenses, an RC car), showing how creativity and resourcefulness can build a working rover.
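One common way to keep latency low when inference is slower than the camera stream is to run inference in its own thread and always process the newest frame, dropping stale ones. The sketch below shows that pattern under our own assumptions; `LatestFrame` and `inference_worker` are illustrative names, and the actual threading scheme in the project may differ.

```python
import threading


class LatestFrame:
    """Single-slot buffer: writers overwrite, the reader takes the newest frame."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._closed = False

    def put(self, frame):
        # Overwrite any frame that has not been consumed yet.
        with self._cond:
            self._frame = frame
            self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def take(self):
        """Block until a frame arrives; return None once closed and empty."""
        with self._cond:
            while self._frame is None and not self._closed:
                self._cond.wait()
            frame, self._frame = self._frame, None
            return frame


def inference_worker(buffer, infer, results):
    """Run `infer` on each frame taken from the buffer until it is closed."""
    while True:
        frame = buffer.take()
        if frame is None:
            break
        results.append(infer(frame))


if __name__ == "__main__":
    buf, results = LatestFrame(), []
    worker = threading.Thread(target=inference_worker, args=(buf, str.upper, results))
    worker.start()
    for name in ("frame-1", "frame-2", "frame-3"):
        buf.put(name)
    buf.close()
    worker.join()
    print(results)  # how many frames get processed depends on thread timing
```

The trade-off is that some frames are never analysed, which is usually acceptable for a rover that only needs an up-to-date view of its surroundings.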

What we learned

  • How to deploy LLMs locally and optimize them for hardware and real-time applications.
  • Practical experience in combining vision models with language models, including synchronization of outputs.
  • The importance of team collaboration (across 3 different countries!) and of balancing innovation with organization and presentation.
  • That even with limited resources, it is possible to create something that feels like a small step toward real AI-powered exploration vehicles.

What's next for GPT Rover - Wuff

  • Voice interaction: Adding speech-to-text and text-to-speech so GPT-Rover can “talk back.”
  • Autonomous navigation: Integrating reinforcement learning so the rover can plan its own movements instead of relying only on remote control.
  • Extended range: Exploring control via satellite or mesh networks, so GPT-Rover can be deployed in hard-to-reach places.
  • Companion mode: Transforming GPT-Rover into not only an explorer but also a “pet-like” companion (hence the “Wuff”) with playful behaviors.
  • Modular design: Allowing other sensors (temperature, gas detection, night vision) to be plugged in for specialized missions.

Acknowledgement

  • Carlos had the necessary equipment (a car kit, a phone, and an ESP32s), so he assembled the hardware and connected it to the main system.
  • Victor had a sufficiently powerful computer with 12 GB of VRAM, so he deployed GPT-OSS-20B locally and significantly improved inference performance (reduced latency, multi-threaded inference, effective utilisation of computing power).
  • Arnaud helped us get the endpoint from the llama server and designed the slides.
  • Bakyt organised the brainstorms and helped with the organisational side, the project submission, and the presentation.