Developed by Team WeaveWay: Dani Thi Graviet, Roddsi Sarkar, Srinethe Sharavanan, Bek

TL;DR One driving scene → one reasoning agent → one critic → full explainability. Next, those same agents start talking to each other — learning collaboratively through shared language and feedback. That’s the future of self-improving, cooperative AI systems.

Inspiration

Autonomous vehicles today operate in isolation. Each system processes sensor data, predicts motion, and executes — but there’s no shared reasoning layer or self-critique between agents. We wanted to build a foundation where autonomous agents could not only reason about their environment, but also reflect, self-evaluate, and even communicate insights to one another.


What it does

The result is Waymo-Agent, a Weave-powered reasoning pipeline that turns scene Q&A data from the Waymo Open Motion Dataset into structured reasoning loops, complete with planning, critique, and score-based self-assessment. It's a step toward cars that don't just act: they think out loud and get better together.

How we built it

  1. Started with unstructured reasoning data from the Waymo Open Motion Dataset (WOMD).
  2. Defined a QARecord schema to represent symbolic knowledge from each scene (environment, ego, and neighbor states).
  3. Implemented MartianAgent (planner) and CriticAgent (reviewer) using the OpenAI API.
  4. Built a structured pipeline in Python that runs these agents sequentially, records reasoning steps, and visualizes them through Weave.
  5. Tested it locally via a single-record loop before scaling to full datasets.
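The steps above can be sketched as a single-record loop. The exact `QARecord` fields and agent prompts below are simplified assumptions, not the project's actual definitions; the toy callables stand in for the OpenAI-backed `MartianAgent` and `CriticAgent` so the loop runs offline.

```python
from dataclasses import dataclass, field

# Hypothetical simplification of the QARecord schema described above:
# symbolic scene state plus Q&A derived from WOMD.
@dataclass
class QARecord:
    scene_id: str
    environment: dict      # e.g. {"weather": "clear", "road": "4-lane"}
    ego_state: dict        # e.g. {"speed_mps": 12.4, "lane": 2}
    neighbors: list = field(default_factory=list)
    question: str = ""
    answer: str = ""

def run_pipeline(record, planner, critic):
    """Run the planner on one record, then score its plan with the critic.

    `planner` and `critic` stand in for MartianAgent and CriticAgent;
    here they are plain callables so the loop is testable offline.
    """
    plan = record and planner(record)       # reasoning step
    review = critic(record, plan)           # critique + numeric score
    return {"scene_id": record.scene_id, "plan": plan, "review": review}

# Offline stand-ins for the OpenAI-backed agents.
toy_planner = lambda r: f"maintain lane {r.ego_state['lane']}, reduce speed"
toy_critic = lambda r, p: {"score": 0.8, "comment": "conservative but safe"}

result = run_pipeline(
    QARecord("scene-001", {"road": "4-lane"}, {"speed_mps": 12.4, "lane": 2}),
    toy_planner, toy_critic,
)
```

In the real pipeline each step would also be decorated for Weave tracing so every plan and score is inspectable.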

Challenges we ran into

  1. Adapting the Waymo dataset from motion vectors to symbolic Q&A format.
  2. Keeping model outputs deterministic and schema-consistent for Weave.
  3. Handling long-context reasoning while maintaining structured JSON compatibility.
  4. Managing OpenAI rate limits during iterative experiments.
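One way we could address points 2 and 3 is to validate every model response against the expected keys before it reaches Weave, so malformed output triggers a retry instead of an inconsistent log entry. A minimal sketch, with illustrative field names rather than the project's real schema:

```python
import json

REQUIRED_KEYS = {"plan", "score", "rationale"}  # illustrative schema

def parse_agent_output(raw: str) -> dict:
    """Parse a model response and enforce the expected JSON schema.

    Raises ValueError on malformed JSON or missing keys, so the caller
    can retry the request rather than logging a bad record.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"non-JSON model output: {e}") from e
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["score"], (int, float)):
        raise ValueError("score must be numeric")
    return data

ok = parse_agent_output('{"plan": "yield", "score": 0.9, "rationale": "pedestrian"}')
```

Pinning `temperature` to 0 on the API call reduces (though does not guarantee) output variance between runs.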

Accomplishments that we're proud of

  1. A fully functional reasoning-and-critique loop that runs on real Waymo data.
  2. End-to-end Weave visualization, letting us inspect every agent’s decision and score.
  3. Modular architecture: planners, critics, and data handlers can be swapped independently.
  4. Clear path toward adaptive and collaborative AI agents.

What we learned

  1. Visibility beats complexity. Many “AI bugs” aren’t algorithmic — they’re about not seeing what the model was thinking.
  2. Weave’s tracing made the reasoning pipeline interpretable for humans.
  3. Lightweight contextual feedback can mimic early self-improvement even without full RL.
  4. Defining strict data schemas helps language models behave like reliable subroutines.

What's next for Waymo-Agent

  1. Adaptive prompt memory: Use critic feedback to refine planner prompts dynamically across iterations.
  2. Vehicle-to-Vehicle communication: Two cars exchange summarized scene context (“Crosswalk ahead — slowing down”) → parse → adjust motion → log → critique each other. This will test early cooperative reasoning between autonomous agents.
  3. Scale to multi-agent simulations: Use this same reasoning loop for swarms of agents in shared environments.
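The V2V exchange in point 2 could look roughly like this. The message fields and the 60%-slowdown policy are our illustrative assumptions, not a finalized protocol:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical V2V message; field names are assumptions.
@dataclass
class SceneMessage:
    sender: str
    hazard: str          # e.g. "crosswalk_ahead"
    action: str          # sender's own response, e.g. "slowing_down"
    position_m: float    # distance to hazard along the lane

def adjust_speed(current_mps: float, msg: SceneMessage) -> float:
    """Toy policy: slow to 60% of current speed when warned of a
    nearby crosswalk, otherwise keep speed."""
    if msg.hazard == "crosswalk_ahead" and msg.position_m < 50:
        return round(current_mps * 0.6, 2)
    return current_mps

# Car A broadcasts; Car B parses the JSON and adjusts its motion.
wire = json.dumps(asdict(SceneMessage("car_a", "crosswalk_ahead", "slowing_down", 30.0)))
received = SceneMessage(**json.loads(wire))
new_speed = adjust_speed(10.0, received)   # 10.0 m/s -> 6.0 m/s
```

Each adjustment would then be logged and critiqued the same way single-agent plans are today.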

Updates

In the final stretch, Sri is helping us add a lightweight adaptation layer to make our agents contextually self-improving without any heavy retraining. This layer dynamically adjusts prompt weighting and memory based on critic feedback, allowing subtle behavioral adaptation between runs. It’s not full RL, but it adds a sense of self-awareness to the reasoning loop, where each iteration learns to refine its next response.
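The adaptation layer described above could work roughly like this sketch; the exponential-moving-average weighting is our illustrative guess at "prompt weighting and memory", not the actual implementation:

```python
class PromptMemory:
    """Keep critic feedback across runs and reweight prompt hints.

    Hints whose runs scored well are surfaced first next iteration.
    A simple exponential moving average stands in for the real
    weighting logic, which may differ.
    """
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.weights: dict[str, float] = {}

    def update(self, hint: str, critic_score: float) -> None:
        # First observation seeds the average; later ones blend in.
        prev = self.weights.get(hint, critic_score)
        self.weights[hint] = (1 - self.alpha) * prev + self.alpha * critic_score

    def top_hints(self, k: int = 2) -> list[str]:
        return sorted(self.weights, key=self.weights.get, reverse=True)[:k]

mem = PromptMemory()
mem.update("prefer conservative lane changes", 0.9)
mem.update("mention pedestrians explicitly", 0.6)
mem.update("prefer conservative lane changes", 0.7)   # weight decays to 0.8
best = mem.top_hints(1)
```

Between runs, the planner's prompt would be prefixed with `top_hints()`, giving the subtle iteration-to-iteration adaptation described above without any retraining.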
