Inspiration

I've raced AWS DeepRacer on world circuits. I've gone deep into reinforcement learning — reward functions, hyperparameter tuning, simulation-to-real transfer — pushing models to shave milliseconds off lap times. I love this machine.

But every deployment hit the same ceiling. The car was brilliant at the one thing it was trained for. Ask it to respond to a verbal instruction, handle an unexpected obstacle, or change behaviour mid-run — and it was helpless. Intelligence baked in at training time, frozen forever. Every robot in history faces this wall: someone hardcodes the behaviours, and the machine executes exactly what it was told. Nothing more.

I wanted to break that entirely.

Strands Agents SDK — and specifically strands-labs/robots, released just three weeks ago — handed me a blueprint I hadn't seen before. A physical robot treated as just another tool in an agent loop: callable, stoppable, and observable like any API, without the agent ever touching the execution layer. The first robotics architecture that felt like software engineering rather than robotics engineering.

Then Amazon Nova 2 changed what was possible. Nova 2 Lite could decompose a natural language instruction into timed movements that actually work on a physical car. Nova 2 Pro could look at a live camera frame and make a genuine decision — not just classify what it saw, but reason about what the car should do next. Continue. Replan. Abort. That shift — from classification to decision — is what made the vision loop real.

Three things came together: an agent framework that treated robots as tools, a machine I knew intimately from years of competitive racing, and a multimodal reasoning model that could see and act. strands-labs/robots showed me how the pieces connect. Nova 2 Lite replaced trained planning with language. Nova 2 Pro replaced fixed policy with real-time visual reasoning.

One question crystallised: what happens when a car stops following scripts and starts following intent?

We built the answer.

What It Does

Agentic AI DeepRacer is a closed-loop autonomous navigation system for AWS DeepRacer, powered by the Strands Agents SDK and Amazon Nova 2. Type a sentence — "slalom through 3 cones" or "drive forward and stop when you see red" — and the car plans, moves, and adapts in real time. No scripts. No waypoints. No manual control. Just language in, autonomous driving out.


Natural Language Planning

When you submit a prompt, a Strands Agent powered by Amazon Nova 2 Lite takes over. It doesn't guess — it reasons. A physics-aware system prompt encodes the car's exact calibration constants: turning radius, corner speed limits, arc radius constraints, and stabilisation rules. The model works through an 8-point chain of thought — identifying the pattern, walking through every step while tracking heading in degrees, computing turn durations arithmetically, verifying total rotation, checking physics limits, confirming stabilisation steps, counting steps, and validating safety — before committing to a single movement.

The output is a validated JSON plan drawn from a library of 15 verified navigation patterns: circle, figure-8, square, triangle, pentagon, hexagon, oval, slalom, chicane, lane-change, spiral-out, zigzag, parallel-park, figure-forward, and U-turn. Each pattern carries a rotation proof. A runtime validator checks the plan independently and surfaces mismatches as warnings in the dashboard before you execute.

When the instruction mentions reacting to something visible — "stop when you see an obstacle", "halt if you see red" — the planner automatically splits long forward steps into short chunks of one second or less, giving the vision model a checkpoint every 0.4 metres of travel rather than every full step.


Execution Engine

The plan is handed to a DeepRacerTool(AgentTool) — a Strands AgentTool modelled directly on the strands-labs/robots architecture. It manages the full asynchronous execution lifecycle through four actions: execute (blocking), start (non-blocking), status (poll), and stop (abort). A TaskStatus state machine tracks every transition — idle, connecting, planning, running, completed, stopped, error — and a single-worker thread executor ensures only one plan runs at a time.

Every step dispatches to the DeepRacer's HTTP control API via aws-deepracer-control-v2: forward, backward, left turn, right turn, or stop. If any step fails, an emergency stop is sent to the hardware immediately before the next step begins. The execution result for every step — including the raw motor command, duration, and response — streams to the browser in real time via Server-Sent Events.


Closed-Loop Vision

Between every movement step, Amazon Nova 2 Pro receives the latest JPEG frame from the car's front USB camera alongside three pieces of context: the original instruction, the current step index and action, and how many times the plan has already been revised. It returns a structured JSON decision with four fields: action, reasoning, confidence, and a revised instruction if replanning is warranted.

The decision maps directly to the original instruction. If the instruction says "stop when" or "halt if", an obstacle in the frame triggers abort and the car stops immediately. If the instruction says "avoid" or "go around", Nova 2 Pro returns replan with a new instruction, the Strands Agent re-plans from that instruction, and execution continues with the new steps. If the instruction makes no mention of obstacles, Nova 2 Pro defaults to continue unless a collision is genuinely imminent. This instruction-driven decision mapping means the same vision model behaves differently depending on what you asked for — without any code changes. Every Nova 2 Pro call is wrapped in a four-second timeout — if the cloud call takes too long, the step proceeds as continue and the car never stalls waiting for a response.
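The instruction-driven mapping above can be sketched as a small keyword routine. This is an illustrative reconstruction, not the project's actual code; the function name and keyword lists are assumptions based on the behaviour described.

```python
# Hypothetical sketch of instruction-driven decision mapping: the wording
# of the original instruction decides what an obstacle sighting means.
def obstacle_policy(instruction: str) -> str:
    """Return the vision action an obstacle should trigger for this instruction."""
    text = instruction.lower()
    if "stop when" in text or "halt if" in text:
        return "abort"      # stop-on-condition: halt immediately
    if "avoid" in text or "go around" in text:
        return "replan"     # avoidance: ask the planner for new steps
    return "continue"       # no obstacle language: keep driving

print(obstacle_policy("drive forward and stop when you see red"))  # abort
print(obstacle_policy("slalom through cones, avoid the red one"))  # replan
print(obstacle_policy("drive a figure-8"))                         # continue
```

The same frame therefore produces different decisions for different instructions, with no code changes between runs.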


Live Dashboard

A Flask web dashboard brings everything together. The left panel shows physics limits and quick prompts. The centre panel shows the generated plan — action pills, durations, heading change annotations — and live execution results streaming in as each step completes. A dedicated camera panel shows the live JPEG feed polled at two frames per second. The vision log beside it records every Nova 2 Pro decision as it fires: the step number, the action taken, the confidence score, and Nova 2 Pro's own reasoning in plain English. Replan events appear as banners showing the revised instruction and new step count. The entire session — plan, execution, vision decisions, replans — is visible in real time without a single page refresh.

How We Built It

Building Agentic AI DeepRacer was not a single implementation — it was three progressive phases, each one unlocking the next. Every architectural decision traces back to one source of truth: strands-labs/robots.



The Foundation: strands-labs/robots as the Blueprint

Before writing a single line of code, we mapped every concept from strands-labs/robots directly to the DeepRacer. The framework defines a Robot(AgentTool) with four lifecycle actions, a TaskStatus state machine, a single-worker executor, a threading shutdown signal, and a Policy abstraction that decouples planning from execution. Every one of these mapped cleanly.

Robot became DeepRacerTool(AgentTool). The four actions — execute, start, status, stop — stayed identical. The TaskStatus enum gained one extra state: PLANNING, inserted between CONNECTING and RUNNING to distinguish "waiting for Bedrock" from "sending motor commands". The ThreadPoolExecutor(max_workers=1) and threading.Event shutdown pattern were copied verbatim. The Policy abstraction became NavigationPolicy with three implementations: NovaPolicy (live Amazon Nova 2 Lite via Bedrock), MockPolicy (offline testing with no hardware), and ReplayPolicy (saved named manoeuvres).
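The lifecycle pattern described above can be sketched as follows. State names come from the text; the class and method bodies are illustrative stand-ins, not the real DeepRacerTool implementation.

```python
# Minimal sketch of the strands-labs/robots lifecycle shape: a TaskStatus
# state machine, a single-worker executor, and a threading shutdown signal.
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
import threading

class TaskStatus(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    PLANNING = "planning"      # extra state: "waiting for Bedrock"
    RUNNING = "running"
    COMPLETED = "completed"
    STOPPED = "stopped"
    ERROR = "error"

class DeepRacerToolSketch:
    def __init__(self):
        self.status = TaskStatus.IDLE
        self._executor = ThreadPoolExecutor(max_workers=1)  # one plan at a time
        self._shutdown = threading.Event()                  # abort signal

    def start(self, run_fn):
        """Non-blocking start: submit the run to the single worker."""
        self._shutdown.clear()
        self.status = TaskStatus.PLANNING
        return self._executor.submit(self._run, run_fn)

    def _run(self, run_fn):
        self.status = TaskStatus.RUNNING
        try:
            run_fn(self._shutdown)  # the run checks the event between steps
            self.status = (TaskStatus.STOPPED if self._shutdown.is_set()
                           else TaskStatus.COMPLETED)
        except Exception:
            self.status = TaskStatus.ERROR

    def stop(self):
        """Abort: signal the worker to stop between steps."""
        self._shutdown.set()
```

The `max_workers=1` executor is what guarantees only one plan touches the hardware at a time.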

The strands-robots observation loop — get_observation() → get_actions() → send_action() — became the seam where Phase 3 slots in: get_latest_frame() → assess_step() → execute_step().


Phase 1: Proving the Concept

The first phase was a working proof of concept. A Strands Agent called Amazon Nova 2 Lite, received a JSON plan, confirmed it with the operator, and executed it using bare @tool functions wrapping the DeepRacer's HTTP API via aws-deepracer-control-v2. It worked for simple prompts but had real limitations — the model guessed turn durations, had no pattern vocabulary, and continued running after a failed step.


Phase 2: The AgentTool Architecture

Phase 2 was a ground-up redesign. The core insight was that navigation planning is a physics problem, not just a language problem. We empirically measured the DeepRacer's motion characteristics on the physical car: STEER_ANGLE=0.50 at TURN_THROTTLE=0.20 produces approximately 90 degrees of heading change in 1.5 seconds. That single calibration constant — DEGREES_PER_SECOND = 60 °/s — became load-bearing infrastructure that every pattern in the system depends on.
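The calibration arithmetic is simple enough to show directly. This uses only the measured constant stated above; the function name is illustrative.

```python
# Calibration from the text: STEER_ANGLE=0.50 at TURN_THROTTLE=0.20 rotates
# the car ~90 degrees in 1.5 seconds, i.e. 60 degrees per second.
DEGREES_PER_SECOND = 90 / 1.5   # = 60 deg/s, measured on the physical car

def turn_duration(degrees: float) -> float:
    """Seconds of turning needed for a given heading change."""
    return degrees / DEGREES_PER_SECOND

print(turn_duration(90))        # 1.5 s for a quarter turn
print(turn_duration(360 / 8))   # 0.75 s per segment of an 8-segment circle
```

Every pattern's step durations reduce to this one division, which is why the constant is load-bearing.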

The planner system prompt was rebuilt around an 8-point mandatory chain-of-thought: identify the pattern, walk every step tracking heading, compute turn durations using the calibration formula, verify total rotation equals 360 degrees, check physics limits, confirm stabilisation steps between direction reversals, count total steps, validate safety caps. This forced the model to do the rotation arithmetic explicitly before committing to any step — and it caught real bugs, including a circle pattern that would have spun the car 720 degrees instead of 360.

We built a library of 15 verified navigation patterns, each with a rotation proof embedded in the prompt. A Python _check_rotation() validator replicates the verification at runtime and surfaces mismatches as warnings in the dashboard before execution begins.
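A rotation validator of the kind described above might look like this. It is a hedged sketch, assuming the plan is a list of action/duration steps; the real _check_rotation() may differ in signature and tolerances.

```python
# Sketch of a runtime rotation check: sum signed heading changes over the
# plan's turn steps and flag a mismatch against the pattern's expected total.
DEGREES_PER_SECOND = 60  # calibration constant from the text

def check_rotation(steps, expected_degrees, tolerance=15.0):
    """Return (ok, total_degrees) for a plan of {"action", "duration"} steps.
    Left turns count positive, right turns negative."""
    total = 0.0
    for step in steps:
        if step["action"] == "left":
            total += step["duration"] * DEGREES_PER_SECOND
        elif step["action"] == "right":
            total -= step["duration"] * DEGREES_PER_SECOND
    return abs(total - expected_degrees) <= tolerance, total

# The exact bug the chain-of-thought caught: eight 1.5 s left turns is
# 8 * 90 = 720 degrees, two full circles instead of one.
bad_circle = [{"action": "left", "duration": 1.5}] * 8
ok, total = check_rotation(bad_circle, expected_degrees=360)
print(ok, total)  # False 720.0
```

Running the same arithmetic the prompt demands, but in Python, is what lets the dashboard surface mismatches before the car moves.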

The Flask dashboard was built with a fully live SSE pipeline — no polling, no page refresh. Each step result, vision event, and replan fires as a Server-Sent Event the instant it completes on the hardware, streamed from a thread-safe queue through Flask's Response(_generate()) to the browser's EventSource.



Phase 3: Closed-Loop Vision

Phase 3 added the eye. Three new components built on top of the Phase 2 engine without changing it.

camera_stream.py runs a daemon thread that consumes the DeepRacer's MJPEG HTTP stream.
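Pulling frames out of an MJPEG byte stream comes down to scanning for the JPEG start and end markers. The sketch below is simplified (a real daemon thread accumulates chunks into a rolling buffer), but the marker logic is the same job camera_stream.py performs.

```python
# Extract complete JPEG frames from an MJPEG byte stream by scanning for
# the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers.
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def extract_frames(buffer: bytes):
    """Yield each complete JPEG frame found in the buffer."""
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            return
        end = buffer.find(EOI, start + 2)
        if end == -1:
            return  # incomplete frame: wait for more bytes
        yield buffer[start:end + 2]
        pos = end + 2

fake_stream = (b"--boundary\r\n" + SOI + b"frame-one" + EOI +
               b"\r\n" + SOI + b"frame-two" + EOI)
print(len(list(extract_frames(fake_stream))))  # 2
```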

vision_assessor.py wraps a single client.converse() call to Amazon Nova 2 Pro. The boto3 Converse API accepts image input as raw bytes in source["bytes"] — no base64 encoding required. The system prompt is instruction-aware: it reads the original navigation instruction and maps it to the correct action. Instructions containing "stop when" or "halt if" map obstacle detection to abort. Instructions containing "avoid" or "go around" map to replan with a new instruction. Instructions that make no mention of obstacles default to continue. Every call runs inside asyncio.wait_for with a four-second timeout so the car never stalls waiting for the cloud.
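The Converse request body for that call can be assembled as below. Building (rather than sending) the request keeps this sketch runnable offline; the helper name is an assumption, and the model ID and system prompt are omitted since they are passed alongside the body in the real client.converse() call.

```python
# Sketch of a Converse-style message carrying one camera frame as raw bytes
# (source["bytes"], no base64) plus the three pieces of context from the text:
# original instruction, current step, and replan count.
def build_vision_request(frame_jpeg: bytes, instruction: str,
                         step_index: int, action: str, revisions: int) -> dict:
    context = (f"Instruction: {instruction}\n"
               f"Current step {step_index}: {action}\n"
               f"Plan revisions so far: {revisions}")
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"image": {"format": "jpeg",
                           "source": {"bytes": frame_jpeg}}},
                {"text": context},
            ],
        }]
    }

req = build_vision_request(b"\xff\xd8", "stop when you see red", 2, "forward", 0)
print(req["messages"][0]["content"][1]["text"].splitlines()[0])
```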

camera_policy.py is the orchestrator. It implements NavigationPolicy — the same interface as NovaPolicy and MockPolicy — so it slots into the execution engine with zero changes to the downstream code. The has_vision = True property signals DeepRacerTool to activate the vision gate between steps. plan() delegates to the inner NovaPolicy. assess_step() delegates to VisionAssessor.
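The policy seam can be expressed as a small structural interface. This is an illustrative sketch: the Protocol shape and stub bodies are assumptions, with only the names and the has_vision flag taken from the text.

```python
# Sketch of the NavigationPolicy seam: one interface, interchangeable
# implementations, and a has_vision flag the execution engine checks.
from typing import Protocol

class NavigationPolicy(Protocol):
    has_vision: bool
    def plan(self, instruction: str) -> list: ...
    def assess_step(self, frame: bytes, context: dict) -> dict: ...

class MockPolicy:
    """Offline testing: no hardware, no cloud calls."""
    has_vision = False

    def plan(self, instruction: str) -> list:
        return [{"action": "forward", "duration": 1.0}]

    def assess_step(self, frame: bytes, context: dict) -> dict:
        return {"action": "continue", "confidence": 1.0}

policy: NavigationPolicy = MockPolicy()
print(policy.plan("drive forward")[0]["action"])  # forward
```

Because the engine only sees the interface, swapping CameraPolicy in for MockPolicy requires zero downstream changes.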

The execution loop in deepracer_agent_tool.py gained a vision gate that runs before each step: retrieve the latest frame, call assess_step(), act on the decision — continue executes the step, replan replaces the remaining steps with a new plan, abort calls deepracer_stop() immediately. The pre-approved plan path _execute_approved_plan(plan) was added separately from _execute_task_async(instruction) so the web UI could pass the already-confirmed plan directly without triggering a second LLM call.
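The control flow of that vision gate can be sketched with a stub assessor standing in for Nova 2 Pro. All names here are illustrative; the continue/replan/abort handling mirrors the behaviour described above.

```python
# Sketch of a per-step vision gate: assess before each step, then either
# execute it, replace the remaining plan, or emergency-stop.
def run_with_vision_gate(steps, assess, replan_fn, execute, stop):
    i = 0
    while i < len(steps):
        decision = assess(step_index=i, action=steps[i]["action"])
        if decision["action"] == "abort":
            stop()                                       # emergency stop
            return "aborted"
        if decision["action"] == "replan":
            steps = replan_fn(decision["instruction"])   # new plan from the agent
            i = 0
            continue
        execute(steps[i])                                # "continue": run the step
        i += 1
    return "completed"

log = []
result = run_with_vision_gate(
    [{"action": "forward"}, {"action": "left"}],
    assess=lambda **kw: {"action": "continue"},
    replan_fn=lambda instr: [],
    execute=log.append,
    stop=lambda: None,
)
print(result, len(log))  # completed 2
```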



Hardware Considerations

The entire system runs on a MacBook communicating with the DeepRacer over WiFi. The DeepRacer's onboard compute module handles ROS2, the web console, and the camera node. Our code runs entirely on the laptop — planning, vision assessment, dashboard — with only the final motor commands crossing the network as HTTP requests to the DeepRacer's web API. This kept latency predictable and the architecture simple: one HTTP call per step, one MJPEG stream per run.


Making It Hardware-Agnostic

Once the three-layer architecture was stable, we abstracted the tool layer entirely. A base_tools.py protocol defines the nine-function interface every platform must implement: connect, move_forward, move_backward, turn_left, turn_right, stop, is_error, reset_client, activate_camera — plus four physics constants. A USE_CASE environment variable loads the correct tool module at startup. The result is 13 use case files — warehouse AMR, drone, robot arm, underwater ROV, solar inspection robot, and more — that run the same Nova 2 planner, Nova 2 Pro vision loop, and Strands execution engine on entirely different hardware by changing one line in .env.
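The USE_CASE selection can be sketched as below. The real system loads a module per use case file; a registry dict keeps this illustration self-contained, and the entry names are examples from the text rather than the actual file names.

```python
# Sketch of selecting a hardware tool layer from the USE_CASE environment
# variable at startup. A dict stands in for per-use-case module loading.
import os

TOOL_LAYERS = {
    "deepracer": {"platform": "AWS DeepRacer"},
    "warehouse_amr": {"platform": "Warehouse AMR"},
    # ... one entry per use case file
}

def load_tool_layer() -> dict:
    use_case = os.environ.get("USE_CASE", "deepracer")
    try:
        return TOOL_LAYERS[use_case]
    except KeyError:
        raise ValueError(f"Unknown USE_CASE: {use_case!r}")

os.environ["USE_CASE"] = "warehouse_amr"
print(load_tool_layer()["platform"])  # Warehouse AMR
```

Everything above this seam — planner, vision loop, execution engine — never learns which platform it is driving.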

Challenges We Ran Into

Some of the hardest problems had nothing to do with AI — they were about getting the hardware alive in the first place.


The Hardware Gauntlet

Before writing a single line of Phase 1 code, the DeepRacer itself fought back.

The battery dropped below its minimum cutoff voltage mid-session — not a graceful shutdown, just sudden silence. The car wouldn't boot at all. We had to jump the battery using an external power source to bring it back above the minimum threshold before the system would respond. From that point, every session started with a voltage check and long tests were broken into short runs.

Then the onboard OS corrupted. The DeepRacer's Ubuntu 20.04 + ROS2 Foxy stack refused to boot cleanly — services were half-starting, the web console was unreachable, and ROS2 nodes were throwing errors that had nothing to do with our code. We spent hours in SSH debugging systemd service states before accepting the inevitable: full OS reflash. That meant reconfiguring WiFi, resetting credentials, and re-running every calibration measurement from scratch.

Even after the reflash, deepracer-core — the systemd service managing the camera node, servo controller, and web console — needed a reliable startup sequence before any code could run. The fix was to wait for the web console to respond, authenticate, then explicitly activate each service before touching them programmatically. Skipping any step produced silent failures that looked like software bugs but were actually service ordering issues in the ROS2 graph.


Phase 2: Making the Planner Physics-Aware

The first version of the planner produced plans that looked reasonable but were physically wrong. The model would plan a circle using eight left turns of 1.5 seconds each — which sounds plausible until you run the arithmetic: eight turns at 90 degrees each is 720 degrees, not 360. The car would spin twice and stop, having drawn two overlapping circles.

The fix required treating the system prompt as engineering. We physically measured the car's rotation rate at the operating throttle and steering angle — STEER_ANGLE=0.50 at TURN_THROTTLE=0.20 produces approximately 90 degrees of heading change in 1.5 seconds. That constant became DEGREES_PER_SECOND = 60 °/s and was encoded into both the system prompt and a Python runtime validator that re-runs the same check after planning and surfaces mismatches before execution. The 8-point chain-of-thought — including an explicit rotation verification step — is not a design choice. It is the accumulated result of every time the model got it wrong on the car floor.


Phase 3: The Camera Was Silent

The most confusing bug in the entire project had a one-line fix.

lsusb showed the camera connected. The MJPEG stream URL responded with HTTP 200. But the frame buffer thread received zero bytes — no JPEG markers, no frames, nothing. For hours we suspected the MJPEG parser, the byte scanning logic, the camera hardware itself. The root cause was the ROS2 topic. The correct topic is /camera_pkg/display_mjpeg — but it has no publisher until the node is activated via PUT /api/vehicle/media_state with {"activateVideo": 1}. Without that call, the MJPEG stream URL responds successfully but streams silence. The camera is present on the USB bus, registered in ROS2, and completely inert. Adding activate_camera() — called after authentication, before the stream thread starts — fixed it immediately. 1908 frames delivered in the first real run.


Phase 3: Vision Latency vs. Step Duration

Nova 2 Pro takes one to three seconds to assess a frame. A forward step of three seconds means the car has already travelled the full distance before a single vision check fires. For "drive forward and stop when you see red", a three-second step means up to 1.2 metres of overshoot past the obstacle before the car stops.

The solution was obstacle-aware step splitting in the planner. When the instruction contains stop-on-condition language, the planner automatically chunks forward steps into one-second segments. Three seconds becomes three steps of one second each. Vision fires between every chunk — every 0.4 metres — and can abort before the car reaches the obstacle. The maximum overshoot drops from 1.2 metres to 0.4 metres.
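The chunking itself is a small transform over the plan. This sketch assumes the step shape used elsewhere in this writeup; the function name is illustrative.

```python
# Sketch of obstacle-aware step splitting: forward steps are chunked into
# <=1 s segments so the vision gate fires roughly every 0.4 m at the car's
# ~0.4 m/s forward speed.
def split_forward_steps(steps, chunk=1.0):
    out = []
    for step in steps:
        if step["action"] == "forward" and step["duration"] > chunk:
            remaining = step["duration"]
            while remaining > 1e-9:
                out.append({"action": "forward",
                            "duration": min(chunk, remaining)})
                remaining -= chunk
        else:
            out.append(step)  # turns and short steps pass through unchanged
    return out

plan = [{"action": "forward", "duration": 3.0}]
print(split_forward_steps(plan))  # three 1.0 s forward steps
```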

This required teaching the planner a new rule: obstacle-aware instructions and time-based instructions are fundamentally different planning problems. The planner's response to "move forward 3 seconds" should be one step. The planner's response to "move forward 3 seconds, stop on obstacle" should be three steps. Getting the model to make that distinction reliably — and not over-apply it to every instruction — was its own prompt engineering challenge.

Accomplishments That We're Proud of

A physical car that genuinely reasons. Watching the car execute "slalom through 3 cones, stop if you see red" — navigating the weave correctly, then halting mid-pattern when Nova 2 Pro spotted a red object — is something that feels fundamentally different from scripted robotics. The decision to stop was not programmed. It was reasoned, in real time, from a camera frame.

True strands-labs/robots fidelity. Every concept from strands-labs/robots maps to a concrete implementation — not inspired by it, but faithfully applied to a different hardware platform and a different AI paradigm. Robot became DeepRacerTool. GR00TPolicy became NovaPolicy. The observation loop became get_latest_frame() → assess_step() → execute_step(). The architecture did not need to change. Just the intelligence inside it.


Nova 2 Pro as a reasoning layer, not a classifier. Rather than training a vision model or fine-tuning for robotics, we used Nova 2 Pro as a meta-decision layer — continue, replan, or abort — while the Phase 2 pattern library handled actual movement. No robotics training data. No labelled obstacle datasets. A general multimodal reasoning model drives a physical car based on natural language intent.

Surviving the hardware. Getting a battery-jumped DeepRacer with a reflashed OS and a dormant ROS2 camera node to reliably run a closed-loop AI vision system in a live demo environment is an accomplishment in itself. Every failure made the system more robust.

A hardware-agnostic architecture. The 13 use case tool layer files — warehouse AMR, drone, robot arm, underwater ROV, solar inspection robot, and more — demonstrate that everything above the hardware interface is genuinely reusable. The same Nova 2 Lite planner, Nova 2 Pro vision loop, and Strands AgentTool execution engine on any platform by changing one environment variable.

Live demo at AWS Summit Bengaluru 2026. The system runs on real hardware, in front of a live audience, with audience members typing prompts. Every run is different. Every obstacle response is computed in real time by Nova 2 Pro. No two demos are the same.


What We Learned

strands-labs/robots solves real problems. The AgentTool pattern for physical actuators — four lifecycle actions, single-worker executor, threading shutdown signal — addresses exactly the concurrency and state management challenges that arise when running long-running async tasks inside an agent loop. Every design decision in the framework exists for a reason that only becomes obvious when you implement it on real hardware.

The system prompt is load-bearing infrastructure. The 8-point chain-of-thought in the planner prompt is not optional polish — it is what prevents 720-degree circle spins, missing stabilisation steps, and step count overflows. The instruction-driven decision mapping in the vision assessor prompt is what makes the difference between a vision model that replans on every shadow and one that stops exactly when asked to. Prompt engineering at this level is systems engineering.

Latency shapes architecture. Every design decision — between-steps-only vision, obstacle-aware step splitting, four-second timeouts, non-blocking frame buffers — exists because Nova 2 Pro takes one to three seconds and the car moves at 0.4 metres per second. You cannot design the AI loop without knowing the physics of the hardware it controls.

Nova 2 Pro understands context, not just images. The same model, given the same frame, returns different decisions depending on what the instruction says. That instruction-awareness — reasoning about what the human asked for, not just what the camera sees — is what makes the closed loop coherent rather than reactive.

Hardware integration is the unglamorous majority of the work. Battery management, OS recovery, ROS2 service ordering, camera node activation, MJPEG stream parsing — none of it appears in the architecture diagram, but all of it stands between the code and a working demo. Respecting the hardware is not optional.


What's Next for Agentic AI DeepRacer

Multi-turn conversation mid-execution. The system currently takes one instruction per run. The next step is a Strands agent that accepts follow-up prompts while the car is moving — "actually, avoid the left side" — and folds them into the active plan without stopping.

Verified hardware integrations. The 13 use case tool layer files cover warehouse AMRs, drones, robot arms, and more — but all are unverified on physical hardware. The immediate next step is running the common engine on at least one other real platform to prove the abstraction holds outside DeepRacer.

Temporal vision context. Nova 2 Pro currently receives a single frame. Passing a short sequence of recent frames would give it temporal context — detecting motion, tracking object trajectories across frames, and distinguishing a person crossing the path from a static obstacle placed there permanently.

Edge inference. Current Nova 2 Lite and Nova 2 Pro calls go to Bedrock over WiFi. A compelling next step is distilling a small navigation model that runs on the DeepRacer's onboard compute — keeping the Nova 2 Pro vision loop for complex decisions while handling simple patterns locally with near-zero latency.

Fleet coordination. Multiple DeepRacers, each as a DeepRacerTool in the same Strands agent, coordinating patterns — forming shapes, passing objects, avoiding each other — with a single natural language instruction to the fleet. One prompt. Many cars.

Racing with intent. I came to this from competitive DeepRacer racing. The next question is whether an agentic system — one that can respond to track conditions, adapt its line, and make decisions mid-lap — can outperform a fixed reinforcement learning policy on a real circuit. That experiment has not been run yet.

Built With

  • amazon-nova
  • deepracer
  • strands-agents-sdk