Inspiration

We love bare-metal embedded systems programming. Our team has shipped a DOOM port, bootloaders, remote shells, logic analyzers, bare-metal GPU programs, and more — all starting from a blank SD card and no OS.

But this domain is notoriously difficult for AI. Embedded programming means scouring datasheets for device-specific register addresses instead of applying universal principles. Feedback is nearly nonexistent: no print statements, no debugger, nothing you didn't build yourself. And unlike pure software, hardware isn't contained in a virtual environment: an agent can't power-cycle a hung chip or probe a GPIO rail with an oscilloscope. All of this, combined with some of the most confusing bugs and most device-specific code in all of programming, makes embedded development exceptionally hard for AI.

We built Sentinel to change that. It's a full end-to-end agent system that researches datasheets and forums, spawns parallel worker agents on a custom Raspberry Pi server with an integrated FPGA logic analyzer for nanosecond-precision non-intrusive hardware tracing, and autonomously documents its work and opens a PR.

Then we used it to prove the point: we took a months-long project — a full OS on the Raspberry Pi Zero W — and had Sentinel write 16,000 lines of bare-metal embedded code in 24 hours, producing Sentinel OS.


What It Does

The Agent System

We use the Claude Agent SDK to launch an agent network that responds to user feature requests. A user submits a GitHub repo URL and a natural language task (e.g., "Implement an SD card driver with FAT32 filesystem support"). The system then runs a six-step pipeline:

  1. Orchestrator Agent forks the repo, deep-reads the entire codebase to understand the existing architecture, documentation, conventions, and APIs
  2. Research Agent searches the web for ARM datasheets, BCM2835 register maps, and protocol specifications
  3. Orchestrator decomposes the task into parallel work items with step-by-step implementation instructions
  4. Worker Agents execute in parallel — each writes code, edits files, and builds with the arm-none-eabi-gcc toolchain
  5. Hardware Testing — Workers submit compiled binaries to a run queue on the Pi Server, which flashes binaries onto real Raspberry Pi devices over UART. An integrated FPGA logic analyzer captures timestamped GPIO traces directly from the running hardware, providing structured execution feedback even when there are no print statements, no OS, and no debugger
  6. Integration — Orchestrator merges all changes, pushes to GitHub, and creates a pull request
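In code, the pipeline above has roughly the following shape. Every function here is a stand-in stub for an agent invocation, not the project's real API; only the control flow (sequential planning, parallel workers, final integration) reflects the description above.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in stubs: in the real system each of these is an agent invocation.
def analyze_codebase(repo): return {"repo": repo}
def research(task): return ["datasheet notes"]
def decompose(task, ctx, docs): return [f"{task}: part {i}" for i in range(3)]
def implement_and_test(item): return f"done {item}"
def integrate(results): return f"PR with {len(results)} work items"

def run_job(repo_url: str, task: str) -> str:
    context = analyze_codebase(repo_url)         # 1. fork + deep-read repo
    docs = research(task)                        # 2. external documentation
    items = decompose(task, context, docs)       # 3. parallel work plan
    with ThreadPoolExecutor() as pool:           # 4-5. workers + HW testing
        results = list(pool.map(implement_and_test, items))
    return integrate(results)                    # 6. merge, push, open PR
```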

The FPGA Debug Layer

The core challenge of AI-driven embedded development is feedback: bare-metal code has no print statements, no OS, and no debugger. We solved this with a custom FPGA logic analyzer built on a PYNQ-Z2 board, deeply integrated into the agent testing loop.

FPGA Fabric (Verilog)

  • Custom pintrace6 IP core — AXI-Lite slave with a 4096-record BRAM trace buffer, 2-FF metastability-safe input synchronizer, and configurable trigger logic (rising edge, falling edge, or any-change)
  • Strobe-based event capture — on each trigger, snapshots all 6 pin states plus a 25-bit timestamp (10ns resolution at 100MHz) into a single 32-bit packed record
  • Full AXI-Lite register file: CTRL, CFG, MAX_RECORDS, NUM_RECORDED, STATUS, LAST_PINS
  • PS-PL integration — our custom hardware attaches as an AXI peripheral and uses both the FPGA's block RAM and the data fabric to interface with the ARM cores and the broader Sentinel system
  • Multiple input modalities — as the system scales from simple GPIO monitoring to complex protocols like HDMI and USB, the hardware architecture is designed to stay modular and versatile, enabling readout of arbitrary signal types and peripheral emulation without redesigning the core capture pipeline
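For illustration, a 32-bit packed record like the one described above can be decoded in a few lines. The bit layout here (timestamp in the low 25 bits, 6 pin states above it) is our assumption for the sketch, not necessarily the core's actual packing:

```python
# Hypothetical decoder for one 32-bit pintrace6 record.
# Assumed layout: bits [24:0] = 25-bit timestamp (10 ns ticks at 100 MHz),
# bits [30:25] = 6 pin states. The real core may pack fields differently.

TICK_NS = 10  # 100 MHz capture clock -> 10 ns per tick

def decode_record(word: int) -> dict:
    timestamp = word & 0x1FF_FFFF           # low 25 bits
    pins = (word >> 25) & 0x3F              # next 6 bits
    return {
        "t_ns": timestamp * TICK_NS,
        "pins": [(pins >> i) & 1 for i in range(6)],  # pin 0 .. pin 5
    }
```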

Host Driver (Python on ARM Linux)

  • Raw /dev/mem mmap driver — direct register and BRAM access from userspace, no middleware
  • Built-in HTTP server exposing the full capture API as REST endpoints (GET /status, /trace, /pins; POST /capture, /arm, /stop, /clear) so any agent worker can query it over the network

How agents use it: after a binary is flashed and the Pi boots, the agent queries the HTTP API to get a structured, timestamped event timeline. It diffs the expected vs. actual event sequence to diagnose hangs (missing events), race conditions (timestamp deltas), and incorrect state transitions — all without touching the target device.
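The expected-vs-actual diff can be sketched as follows. The event schema (a list of labeled events from the trace API) is illustrative, not the project's exact format:

```python
# Sketch of how a worker agent might diff an expected event sequence
# against the captured trace fetched from the FPGA's HTTP API.
# The "label" field is an assumed schema for this example.

def diff_trace(expected: list[str], actual: list[dict]) -> list[str]:
    """Return a list of human-readable problems, empty if the run matched."""
    problems = []
    actual_labels = [e["label"] for e in actual]
    for i, exp in enumerate(expected):
        if i >= len(actual_labels):
            # trace ended early: the device likely hung before this milestone
            problems.append(f"hang: never saw '{exp}'")
        elif actual_labels[i] != exp:
            problems.append(f"order: expected '{exp}', got '{actual_labels[i]}'")
    return problems
```

A real version would also compare timestamp deltas between events to flag timing regressions and race conditions.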

The Frontend

A real-time web dashboard where you can:

  • Submit jobs (repo URL + task description)
  • Watch live streaming logs as agents work (orchestrator reasoning, file edits, bash commands, Pi test results)
  • Monitor the Pi device cluster (which Pis are connected, which are busy, run history)

Sentinel OS

To demonstrate just how useful Sentinel is for embedded development, we undertook an ambitious project: building a full operating system from the ground up, starting from a blank SD card and a blank repo. Using Sentinel, we automatically generated 16,000 lines of dense embedded code, implementing:

  • GPIO driver for initial testing and debugging, as well as infrastructure for future protocols (read/write, function select for all BCM2835 pins)
  • miniUART and PL011 UART for communication between the host computer and the Pi (115200 baud, 8N1)
  • Bootloader to load and run arbitrary program binaries on the Pi (both the Pi-side kernel image and the Unix-side install script)
  • System timer (BCM2835 1MHz free-running counter, microsecond-precision delays)
  • Watchdog for automatic reboots on program exit
  • SD card / EMMC driver (SDHC/SDSC, 4-bit bus, 25MHz, PIO read/write)
  • FAT32 filesystem (directory listing, file read, file write/create, long filename support)
  • MMU (virtual memory with section-based identity mapping, caching, domain access control)
  • Interrupts (SWI, timer IRQ, BCM2835 interrupt controller, vector table)
  • ARM CP14 debug (hardware breakpoints, watchpoints, single-step execution)
  • Bluetooth driver to send arbitrary packets over the air (using the Pi Zero W's on-board Bluetooth chip)
  • Full Bluetooth packet setup, authentication, and encryption stack to communicate directly with a Logitech K650 keyboard
  • Mailbox driver for querying system info (temperature, model number, etc.)
  • HDMI driver for display integration
  • Threads and context switching
  • User processes with syscalls to access restricted devices and peripherals
  • Full shell with file system navigation and the ability to run arbitrary program binaries
  • Visualization for shell over the HDMI driver

All features tested on physical hardware — 40+ test programs validated on real Pi Zero W devices.

The final demo is a Raspberry Pi, brought up from bare metal, that can be plugged into a display and power and used as a working OS with a Bluetooth keyboard.


How We Built It

We started with pure Claude Code to see how it performed at building an OS, observing where it succeeded and where it hit limits. From there, we co-developed Sentinel and Sentinel OS, identifying shortcomings and building solutions through rigorous testing on real systems development. By the end, we had not only a robust, highly effective embedded development tool (Sentinel), but also a full working operating system created from scratch in just 24 hours.

Here are some of the highlights of our systems:

Agent System (Python)

  • Claude Agent SDK for orchestrating Claude Opus subprocesses with tool access (Read, Write, Edit, Bash, Glob, Grep, WebSearch)
  • FastAPI backend with Server-Sent Events (SSE) for real-time event streaming
  • Fetch.ai to expand Sentinel to the Agentverse
  • Hierarchical agent architecture: Orchestrator reads the codebase and writes detailed instructions; Workers follow instructions and test on hardware; Research Agent handles external documentation lookups
  • Structured event logging (15+ event types) persisted to JSONL for replay and debugging
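The structured event log can be as simple as one JSON object per line, appended as events occur. This is a minimal sketch assuming a generic schema; the project's actual event types and field names may differ:

```python
import json
import time

# Minimal JSONL event logger: one JSON object per line, append-only,
# so logs survive crashes and can be replayed later for debugging.

def log_event(path: str, event_type: str, agent: str, **fields) -> None:
    record = {"ts": time.time(), "type": event_type, "agent": agent, **fields}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def replay(path: str) -> list[dict]:
    """Load the full event stream back for post-hoc analysis."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```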

Pi Server (Python)

  • FastAPI server that auto-detects Raspberry Pi devices connected via USB serial
  • Async worker pool: one worker per Pi, shared job queue, automatic device hot-plug detection
  • Binary flashing via ai-install bootloader tool over UART with CRC32 verification
  • Run management: submit, queue, execute, capture stdout/stderr, return results
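The one-worker-per-Pi pool with a shared job queue can be sketched with asyncio primitives. Device detection and flashing are stubbed out here; only the queue/worker structure mirrors the description above:

```python
import asyncio

# Sketch: one async worker per Pi pulling from a shared job queue.
# A None job is a shutdown sentinel; real flashing/execution is stubbed.

async def worker(pi_id: str, queue: asyncio.Queue, results: list) -> None:
    while True:
        job = await queue.get()
        if job is None:              # shutdown sentinel
            queue.task_done()
            return
        # flash binary + capture stdout/stderr would happen here
        results.append((pi_id, job))
        queue.task_done()

async def run_pool(pis: list[str], jobs: list[str]) -> list[tuple]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[tuple] = []
    tasks = [asyncio.create_task(worker(pi, queue, results)) for pi in pis]
    for job in jobs:
        queue.put_nowait(job)
    for _ in pis:                    # one sentinel per worker
        queue.put_nowait(None)
    await asyncio.gather(*tasks)
    return results
```

Hot-plug support would add a watcher task that spawns or cancels workers as devices appear and disappear.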

FPGA Debug Layer (Verilog + Python)

  • Custom pintrace6 IP core: 4096-record BRAM trace buffer, 10ns timestamp resolution, configurable edge triggers
  • REST API over /dev/mem mmap: agents query structured execution traces directly over the network, no on-device debugger required
  • Modular signal architecture: extensible from GPIO to complex protocols like HDMI and USB
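A raw /dev/mem register read of this kind looks roughly like the following. The base address and register offset are placeholders for this sketch, not the real AXI mapping:

```python
import mmap
import os
import struct

BASE_ADDR = 0x43C0_0000   # hypothetical AXI-Lite base address (placeholder)
PAGE = 4096

def unpack_u32(buf, offset: int) -> int:
    """Read one little-endian 32-bit register value from a mapped page."""
    return struct.unpack_from("<I", buf, offset)[0]

def read_reg(offset: int) -> int:
    """Map the peripheral's page from /dev/mem and read a register."""
    fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, PAGE, mmap.MAP_SHARED, mmap.PROT_READ,
                       offset=BASE_ADDR) as m:
            return unpack_u32(m, offset)
    finally:
        os.close(fd)
```

Requires root (or suitable capabilities) since /dev/mem exposes physical memory directly.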

Frontend (TypeScript)

  • Next.js 15 (App Router) with React and Tailwind CSS
  • SSE-based live streaming: events flow from agent → FastAPI → Next.js → browser in real time
  • Pi cluster visualization: device status icons, expandable run history, color-coded event log
  • Job persistence: React Context + localStorage for tab management across page reloads

Bare-Metal OS (C + ARM Assembly)

  • Target: Raspberry Pi Zero W (BCM2835 SoC, ARM1176JZF-S, ARMv6)
  • Toolchain: arm-none-eabi-gcc cross-compiler, custom linker script (code at 0x8000)
  • No stdlib, no OS: every driver written from scratch using raw memory-mapped I/O
  • Custom bootloader: UART-based binary loading protocol so we never have to swap SD cards
  • FPGA trace integration: bare-metal test programs emit structured strobe events at driver milestones; the FPGA captures these non-intrusively so agents get execution timelines without any on-device debugging software

Challenges We Ran Into

We hit a number of challenges during this concurrent development. While we were genuinely surprised at how good Claude Code was at producing working code in many circumstances, the nature of embedded systems made things very difficult at times: often a single sentence buried in the middle of a datasheet carries information that is absolutely critical to making a system work.

  • ARM debug registers are brutal: Setting up CP14 monitor mode requires precise coprocessor register configuration, and a single wrong bit means a silent hang with no way to debug the debugger
  • PS-PL integration: the Zynq-7000 pairs an FPGA fabric for fast data readout with two ARM cores that interface with the rest of the system; our custom hardware had to integrate into this broader environment, which made the design very complex
  • Agent coordination on file edits: Parallel workers editing overlapping files caused merge conflicts; we solved this by having the orchestrator assign non-overlapping file ownership
  • UART bootloader timing: The CRC32 handshake between host and Pi is timing-sensitive; we moved to a table-free CRC implementation to avoid static data issues after the bootloader self-relocates in memory
  • Getting agents to actually test on hardware: Early versions would write code and commit without testing; we had to make hardware testing mandatory in the agent prompts and verify Pi output before allowing commits
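On the table-free CRC point: the standard table-driven CRC32 keeps a 1 KB static lookup table, which becomes a liability once the bootloader relocates itself; computing the reflected CRC bit by bit needs no static data at all. A Python sketch of the technique (the bootloader's actual version is in C):

```python
def crc32_tablefree(data: bytes) -> int:
    """Standard reflected CRC-32 (poly 0xEDB88320) with no lookup table."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # shift right; XOR in the polynomial when the low bit was set
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

It trades speed (8 iterations per byte) for position independence, a good fit for a self-relocating bootloader.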

Accomplishments That We're Proud Of

We are really proud that we got an extremely effective system off the ground quickly enough to co-develop a super cool project (Sentinel OS) alongside it, both helping us iterate during development and validating the tool's usefulness. In the end we wrote over 16,000 lines of embedded C code in just 24 hours, culminating in a project that would likely take an experienced human programmer weeks, if not months, to create.


What We Learned

  • AI agents can write genuinely low-level code (memory-mapped I/O, interrupt handlers, MMU page tables) if given the right context and documentation
  • Hardware-in-the-loop validation is critical — agents make subtle register configuration mistakes that only show up on real hardware
  • Having the orchestrator read the full codebase (or purpose-written notes describing it, which our system produces) before planning is essential; delegating codebase understanding to workers led to infinite loops and incorrect API usage
  • Structured event logging is invaluable for debugging and interpreting multi-agent systems — without it, understanding why an agent made a decision is nearly impossible

Built With

Languages: Python, TypeScript, C, ARM Assembly

AI/ML: Claude Opus 4.6, Claude Agent SDK

Backend: FastAPI, Fetch.ai, Server-Sent Events (SSE), httpx, asyncio

Frontend: Next.js 15, React, Tailwind CSS

Hardware: Raspberry Pi Zero W, BCM2835 SoC, ARM1176JZF-S (ARMv6), PYNQ-Z2 (Zynq-7000 FPGA+ARM SoC)

Toolchain: arm-none-eabi-gcc, custom UART bootloader, CRC32 verification

Infrastructure: GitHub API (gh CLI), Git, uvicorn, Vivado (FPGA synthesis), PYNQ HTTP trace API

Platforms: macOS (host), Raspberry Pi (target)
