Ember

Inspiration

Fire-fighting is exactly the kind of dangerous, time-critical job we'd want a robot to take on — but a humanoid that can walk into a hazard is only useful if it can perceive that hazard cheaply and continuously. We kept hitting the same tension: every always-on sensor and computation on a robot competes for power, money, and cooling. Running a full neural network around the clock just to watch for fire is expensive overkill. So Ember became two halves solving one problem: a fire-fighting humanoid trained in simulation, and a custom FPGA perception accelerator that does the constant, low-power watching — a cheap always-on "first line of detection" that only wakes heavier compute when there's a real reason to.

What it does

Ember is a fire-fighting humanoid with a hardware-accelerated fire-detection front end.

The perception layer runs on an FPGA. As camera pixels stream in, it flags fire-colored pixels, filters out false positives by checking whether each flagged pixel is surrounded by other fire pixels (a real fire is a solid region, not scattered specks), locates the fire, and targets the base of the flame — where it would actually be fought. It outputs just a coordinate and a fire/no-fire flag per frame, not a whole image, over UART.
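
As a rough illustration of the first two steps (color flagging, then the "surrounded by other fire pixels" check), here is a minimal frame-based Python sketch. The threshold values `Y_MIN`, `CB_MAX`, and `CR_MIN` are hypothetical placeholders, not the values used on the board, and the full-range BT.601 conversion is an assumption about which YCbCr variant applies.

```python
# Hypothetical thresholds for illustration only; the real values are not given in this writeup.
Y_MIN, CB_MAX, CR_MIN = 120, 120, 150

def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG-style) RGB to YCbCr conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_fire_pixel(r, g, b):
    """Flag bright pixels that are low in blue-difference and high in red-difference."""
    y, cb, cr = rgb_to_ycbcr(r, g, b)
    return y >= Y_MIN and cb <= CB_MAX and cr >= CR_MIN

def threshold(image):
    """image: rows of (r, g, b) tuples. Returns rows of 0/1 fire flags."""
    return [[1 if is_fire_pixel(*px) else 0 for px in row] for row in image]

def erode3x3(mask):
    """Keep a pixel only if it and all eight neighbors are set.

    Border pixels have no complete neighborhood and are cleared.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1)
                                for dx in (-1, 0, 1)))
    return out
```

Running `erode3x3(threshold(image))` on a frame leaves only pixels sitting inside a solid fire-colored region, so an isolated orange speck is dropped while the interior of a larger blob survives.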

The humanoid layer is a Unitree humanoid trained in MuJoCo. Locomotion and approach behaviors are driven by reinforcement-learned policies (PPO), with A* for higher-level path planning toward the detected fire. The detected fire location from the FPGA is the target the humanoid stack acts on.

A home-base server ties it together: it reads the FPGA's output from the serial link, serves live detections to a dashboard over WiFi, and keeps the architecture open for multi-robot coordination and on-demand heavier AI.

How we built it

Perception (FPGA, Verilog): a streaming pipeline on a Xilinx Zynq — raster scanner → YCbCr color threshold → morphological erosion using two line buffers and a 3×3 sliding window (so all nine neighbors are available in a single clock cycle) → an accumulator that computes the centroid and flame-base aim point → a UART transmitter sending a compact binary packet per frame. We kept division off the fabric by deferring it downstream, and verified the hardware against a pixel-for-pixel Python golden model before programming the board.
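
One way to read "keeping division off the fabric" is that the accumulator only adds and compares while pixels stream past, and whoever consumes the result does the divide. The Python model below sketches that idea. The class and function names are hypothetical, and the definition of the flame-base aim point (mean x of the lowest row containing fire) is an assumption for illustration, not necessarily the rule used in the Verilog.

```python
class FrameAccumulator:
    """Software model of a per-frame accumulator that uses only adds and compares."""

    def __init__(self):
        self.count = 0        # number of fire pixels in the frame
        self.sum_x = 0        # running sum of their x coordinates
        self.sum_y = 0        # running sum of their y coordinates
        self.base_y = -1      # lowest fire row seen so far (largest y)
        self.base_sum_x = 0   # sum of x over fire pixels on that row
        self.base_count = 0   # number of fire pixels on that row

    def push(self, x, y, fire):
        """Consume one pixel in raster order."""
        if not fire:
            return
        self.count += 1
        self.sum_x += x
        self.sum_y += y
        if y > self.base_y:
            # A new lowest row: restart the base-row sums.
            self.base_y, self.base_sum_x, self.base_count = y, x, 1
        elif y == self.base_y:
            self.base_sum_x += x
            self.base_count += 1


def finish_downstream(acc):
    """Downstream step: the only place a division happens."""
    if acc.count == 0:
        return None  # no fire in this frame
    centroid = (acc.sum_x // acc.count, acc.sum_y // acc.count)
    aim = (acc.base_sum_x // acc.base_count, acc.base_y)
    return centroid, aim
```

Feeding every `(x, y, fire)` triple of a frame through `push` and then calling `finish_downstream` yields the centroid and the assumed aim point, or `None` for a frame with no fire.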

Humanoid (MuJoCo + PPO + A*): we set up the Unitree humanoid in MuJoCo and trained locomotion/approach policies with PPO, using A* for planning toward a target. Getting stable, useful behavior took substantial tuning: shaping rewards, keeping the humanoid balanced while it moved toward a goal (rather than collapsing or learning degenerate gaits), and adjusting training so the learned policy responded sensibly to an externally supplied target coordinate.
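
For readers unfamiliar with the planning half, here is a minimal A* sketch in Python. It assumes a 2D occupancy grid with 4-connected moves and unit step cost, which is an illustrative simplification: the writeup does not say what map representation the planner actually uses.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (blocked) cells.

    start and goal are (row, col) tuples. Returns a list of cells from start
    to goal inclusive, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # entries are (f, g, cell)
    came_from = {start: None}
    best_g = {start: 0}

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

In a setup like this one, the goal cell would come from the detected fire location once it has been mapped into the planner's frame, and the resulting waypoints would be handed to the learned locomotion policy as successive targets.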

Software glue (Python, Flask): an image-to-memory converter, a golden-reference verifier, a live serial reader, a Flask web dashboard (phone-viewable over WiFi), and a video annotator that runs the exact FPGA algorithm frame-by-frame on real footage. We used Cursor and Claude throughout the build.
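
To make the serial-reader piece concrete, the sketch below parses per-frame packets out of a raw byte stream. The packet layout is entirely hypothetical (sync byte `0xA5`, a flags byte whose bit 0 means fire, little-endian 16-bit x and y, and an XOR checksum); the actual binary format is not described in this writeup. In a real reader the bytes would arrive from a serial port, for example via pyserial's `Serial.read`.

```python
import struct

SYNC = 0xA5      # hypothetical sync byte
PACKET_LEN = 7   # sync, flags, x (u16), y (u16), checksum

def xor_checksum(data):
    """XOR of all bytes in data."""
    c = 0
    for b in data:
        c ^= b
    return c

def encode_packet(fire, x, y):
    """Build one packet in the assumed layout (useful for tests and simulation)."""
    body = struct.pack("<BBHH", SYNC, 1 if fire else 0, x, y)
    return body + bytes([xor_checksum(body)])

def parse_stream(buf):
    """Extract every valid packet from buf.

    Returns (packets, leftover): packets is a list of (fire, x, y) tuples and
    leftover is the unconsumed tail to prepend to the next read.
    """
    packets = []
    i = 0
    while i + PACKET_LEN <= len(buf):
        chunk = buf[i:i + PACKET_LEN]
        if chunk[0] == SYNC and xor_checksum(chunk[:-1]) == chunk[-1]:
            _, flags, x, y = struct.unpack("<BBHH", chunk[:-1])
            packets.append((bool(flags & 1), x, y))
            i += PACKET_LEN
        else:
            i += 1  # not a valid packet here, resynchronise one byte later
    return packets, buf[i:]
```

The sync byte plus checksum lets the reader recover if it starts listening mid-packet, and returning the leftover bytes means a packet split across two reads is not lost.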

Challenges we ran into

The hardest problems were at the integration seams between two very different systems:

Bridging perception to the policy. The FPGA emits a raw coordinate over a serial link; the humanoid policy expects a target in its own frame. Getting that hand-off working (serial packet → server → a target the trained policy could actually act on) was a real integration effort. We ran out of time to fully close the loop into the live sim, so the home-base server currently acts as the connecting layer.
Tuning the humanoid policies. PPO didn't just work out of the box — balancing locomotion stability against goal-seeking took repeated reward and hyperparameter tuning, and the policy had to stay robust when handed a target it hadn't seen during training.
FPGA pipeline alignment. BRAM read latency and the line-buffer window each add delay, so coordinates and frame-boundary signals needed careful re-alignment — we chased a one-pixel coordinate bias down to the exact register stage.
Morphology tradeoffs. Erosion removed noise but over-shrank thin fires; we attempted morphological opening, hit a dilation bug, and made the call to ship reliable erosion-only rather than risk a working demo.
Bring-up gremlins. A camera/ESP8266 path we ultimately scoped out, stale bitstreams, simulation runs too short for a UART packet to finish transmitting, cached dashboards: the classic "sim is right but the board isn't" debugging across both hardware and the sim toolchain.
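
The pipeline-alignment problem above can be reproduced in a few lines of Python. The sketch models a register chain as a delay line and shows how a coordinate counter delayed by one stage too few reports a position one pixel off. The latency figures are made-up numbers for illustration, not the actual BRAM or line-buffer delays in the design.

```python
from collections import deque

def delay_line(stream, stages, fill=None):
    """Model `stages` back-to-back registers.

    Each output is the input from `stages` cycles earlier; the first `stages`
    outputs are the reset value `fill`.
    """
    regs = deque([fill] * stages)
    for item in stream:
        regs.append(item)
        yield regs.popleft()

def detect_x(pixels, data_latency, coord_delay):
    """Return the x coordinates reported for set pixels on one scan line.

    The pixel data takes `data_latency` cycles to come out of the pipeline.
    The coordinate counter is delayed by `coord_delay` stages. The two only
    line up when the delays are equal.
    """
    n = len(pixels)
    # Extra idle cycles let the last pixels drain out of the pipeline.
    data_in = list(pixels) + [0] * data_latency
    coords_in = list(range(n)) + [None] * data_latency
    data_out = delay_line(data_in, data_latency, fill=0)
    coord_out = delay_line(coords_in, coord_delay, fill=None)
    return [x for bit, x in zip(data_out, coord_out) if bit]

line = [0, 0, 0, 1, 0, 0, 0, 0]                          # one set pixel at x = 3
aligned = detect_x(line, data_latency=2, coord_delay=2)
biased = detect_x(line, data_latency=2, coord_delay=1)   # one register short
```

Here `aligned` holds the true position and `biased` is off by one pixel, which is the kind of error that only shows up when hardware output is compared pixel-for-pixel against a golden model.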

Accomplishments that we're proud of

A complete fire-detection pipeline running on real silicon, verified end-to-end: image → color detection → real-time neighborhood filtering → localization → UART → live dashboard.
Real-time morphological filtering with line buffers: the piece that genuinely justifies "why an FPGA," producing one neighborhood result per clock with fixed latency, which is hard for a CPU to match at comparable power.
A trained humanoid that learned to locomote and move toward a goal in MuJoCo, with policies tuned to stay stable.
Bringing two hard, separate systems — custom hardware perception and a learned humanoid controller — into one coherent fire-fighting story.
Knowing when to scope down to protect a working demo.

What we learned

Why FPGAs win for streaming, per-pixel work: dedicated hardware per stage, neighbors on wires instead of fetched from memory, deterministic latency with no cache jitter — and that much of FPGA design is timing alignment, not the logic itself.
How brittle RL policies can be, and how much reward shaping and tuning it takes to get stable, goal-directed humanoid behavior in MuJoCo.
That the real work in a multi-part robot is the integration between subsystems, not just each subsystem alone.
The value of a software golden model for trusting hardware output.
The strongest framing isn't "FPGA beats GPU" — it's a tiered system where a cheap, always-on FPGA gate guards expensive compute that runs only on demand.
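
The tiered idea in the last point can be sketched as a tiny gate in front of the expensive model. This is an illustrative Python sketch only: the `WakeGate` name and the "N consecutive fire frames" rule are assumptions, not something the current build implements.

```python
class WakeGate:
    """Keep heavy compute asleep until the cheap detector is persistently positive."""

    def __init__(self, needed=3):
        self.needed = needed  # consecutive fire frames required (hypothetical value)
        self.streak = 0       # current run of consecutive fire frames

    def update(self, fire_flag):
        """Feed one per-frame fire flag. Returns True while heavy compute should run."""
        self.streak = self.streak + 1 if fire_flag else 0
        return self.streak >= self.needed


gate = WakeGate(needed=3)
flags = [0, 1, 0, 1, 1, 1, 1, 0]
awake = [gate.update(bool(f)) for f in flags]
```

With these inputs the gate stays closed through the single-frame blip and opens only on the third consecutive fire frame, so the expensive confirmer would run on two of the eight frames.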

What's next for Ember

Close the loop fully: FPGA detection → policy target → humanoid response, live and end-to-end.
Live camera input via a parallel camera module straight into the FPGA fabric.
Richer perception on the same pipeline: morphological opening to preserve thin fires, Sobel texture analysis to reject flat orange surfaces like sunsets, and temporal flicker detection (fire pulses at a few hertz; steady light doesn't) — each drops into the existing line-buffer foundation, as does thermal imaging.
More robust policies: further PPO tuning, domain randomization for sim-to-real, and training the humanoid on actual suppression behaviors rather than just approach.
The home-base server as an opening for multi-robot coordination and an on-demand heavier confirmer (e.g. a YOLO-style model) that the FPGA gate wakes only when it flags something.
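
For the morphological-opening item, the Python sketch below shows the intended behavior on a binary mask: erosion followed by dilation with the same 3×3 window. It is a software illustration of the standard operation, not the Verilog that was attempted.

```python
def erode3x3(mask):
    """Keep a pixel only if it and all eight neighbors are set; borders are cleared."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1)
                                for dx in (-1, 0, 1)))
    return out

def dilate3x3(mask):
    """Set a pixel if it or any in-bounds neighbor is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(mask[ny][nx]
                                for ny in range(max(0, y - 1), min(h, y + 2))
                                for nx in range(max(0, x - 1), min(w, x + 2))))
    return out

def opening3x3(mask):
    """Morphological opening: erosion removes specks, dilation restores what survived."""
    return dilate3x3(erode3x3(mask))
```

On a 4×4 blob with a stray pixel elsewhere in the frame, erosion alone leaves a 2×2 core, while opening removes the stray pixel and returns the blob to its original 4×4 extent. Note that any region narrower than three pixels is erased by the erosion step and is not recovered by the dilation, so opening fixes the shrinkage of surviving regions rather than rescuing very thin ones.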

Built With

a*
claude
cursor
flask
fpga
humanoid
mujoco
ppo
python
unitree
verilog

Updates

Parasmai Conjeevaram started this project — Jun 21, 2026 01:53 PM EDT
