RL Mars Rover Pathfinder Simulation on Perseverance data

Rover Gravity Simulation
Visualization of the Jezero crater with terrain materials and valleys, built from the flat 2D NASA data
The true shortest path is shown in pink; the RL-trained path in teal is a better-optimized route that avoids steep drops, damage-causing valleys, and similar hazards

Inspiration

In December 2025, NASA's Perseverance rover completed the first AI-planned drive on Mars using HiRISE orbital imagery and digital terrain models (DTMs). Before this, every drive waypoint was manually planned by human engineers, a process taking days. The data story: using the same HiRISE DTM dataset that JPL engineers use, we show how a trained RL rover agent navigates Jezero Crater in real time, how it compares to a human-planned route, and what terrain features it learns to avoid. The twist: the RL agent often finds shorter paths that humans wouldn't take because it learns subtle slope thresholds imperceptible in flat top-down images.

What it does

Visualizes the full Jezero Crater DEM at 1–2 m resolution, runs a C++ physics simulation engine built on the mission and terrain data to find potential Perseverance paths, overlays RL-planned paths, and quantifies the energy savings of each pathfinding approach.

Jezero Crater isn't flat! It's a topographically rich ancient river delta with ridges, boulder fields, and subtle slope variations that look benign on a 2D map but are punishing to a power-limited rover. Our project shows you a naive straight-line path and a terrain-optimized alternative on the same real DEM. When the straight line is 18% shorter in distance but 55% worse in energy score, you have experienced, not just been told, the core design challenge of every Mars surface operation.
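To make the distance-versus-energy trade-off concrete, here is a minimal sketch on a toy grid. The energy formula (horizontal distance plus a quadratic grade penalty), the `climb_weight` value, and the 3×5 DEM are all illustrative assumptions, not the scoring used in the project.

```python
import math

def path_length_m(path, cell_size_m=1.0):
    """Horizontal length of a path given as a list of (row, col) cells."""
    return sum(math.hypot(r1 - r0, c1 - c0) * cell_size_m
               for (r0, c0), (r1, c1) in zip(path, path[1:]))

def path_energy(dem, path, cell_size_m=1.0, climb_weight=10.0):
    """Assumed energy score: horizontal distance plus a quadratic penalty
    on the grade (rise over run) of every step, uphill or downhill."""
    total = 0.0
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        run = math.hypot(r1 - r0, c1 - c0) * cell_size_m
        grade = (dem[r1][c1] - dem[r0][c0]) / run
        total += run * (1.0 + climb_weight * grade * grade)
    return total

# Made-up 3x5 DEM (metres) with a 2 m ridge in the middle of the top row.
dem = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
straight = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]  # over the ridge
detour = [(0, 0), (1, 1), (1, 2), (1, 3), (0, 4)]    # around the ridge

for name, path in (("straight", straight), ("detour", detour)):
    print(name, round(path_length_m(path), 2), round(path_energy(dem, path), 2))
```

On this toy grid the straight path is shorter but pays the grade penalty on every step, while the detour is longer and stays on level ground.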

How I built it

C++ Core: A terrain traversability engine that ingests HiRISE DTM files (GeoTIFF format, available on AWS via the NASA/USGS Open Data Registry), computes per-cell slope and roughness maps (terrain gradient via Sobel operators), and outputs an occupancy/cost grid. This is the same kind of traversability computation that JPL's ENav system performs on Perseverance. The engine is written in C++ with Eigen for matrix ops, GDAL for raster parsing, and OpenMP for parallel grid computation, and it compiles to WebAssembly for browser deployment.
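The slope step can be sketched in a few lines. This is a plain-Python illustration of a Sobel-based slope map, not the C++ engine itself; the function name, the edge-clamping choice at the borders, and the default cell size are assumptions, and roughness is left out.

```python
import math

# Sobel kernels for the column-wise (x) and row-wise (y) elevation gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def slope_map_deg(dem, cell_size_m=1.0):
    """Per-cell slope in degrees for a 2D list of elevations in metres.
    Border cells reuse the nearest in-bounds neighbour (edge clamping)."""
    rows, cols = len(dem), len(dem[0])

    def z(r, c):
        return dem[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    e = z(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * e
                    gy += SOBEL_Y[dr + 1][dc + 1] * e
            # Each kernel weights a central difference (a 2-cell span) by
            # 1 + 2 + 1 = 4, so dividing by 8 * cell size gives rise over run.
            dzdx = gx / (8.0 * cell_size_m)
            dzdy = gy / (8.0 * cell_size_m)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

# A ramp that rises 1 m per 1 m cell along the columns.
ramp = [[float(c) for c in range(5)] for _ in range(5)]
slopes = slope_map_deg(ramp)
```

The nested loops are independent per cell, which is the property that makes the real grid computation easy to parallelize.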

RL Component: A Gymnasium environment wrapping the C++ terrain engine. The rover's observation space is a local egocentric patch of the cost grid (e.g., 64×64 cells around its position) plus its current heading and battery state. The action space is discrete (8-directional) or continuous (heading + speed). Reward: reaching the science target with minimal energy expended and zero hazardous-cell traversal. Train with PPO (either action space) or DQN (discrete actions only); the trained policy runs live in the browser via ONNX.js export. The RL visualization shows the agent's learned value function overlaid on the terrain, an intuitive "heat map of where the rover wants to go" that non-experts immediately understand.
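A stripped-down version of such an environment might look like the sketch below. It does not import Gymnasium or call the C++ engine; it only mirrors the shape of Gymnasium's `reset()` and `step()` return values on a small in-memory cost grid. The class name, the reward constants, and the hazard threshold are all hypothetical.

```python
# 8-connected moves: N, NE, E, SE, S, SW, W, NW as (d_row, d_col).
MOVES = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

class RoverGridEnv:
    """Toy stand-in for the terrain environment (hypothetical names)."""

    def __init__(self, cost_grid, start, goal, patch=3,
                 battery=100.0, hazard_cost=50.0):
        self.cost, self.start, self.goal = cost_grid, start, goal
        self.patch, self.battery0, self.hazard_cost = patch, battery, hazard_cost
        self.rows, self.cols = len(cost_grid), len(cost_grid[0])

    def _inside(self, r, c):
        return 0 <= r < self.rows and 0 <= c < self.cols

    def _obs(self):
        half = self.patch // 2
        r0, c0 = self.pos
        # Egocentric patch; cells outside the map read as hazardous.
        patch = [[self.cost[r][c] if self._inside(r, c) else self.hazard_cost
                  for c in range(c0 - half, c0 + half + 1)]
                 for r in range(r0 - half, r0 + half + 1)]
        return {"patch": patch, "heading": self.heading, "battery": self.battery}

    def reset(self, seed=None, options=None):
        self.pos, self.heading, self.battery = self.start, 0, self.battery0
        return self._obs(), {}

    def step(self, action):
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if not self._inside(r, c):
            r, c = self.pos  # blocked by the map edge; the step still costs energy
        self.pos, self.heading = (r, c), action
        step_cost = self.cost[r][c]
        self.battery -= step_cost
        hazard = step_cost >= self.hazard_cost
        reached = self.pos == self.goal
        # Assumed reward: pay the cell cost, bonus at the goal, penalty on hazards.
        reward = -step_cost + (100.0 if reached else 0.0) - (100.0 if hazard else 0.0)
        terminated = reached or hazard
        truncated = self.battery <= 0.0 and not terminated
        return self._obs(), reward, terminated, truncated, {}

grid = [[1.0, 1.0, 1.0], [1.0, 50.0, 1.0], [1.0, 1.0, 1.0]]
env = RoverGridEnv(grid, start=(0, 0), goal=(0, 2))
obs, info = env.reset()
```

Separating `terminated` (goal reached or hazard hit) from `truncated` (battery exhausted) follows the Gymnasium convention and keeps the two failure modes distinguishable in replay logs.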

Visualization: WebGL rendering of the actual Jezero Crater DEM as a 3D heightmap (recolored by terrain type from the AI4Mars dataset labels). Users can drop a "science target" pin anywhere on the crater and watch the RL agent plan and execute a path in real time.

Challenges I ran into

The biggest technical wall was the data pipeline. HiRISE DTMs ship as Cloud Optimized GeoTIFFs with Mars-specific coordinate reference systems, not the WGS84 lat/lon a web app expects. Getting GDAL to correctly ingest the Jezero crater mosaic, resolve the IAU planetary CRS, resample to a clean float32 binary at 1 m/pixel, and produce a geotransform JSON that the Three.js scene could actually use took longer than the entire C++ slope engine. One misread projection and your "crater" is a flat rectangle.
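For reference, the geotransform handoff comes down to one affine mapping. The sketch below uses GDAL's six-coefficient convention (origin x, pixel width, row rotation, origin y, column rotation, pixel height) in plain Python; the coefficient values and the JSON key name are made up for illustration and are not taken from the Jezero mosaic.

```python
import json

def pixel_to_world(gt, col, row):
    """Map a pixel (col, row) to projected coordinates of its top-left corner."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

def world_to_pixel(gt, x, y):
    """Invert the affine transform; returns fractional (col, row)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row

# Hypothetical north-up raster at 1 m/pixel; pixel height is negative
# because rows increase downward while y increases upward.
gt = (1000.0, 1.0, 0.0, 2000.0, 0.0, -1.0)
x, y = pixel_to_world(gt, 10, 20)
col, row = world_to_pixel(gt, x, y)

# One possible JSON layout for the web scene (key name is an assumption).
payload = json.dumps({"geotransform": list(gt)})
```

Round-tripping a few pixels through both functions is a cheap check that the sign of the pixel height and the axis order survived the export.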

On the RL side, the early reward function produced a policy that learned to spin in circles near the start, which technically avoids high-cost cells and technically never fails. Redesigning the reward to penalize time and cost while rewarding genuine progress toward the goal (not just distance reduction) required three full training runs and a lot of staring at episode replay logs before the agent started behaving like something you'd want to drive a rover.
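One way to read "genuine progress" is to pay out only when the rover gets closer to the goal than it has ever been in the episode, so that stepping away and back earns nothing. The sketch below takes that interpretation; it is an assumption about the shaping, and the weights are placeholders rather than the values from the final reward function.

```python
def shaped_reward(best_dist, new_dist, cell_cost, reached_goal,
                  progress_w=1.0, cost_w=0.1, time_penalty=0.05,
                  goal_bonus=100.0):
    """Return (reward, updated_best_dist) for one step.

    Progress is measured against the closest distance reached so far,
    so oscillating around the same spot only accumulates penalties.
    """
    progress = max(0.0, best_dist - new_dist)
    reward = progress_w * progress - cost_w * cell_cost - time_penalty
    if reached_goal:
        reward += goal_bonus
    return reward, min(best_dist, new_dist)

# A rover that steps toward the goal once and then oscillates.
best, total = 10.0, 0.0
for dist in [9.0, 10.0, 9.0, 10.0, 9.0]:
    r, best = shaped_reward(best, dist, cell_cost=1.0, reached_goal=False)
    total += r
```

Under this shaping every step after the first one in the loop is a net loss, which removes the incentive to circle near the start.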

Accomplishments that I'm proud of

What I learned

The gap between "elevation data" and "traversability model" is enormous, and every decision you make filling that gap is a form of physical intuition made explicit. Deciding that slope cost should scale quadratically, not linearly; that obstacle inflation radius should reflect rover body width, not just the cell size; that sand incurs a roughness penalty even at low slope: each of these is a small design choice that determines whether the insight the visualization promises actually emerges from the data. We learned to respect how much domain knowledge is embedded in a cost function that looks like five lines of code.
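Those three choices can be written down compactly. The following is a sketch only: the slope limit, the weights, the sand penalty, and the rover radius are placeholder assumptions, not the values used in the engine.

```python
import math

def cell_cost(slope_deg, roughness, is_sand,
              max_slope_deg=20.0, slope_w=4.0, sand_penalty=0.3):
    """Traversal cost of one cell; math.inf marks a lethal cell."""
    if slope_deg >= max_slope_deg:
        return math.inf
    s = slope_deg / max_slope_deg
    cost = 1.0 + slope_w * s * s + roughness  # quadratic in slope
    if is_sand:
        cost += sand_penalty                  # applies even on level sand
    return cost

def inflate_obstacles(cost, rover_radius_m, cell_size_m=1.0):
    """Mark every cell within rover_radius_m of a lethal cell as lethal."""
    rows, cols = len(cost), len(cost[0])
    reach = math.ceil(rover_radius_m / cell_size_m)
    out = [row[:] for row in cost]
    for r in range(rows):
        for c in range(cols):
            if not math.isinf(cost[r][c]):
                continue
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    rr, cc = r + dr, c + dc
                    near = math.hypot(dr, dc) * cell_size_m <= rover_radius_m
                    if near and 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = math.inf
    return out

# One lethal cell in the middle of a 5x5 grid, inflated by a 1 m radius.
grid = [[1.0] * 5 for _ in range(5)]
grid[2][2] = math.inf
inflated = inflate_obstacles(grid, rover_radius_m=1.0)
```

With the quadratic term, doubling the slope quadruples its contribution to the cost, which pushes a planner toward long gentle grades over short steep ones.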

We also learned that reinforcement learning on real terrain is much slower to converge than on toy grid worlds, and that reward shaping is where most of that training time gets spent. The final reward function looks simple. Getting there was not.

On the visualization side: Three.js displacement mapping is powerful but unforgiving. The rendering pipeline from GeoTIFF → float32 binary → UInt16 PNG → WebGL texture → displaced geometry has exactly zero room for silent type coercions, and each boundary is a potential bug.
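The float-to-UInt16 boundary is the easiest one to make explicit. The sketch below quantizes heights to 16-bit codes with a stored min/max and decodes them again; it is a plain-Python illustration with made-up elevations, and it assumes nodata cells have already been masked out before this step.

```python
import math

U16_MAX = 65535

def quantize_u16(heights):
    """Map finite heights (metres) to integer codes in 0..65535.
    Returns (codes, lo, hi); lo and hi are needed to decode."""
    if not all(math.isfinite(h) for h in heights):
        raise ValueError("mask nodata/NaN cells before quantizing")
    lo, hi = min(heights), max(heights)
    span = hi - lo
    if span == 0:
        return [0] * len(heights), lo, hi
    codes = [round((h - lo) / span * U16_MAX) for h in heights]
    return codes, lo, hi

def dequantize_u16(codes, lo, hi):
    """Recover approximate heights from the 16-bit codes."""
    span = hi - lo
    return [lo + c / U16_MAX * span for c in codes]

# Made-up elevations in metres.
heights = [-2600.0, -2571.3, -2550.0, -2512.8, -2500.0]
codes, lo, hi = quantize_u16(heights)
restored = dequantize_u16(codes, lo, hi)
worst_error = max(abs(a - b) for a, b in zip(heights, restored))
```

Storing `lo` and `hi` next to the texture lets the displacement scale in the scene be derived from data rather than hard-coded, and the worst-case rounding error stays within half a quantization step.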