
Inspiration

In October 2022, an attacker drained $116M from Mango Markets in a single transaction — not through a code bug, but through economic logic [1]. They used their own capital to manipulate the spot price of MNGO tokens on-chain, tricked the oracle into accepting the inflated price as real, then borrowed against that phantom collateral until the protocol was empty. No exploit kit, no zero-day — just the right sequence of legitimate DeFi actions. That attack is what Project Mango is built around. If a human could find that sequence, could a reinforcement learning agent find it too — and could we use that to audit protocols before they go live?

What it does

Project Mango is a reinforcement learning environment that simulates a realistic DeFi ecosystem — an AMM, a lending protocol, a price oracle, and a crowd of reactive traders. An RL agent operates inside this simulation with the same toolkit a real attacker has: swap tokens, deposit collateral, take out loans, and execute flash loans. The agent learns purely from reward — profit relative to a baseline portfolio — with no hard-coded strategy. The goal is for it to autonomously rediscover oracle manipulation attacks, flash loan exploits, and collateral inflation sequences through trial and error. A more complex multi-exchange environment extends this to cross-protocol attacks. The end product is a red-teaming tool: point it at a new protocol's parameters, let it run, and see what it finds.
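
The agent loop above can be sketched in miniature. This is a hedged illustration with hypothetical names, not the project's actual environment: the real one subclasses Gymnasium's Env, has a richer action and observation space, and bundles sub-actions per block.

```python
from enum import IntEnum


class DeFiAction(IntEnum):
    # Hypothetical action set mirroring the attacker toolkit described above.
    HOLD = 0
    SWAP = 1
    DEPOSIT_COLLATERAL = 2
    BORROW = 3
    FLASH_LOAN = 4


class ToyDeFiEnv:
    """Minimal sketch of the agent loop, mirroring the Gymnasium
    reset/step API without the dependency. Protocol mechanics are
    elided; only the reward shape is shown: profit relative to a
    baseline portfolio, with no hard-coded strategy."""

    def reset(self):
        self.cash, self.tokens, self.price = 1_000.0, 0.0, 1.0
        self.baseline = self.cash  # baseline: just hold stablecoins
        return (self.cash, self.tokens, self.price)

    def step(self, action: DeFiAction):
        # Apply the chosen protocol action here (omitted in this sketch),
        # then reward the agent only for wealth above the baseline.
        wealth = self.cash + self.tokens * self.price
        reward = wealth - self.baseline
        obs = (self.cash, self.tokens, self.price)
        return obs, reward, False, False, {}
```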

How we built it

The environment is built on Gymnasium, with custom actor classes for each protocol component — BaseAMM (constant-product AMM), BaseLender (collateralized lending with liquidations), BaseOracle (spot or TWAP price feed), and BaseCrowd (EMA-based reactive market sentiment with built-in panic mode). Actions are structured as sequences of several sub-actions per block, mirroring how real on-chain transactions are bundled. Flash loans are handled atomically — if the agent can't repay within the block, the entire transaction reverts, exactly as it would on-chain. A cascade liquidation loop handles the downstream effects of price manipulation on other borrowers: a wave of liquidations triggers a price drop, which causes panic selling, which in turn causes further automated liquidations. The agent is trained with standard deep RL (PPO), with a symlog reward transform to handle the extreme reward variance that comes with leveraged DeFi positions.
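
The price impact that makes a constant-product AMM manipulable can be sketched in a few lines. This is a hypothetical helper, not the project's BaseAMM code: selling quote token Y into the pool moves the spot price of X by construction of the invariant x * y = k.

```python
def cpamm_buy_x_with_y(x_reserve: float, y_reserve: float,
                       dy_in: float, fee: float = 0.003):
    """Constant-product swap sketch: sell dy_in of quote token Y into
    the pool and receive X, keeping x * y = k after a 0.3% fee.
    Returns (X received, new spot price of X quoted in Y)."""
    k = x_reserve * y_reserve
    new_y = y_reserve + dy_in * (1.0 - fee)
    new_x = k / new_y
    dx_out = x_reserve - new_x
    return dx_out, new_y / new_x
```

Dumping Y equal to the pool's entire Y reserve roughly quadruples the spot price of X — exactly the impact an oracle-manipulation strategy exploits, and the slippage an honest trader would pay.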

Challenges we ran into

We had built a flash-loan action into our environment, hoping our agents would learn the classic flash-loan attack: flash-loan a large amount of Y, swap it for X to pump the market price, use that X as collateral to lawfully borrow more Y, pay back the flash loan, and keep the change. Executing this attack therefore requires discovering a precise sub-action sequence in exactly the right order. Because failed transactions revert with a gas penalty, the agent quickly learned that the flash-loan button was a self-destruct switch. It stopped exploring the attack vector entirely, choosing small consistent losses over catastrophic reverts. The exploit was always available; the agent was just too rational to touch it. We also had to kill the "High Watermark Fallacy" — early reward functions let the agent pump token prices, collect a massive paper-wealth reward, then walk willingly into liquidation, having already locked in its high score. We fixed this with a hard end-of-episode settlement that forces a full position unwind at real AMM prices, with real slippage. Paper wealth stopped being rewarded; only extracted stablecoins counted.
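
The settlement fix amounts to marking the portfolio to execution price rather than spot price. Assuming a constant-product pool, a hypothetical sketch (not the project's actual settlement code):

```python
def settlement_value(cash: float, tokens: float,
                     x_reserve: float, y_reserve: float) -> float:
    """End-of-episode settlement sketch: force-sell the agent's entire
    token position back through the constant-product pool (x * y = k).
    Proceeds reflect real slippage, so a spot price the agent pumped
    itself cannot be cashed in as paper wealth."""
    k = x_reserve * y_reserve
    # Y received for dumping all X holdings into the pool at once.
    proceeds = y_reserve - k / (x_reserve + tokens)
    return cash + proceeds
```

Holding half the pool's X supply marks at roughly two-thirds of its spot valuation, so a high-watermark strategy that ends the episode in a pumped, illiquid position is penalized automatically.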

Accomplishments that we're proud of

Once we patched both failure modes, the agent independently rediscovered the Mango Markets exploit from first principles — no hints, no hard-coded strategy, just reward signal. We watched it learn to take a massive loan, dump it into the AMM to spike the spot price, use that inflated valuation to drain the lending protocol's stablecoin reserves, and exit clean before the oracle corrected. The same sequence that extracted $116M from a live protocol in 2022, derived purely by optimizing math in a simulation. Cool!

What we learned

Standard RL breaks in DeFi — a successful exploit can turn $1,000 into $1,000,000 in a single block, and linear rewards produce gradient explosions that fry the network entirely. The fix was treating rewards like a quant would: symlog-transforming returns to compress extreme outcomes while preserving their ordering, keeping training stable without muting the signal that large profits are the goal. We also learned that a secure protocol produces a lazy agent. When we hardened the simulation with deeper AMM liquidity and a TWAP oracle, rewards flatlined and the agent simply stopped trading — it had calculated the exploit was no longer profitable. That turned out to be the most useful property of the whole system: if Project Mango finds nothing, that's not a failure. That's the proof of security.
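
The symlog transform is small enough to state exactly. A sketch of the standard formulation (sign(x) * log(1 + |x|), as used in work like DreamerV3), which is what the description above implies rather than the project's literal code:

```python
import math


def symlog(x: float) -> float:
    """Symmetric log: sign(x) * log(1 + |x|). Strictly monotone, so
    reward ordering is preserved, but a million-unit profit maps to
    about 13.8 instead of 1e6 — keeping PPO gradients stable."""
    return math.copysign(math.log1p(abs(x)), x)
```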

What's next for Project Mango

The current environment is a strong foundation but still a simplification. The natural next steps are expanding to more realistic multi-asset, multi-protocol graphs where cross-protocol contagion can emerge — the kind of systemic risk that's hardest for human auditors to reason about. Integrating with real protocol ABIs to auto-generate environments from deployed contracts would make it a practical auditing tool. Longer term, the agent's discovered exploit sequences could be used to generate adversarial test cases for formal verification pipelines.

References

[1] https://infotrend.com/mango-markets-madness-a-case-study-on-the-mango-markets-exploit/

Built With

  • gymnasium
  • python
  • stable-baselines3