inspiration

agents want way more steps and way fewer watts. cpu↔gpu ping-pong + headless browsers are just… nope. we asked “what if the ‘browser’ ran where the flops live” and then we actually started building it because WHY NOT.

what it does

matrix is a gpu-native javascript runtime that runs “browser-ish” apps on chip. think react/next/node vibes but the hot path lives on the gpu: timers, fs, net, workers, streams. zero-copy obs→act loops so your agent cycles are sub-ms. result: faster training, cheaper runs, greener footprints. plus you can do on-device autocurriculum and record/replay for eval without hauling bytes across pci-e. it’s giving “javascript on silicon” and i’m obsessed.
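the obs→act loop idea, in miniature — a hedged rust sketch, not matrix's actual api: the point is that the observation and the action live in the same device-resident slot, so one agent step never hauls bytes off-chip. `Slot`, `policy`, and `step` are all invented names.

```rust
// hypothetical sketch of a zero-copy obs→act loop: observation and action
// share one device-resident slot, so a step touches no host memory.
// all names here (Slot, policy, step) are illustrative, not matrix's api.

#[derive(Clone, Copy, Default, Debug)]
struct Slot {
    obs: [f32; 4], // written by the env each step
    act: [f32; 2], // written back by the policy, in place
}

// trivial stand-in policy: sum the two halves of the observation
fn policy(obs: &[f32; 4]) -> [f32; 2] {
    [obs[0] + obs[1], obs[2] + obs[3]]
}

// one agent cycle: read obs, write act into the same slot — nothing is copied out
fn step(slot: &mut Slot) {
    slot.act = policy(&slot.obs);
}

fn main() {
    let mut slot = Slot { obs: [1.0, 2.0, 3.0, 4.0], act: [0.0; 2] };
    step(&mut slot);
    assert_eq!(slot.act, [3.0, 7.0]);
}
```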

how we built it

so far we finished the foundation: 0000 through 0003. repo bootstrapped with dev shell + build system that doesn’t fight you. a clean device↔host abi with tiny fixed messages and crc’d rings so we can move work like grownups. a cpu simulator backend so all this can run in ci and we can iterate before touching real gpus. and the first slice of the runtime kernel: queues, microtasks, timers, the little scheduler that could. it’s early, but the spine is THERE and it already feels snappy even in sim.
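the "tiny fixed messages + crc'd rings" idea, sketched in rust. this is a toy single-producer/single-consumer ring with a checksum per fixed-size slot — the layout, field names, and checksum are invented, and a real build would use an actual crc32, not this toy mixer.

```rust
// toy spsc ring with a checksum per fixed-size slot, sketching the
// "tiny fixed messages + crc'd rings" idea. layout, field names, and the
// checksum are invented; not the real abi.

const RING: usize = 8; // slot count; monotonic counters mod RING give the index

#[derive(Clone, Copy, Default)]
struct Msg {
    kind: u32,    // message type id
    payload: u64, // one fixed payload word
    crc: u32,     // checksum over kind + payload
}

// stand-in checksum (NOT a real crc32)
fn checksum(kind: u32, payload: u64) -> u32 {
    (kind ^ (payload as u32) ^ ((payload >> 32) as u32)).wrapping_mul(0x9E37_79B1)
}

struct Ring {
    slots: [Msg; RING],
    head: usize, // consumer position (monotonic)
    tail: usize, // producer position (monotonic)
}

impl Ring {
    fn new() -> Self {
        Ring { slots: [Msg::default(); RING], head: 0, tail: 0 }
    }

    fn push(&mut self, kind: u32, payload: u64) -> bool {
        if self.tail - self.head == RING {
            return false; // full — producer backs off
        }
        self.slots[self.tail % RING] = Msg { kind, payload, crc: checksum(kind, payload) };
        self.tail += 1;
        true
    }

    fn pop(&mut self) -> Option<Msg> {
        if self.head == self.tail {
            return None; // empty
        }
        let m = self.slots[self.head % RING];
        self.head += 1;
        // reject torn writes: a slot whose checksum doesn't match is dropped
        if m.crc == checksum(m.kind, m.payload) { Some(m) } else { None }
    }
}

fn main() {
    let mut r = Ring::new();
    assert!(r.push(1, 42));
    let m = r.pop().expect("message survives the crc check");
    assert_eq!((m.kind, m.payload), (1, 42));
    assert!(r.pop().is_none());
}
```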

challenges we ran into

making “sync” apis not stall the world (answer: park the fiber, don’t block the sm). keeping the abi small but future-proof (hello versioning + feature bits). writing a simulator that’s faithful enough to catch racy nonsense without being a second codebase. and honestly just herding toolchains… nix + nvcc + cargo + pnpm? chaotic but we tamed it.
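one way the "versioning + feature bits" trick can look, as a hedged rust sketch — the bit assignments and the `negotiate` helper are invented for illustration: refuse incompatible versions outright, and intersect feature bits so both sides only use what they both understand.

```rust
// hypothetical sketch of a version + feature-bits handshake for a small abi.
// bit assignments and `negotiate` are invented for illustration.

const FEAT_TIMERS: u32 = 1 << 0; // peer supports timer messages
const FEAT_FS: u32 = 1 << 1;     // peer supports fs messages

#[derive(Clone, Copy)]
struct AbiHeader {
    version: u16,  // bumped only on incompatible layout changes
    features: u32, // optional capabilities, negotiated at startup
}

// refuse incompatible versions outright; otherwise both sides use the
// intersection of their feature bits, so older peers just see fewer features
fn negotiate(host: AbiHeader, device: AbiHeader) -> Option<u32> {
    if host.version != device.version {
        return None;
    }
    Some(host.features & device.features)
}

fn main() {
    let host = AbiHeader { version: 1, features: FEAT_TIMERS | FEAT_FS };
    let dev = AbiHeader { version: 1, features: FEAT_TIMERS };
    assert_eq!(negotiate(host, dev), Some(FEAT_TIMERS));
    let mismatched = AbiHeader { version: 2, features: FEAT_TIMERS };
    assert_eq!(negotiate(host, mismatched), None);
}
```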

accomplishments that we’re proud of

the rings survive fuzzing without tearing. the cpu sim runs the scheduler and microtask drain exactly how we drew it on the whiteboard. the dev shell “just works” on a clean machine. and the abi feels… elegant? like, the kind of thing you can build a whole system on. also: we kept scope disciplined. 0000–0003 done and merged. yas.
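the microtask-drain rule the sim reproduces is the standard js one: run one macrotask, then drain the microtask queue to empty before touching the next macrotask. here's a minimal rust stand-in for that ordering (invented names; the semantics, not the sim's code):

```rust
// minimal stand-in for the scheduler's ordering rule: one macrotask per tick,
// then a full microtask drain before the next macrotask. invented names.

use std::collections::VecDeque;

struct Sched {
    macrotasks: VecDeque<&'static str>,
    microtasks: VecDeque<&'static str>,
    log: Vec<&'static str>, // execution order, for inspection
}

impl Sched {
    fn new() -> Self {
        Sched { macrotasks: VecDeque::new(), microtasks: VecDeque::new(), log: Vec::new() }
    }

    fn tick(&mut self) {
        // one macrotask per tick…
        if let Some(t) = self.macrotasks.pop_front() {
            self.log.push(t);
        }
        // …then a full microtask drain, even if draining enqueues more
        while let Some(m) = self.microtasks.pop_front() {
            self.log.push(m);
        }
    }
}

fn main() {
    let mut s = Sched::new();
    s.macrotasks.extend(["task:a", "task:b"]);
    s.microtasks.extend(["micro:1", "micro:2"]);
    s.tick();
    s.tick();
    // micro:1 and micro:2 run before task:b ever gets a turn
    assert_eq!(s.log, ["task:a", "micro:1", "micro:2", "task:b"]);
}
```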

what we learned

tight mvps win. ids-not-fn-ptrs for intrinsics is the move. simulators are worth the upfront pain because ci confidence is EVERYTHING. also, if you define a single publish point in the pipeline, half your determinism drama evaporates. love that for us.
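"ids, not fn ptrs" in miniature: host and device agree on small stable integer ids and dispatch through a table, so raw function pointers never cross the abi. the intrinsic set below is invented — a sketch of the pattern, not our table:

```rust
// hypothetical sketch of id-based intrinsic dispatch: intrinsics are named by
// small stable integer ids and resolved through a table on the callee side,
// so no raw function pointers cross the abi. this intrinsic set is invented.

const INTRINSIC_ADD: u32 = 0;
const INTRINSIC_MUL: u32 = 1;

fn intrin_add(a: u64, b: u64) -> u64 { a + b }
fn intrin_mul(a: u64, b: u64) -> u64 { a * b }

// table indexed by id — stable across builds as long as the ids are stable
const TABLE: [fn(u64, u64) -> u64; 2] = [intrin_add, intrin_mul];

// unknown ids fail closed instead of jumping through a bogus pointer
fn call_intrinsic(id: u32, a: u64, b: u64) -> Option<u64> {
    TABLE.get(id as usize).map(|f| f(a, b))
}

fn main() {
    assert_eq!(call_intrinsic(INTRINSIC_ADD, 2, 3), Some(5));
    assert_eq!(call_intrinsic(INTRINSIC_MUL, 2, 3), Some(6));
    assert_eq!(call_intrinsic(99, 2, 3), None);
}
```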

what’s next for matrix

ship the fs layer with a gpu page cache and the “submit→park→resume” sync facades. bring up tcp-lite/http loopback on device so fullstack inside a namespace is actually real. land the react host-config that emits one compact batch per commit and the gpu applyBatch path. then perf harness, rr skeleton, and the first “agent loop on chip” demo. we’re not done (lol, not even close), but the runway is set and the next drops are going to be FUN.
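the "submit→park→resume" shape, as a hedged host-side sketch (all names invented; real fibers would live on-device): the blocking-looking call submits a request and parks the caller, and the matching completion resumes exactly that fiber instead of blocking an sm.

```rust
// host-side sketch of a submit→park→resume sync facade. all names are
// invented; real fibers would live on-device. a "sync" call parks the
// calling fiber, and the completion resumes that exact fiber.

use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq)]
enum FiberState {
    Running,
    Parked { req_id: u32 }, // waiting on one outstanding request
    Done { result: u64 },   // resumed with the completion's result
}

struct Runtime {
    fibers: Vec<FiberState>,
    submissions: VecDeque<(usize, u32)>, // (fiber index, request id) → "device"
}

impl Runtime {
    // the "sync" facade: enqueue the request and park the caller,
    // instead of spinning or blocking the execution unit
    fn submit(&mut self, fiber: usize, req_id: u32) {
        self.fibers[fiber] = FiberState::Parked { req_id };
        self.submissions.push_back((fiber, req_id));
    }

    // a completion resumes exactly the fiber parked on that request
    fn complete(&mut self, fiber: usize, req_id: u32, result: u64) {
        if self.fibers[fiber] == (FiberState::Parked { req_id }) {
            self.fibers[fiber] = FiberState::Done { result };
        }
    }
}

fn main() {
    let mut rt = Runtime { fibers: vec![FiberState::Running], submissions: VecDeque::new() };
    rt.submit(0, 7); // looks like a blocking read; really parks fiber 0
    assert_eq!(rt.fibers[0], FiberState::Parked { req_id: 7 });
    let (fiber, req) = rt.submissions.pop_front().unwrap(); // "device" picks it up
    rt.complete(fiber, req, 99);
    assert_eq!(rt.fibers[0], FiberState::Done { result: 99 });
}
```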
