MadeByClaude — Devpost Submission

Inspiration

Dario Amodei's essay "Machines of Loving Grace" asks a question that hit us hard: as AI takes over routine work, how do humans find meaning? His answer — and ours — is that creativity becomes more important, not less. But access to creative tools has never been equal. A student at CMU can walk through one of the most architecturally rich campuses in the world every day and never have the tools to turn that experience into something shareable, permanent, or expressive.

We wanted to build something that takes the world around you — the one you walk through every day — and transforms it into art. Not as a filter, but as a full reconstruction: a 3D world you can navigate, stylized into a comic that feels alive.

What it does

MadeByClaude turns any video into an animated comic world in three steps:

  1. Film — Walk through a space with your phone and record a video
  2. Reconstruct — HY-World 2.0 (WorldMirror 2.0) converts the video into a navigable 3D scene: point cloud, 3D Gaussian splats, depth maps, surface normals, and camera trajectory
  3. Stylize — AnimeGANv2 transforms every frame of the 3D fly-through into comic/anime art
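The stylization step uses the torch.hub entrypoints published by the bryandlee/animegan2-pytorch repo. A minimal loader sketch — the deferred import and the `load_stylizer` wrapper are ours, not part of the repo:

```python
def load_stylizer(size: int = 512):
    """Return a PIL.Image -> PIL.Image comic stylizer.

    The "generator" and "face2paint" entrypoints come from the
    bryandlee/animegan2-pytorch hubconf; this wrapper is our own.
    """
    import torch  # deferred so this file imports even without torch installed

    model = torch.hub.load(
        "bryandlee/animegan2-pytorch:main", "generator",
        pretrained="face_paint_512_v2",
    )
    face2paint = torch.hub.load(
        "bryandlee/animegan2-pytorch:main", "face2paint", size=size,
    )
    return lambda frame: face2paint(model, frame)
```

Applied frame by frame to the rendered fly-through, this produces the comic version of the 3D scene.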

We demoed the full pipeline live at the hackathon using footage shot on CMU's campus and in Portal Hall. The output is a comic fly-through of a real place — something between a graphic novel and a video game cutscene.


How we built it

  • HY-World 2.0 (WorldMirror 2.0) — Tencent's ~1.2B parameter feed-forward model that reconstructs 3D Gaussian splats, depth, normals, and camera parameters from a video in a single forward pass (~7 seconds on an A100)
  • AnimeGANv2 (bryandlee/animegan2-pytorch, face_paint_512_v2 style) — frame-by-frame comic stylization of the rendered fly-through
  • Modal — cloud GPU infrastructure (NVIDIA A100) for both inference jobs, with Docker image caching so rebuilds are instant after the first run
  • Gradio — web demo that lets anyone upload a video and get back a comic version live
  • Claude Code — used throughout to scaffold the Modal pipeline, debug CUDA/ABI mismatches, wire up the Gradio app, and iterate fast under hackathon time pressure

The full pipeline runs end-to-end on cloud GPUs — no local GPU needed. Anyone with a phone and a Modal account can replicate it.
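The pieces above boil down to a single Modal app definition. A condensed sketch — package pins, the flash-attn wheel choice, and the `reconstruct` signature are illustrative, not our exact build file:

```python
import modal

# Layer order matters: each install call is its own cached image layer,
# so the dependency we changed most often (flash-attn) goes last.
image = (
    modal.Image.debian_slim(python_version="3.10")
    # cu124 PyTorch wheels are built with cxx11abiFALSE...
    .pip_install("torch", index_url="https://download.pytorch.org/whl/cu124")
    .pip_install("gsplat", "gradio")
    # ...so the flash-attn wheel must match that ABI (pin yours accordingly).
    .pip_install("flash-attn")
    # Bake AnimeGANv2 weights into the image; runtime never touches torch.hub.
    .run_commands(
        "python -c \"import torch; torch.hub.load("
        "'bryandlee/animegan2-pytorch:main', 'generator',"
        " pretrained='face_paint_512_v2')\""
    )
)

app = modal.App("madebyclaude", image=image)

@app.function(gpu="A100", timeout=20 * 60)
def reconstruct(video_bytes: bytes) -> bytes:
    """Run the WorldMirror forward pass on an A100 (body elided)."""
    ...
```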

Challenges we ran into

  • flash-attn ABI mismatch — the PyTorch wheels from download.pytorch.org/whl/cu124 are built with cxx11abiFALSE, but we initially installed the cxx11abiTRUE variant of the flash-attn wheel. It took several build-debug cycles to pin down.
  • Modal image layer caching — each .pip_install() call is a separate cached layer. Changing flash-attn invalidated all downstream layers, forcing rebuilds of gsplat, the HY-World clone, and the env layer.
  • torch.hub 504 timeouts — downloading AnimeGANv2 weights at runtime hit gateway timeouts on Modal's network. Fixed by baking the weights into the image at build time using .run_commands().
  • 1800-frame timeout — portalHall.mp4 was 60 seconds of video (1800 frames at 30 fps), and processing it exceeded the 20-minute GPU timeout. Solved by trimming the clip to 30 seconds with ffmpeg locally before uploading.
  • Output filename collisions — both fly-through videos were named rendered_rgb.mp4, so the second stylized output silently overwrote the first. Fixed by including the parent folder name in the output filename.
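The last two fixes are small enough to show in full. A sketch using hypothetical helper names (`output_name`, `trim_cmd`):

```python
from pathlib import Path

def output_name(input_video: str, suffix: str = "stylized") -> str:
    """Disambiguate outputs that share a basename (both fly-throughs were
    rendered_rgb.mp4) by prefixing the parent folder name."""
    p = Path(input_video)
    return f"{p.parent.name}_{p.stem}_{suffix}{p.suffix}"

def trim_cmd(src: str, dst: str, seconds: int = 30) -> list[str]:
    """ffmpeg argv that trims a clip before upload; -c copy avoids re-encoding."""
    return ["ffmpeg", "-y", "-i", src, "-t", str(seconds), "-c", "copy", dst]
```

For example, `output_name("video1/rendered_rgb.mp4")` yields a name that no longer collides with the second video's output.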

Accomplishments that we're proud of

  • Full end-to-end pipeline from phone video → 3D reconstruction → comic animation, built and running in a single hackathon day
  • CMU campus reconstructed in 3D and rendered as a comic fly-through — a real place, made permanent and expressive
  • Zero local GPU required — the entire pipeline runs on Modal, replicable by anyone with git clone and a Modal account
  • 7-second inference time for 3D reconstruction of a 22-frame video on an A100

What we learned

  • HY-World 2.0's WorldMirror is genuinely fast for feed-forward 3D reconstruction — no optimization loop, no NeRF training, just one forward pass
  • Modal's layered image caching is powerful but requires intentional ordering of build steps to minimize rebuild cost
  • Baking model weights into the image (rather than downloading at runtime) is the right pattern for reliability on cloud GPU infrastructure
  • AnimeGANv2's face_paint_512_v2 style works surprisingly well on architectural scenes, not just faces — the comic look transfers cleanly to buildings, paths, and outdoor spaces

What's next for MadeByClaude

  • Full-video reconstruction — currently capped at 32 frames for the 3D pass. Increasing this with multi-GPU inference would produce denser, higher-quality 3D worlds
  • More comic styles — swap between Hayao, Paprika, and custom-trained styles via the Gradio UI
  • Interactive 3D viewer — embed the gaussians.ply output directly in the web demo using a WebGL 3DGS renderer so users can navigate their comic world in the browser
  • One-click pipeline — combine run_world_model.py and stylize_video.py into a single Modal job that goes video → comic with one command
  • Preservation use case — apply this to historically significant or endangered spaces: document them in 3D, stylize them, make them shareable and permanent
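The one-click wrapper is mostly plumbing. A sketch that chains the two existing scripts via the Modal CLI — the `--video` flag and the `one_click_cmds` / `video_to_comic` helpers are hypothetical:

```python
import subprocess

SCRIPTS = ("run_world_model.py", "stylize_video.py")

def one_click_cmds(video: str) -> list[list[str]]:
    """Build the two `modal run` invocations (flag names are hypothetical)."""
    return [["modal", "run", script, "--video", video] for script in SCRIPTS]

def video_to_comic(video: str) -> None:
    """Run 3D reconstruction, then stylization, back to back."""
    for cmd in one_click_cmds(video):
        subprocess.run(cmd, check=True)
```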

Built With

hy-world-2.0 (worldmirror) · animeganv2 · pytorch · modal · gradio · ffmpeg · claude-code