Inspiration

I'm the friend who always ends up taking the photo, and the one everyone hands their phone to when they want it to actually look good. That means I spend half of every hangout repeating the same things on loop: "move left," "no, your left," "get closer," "tilt it down a bit," while someone tries to recreate a shot from last year, match the framing we got two minutes ago, or just point the camera at the right thing. By photo number ten I'm not even looking through the lens anymore. I'm just calling out corrections from memory, and I'm never actually in the picture. I wanted to build the thing that does that part for me: a camera that gives the same real-time corrections I'm always shouting, so I can hand someone my phone, walk into frame, and trust it to get them there.

What it does

Upload a reference photo (a portrait, a product shot, a room, anything) and GhostFrame turns your camera into a live guide back to that exact shot. In real time it shows corner brackets that shift from cyan to green as you align, a pose skeleton or target reticle marking where the subject should sit, and a specific caption ("Move closer," "Left side doesn't match") instead of a bare number. These are the exact corrections I'd be shouting across the room, just coming from the phone instead of me. Once aligned, capture with a shutter flash and haptic buzz, review the photo in place, and save or share it, all in the browser, on either camera, on phone or desktop.

How I built it

The stack is Next.js (App Router), TypeScript, and Tailwind, with MediaPipe's PoseLandmarker (GPU-accelerated, client-side) handling body pose. The part with no off-the-shelf model is answering "does this non-portrait photo match?" For that I wrote a similarity engine from scratch that combines four independent signals per frame.
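The four signals and how they are weighted aren't spelled out in this write-up, so the following is only a minimal sketch of one common way to fold several per-frame scores into a single number: a weighted average. The function names, the assumption that each signal is already scaled to the range 0 to 1, and the weights themselves are all hypothetical, not GhostFrame's actual engine.

```typescript
// Hypothetical helper: keeps a score inside [0, 1].
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Hypothetical combiner: weighted average of per-frame signal scores.
// Assumes every score is already normalized to [0, 1] and that
// weights are non-negative with a positive sum.
function combineSignals(
  scores: readonly number[],
  weights: readonly number[],
): number {
  if (scores.length === 0 || scores.length !== weights.length) {
    throw new Error("scores and weights must be non-empty and the same length");
  }
  let weightedSum = 0;
  let weightTotal = 0;
  scores.forEach((score, i) => {
    const weight = weights[i] ?? 0;
    weightedSum += clamp01(score) * weight;
    weightTotal += weight;
  });
  if (weightTotal <= 0) {
    throw new Error("weights must sum to a positive number");
  }
  return weightedSum / weightTotal;
}
```

With equal weights, four signals where only one matches perfectly would combine to 0.25. A weighted average is just one option; a minimum or a product would penalize a single bad signal more heavily.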

What I learned

The biggest lesson: a similarity score is meaningless without knowing what unrelated inputs would score. I almost shipped a metric where "doesn't match at all" and "pretty close" were separated by only a few points, because I hadn't accounted for the statistical floor of my own signal. More generally, browser media APIs are full of sharp, undocumented edges that only surface on a real device. And the corrections that actually help someone aren't a percentage to chase; they're one specific, plain-English instruction at a time, which is, it turns out, exactly what I'd been giving my friends all along without realizing it was a product.
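To make the "statistical floor" point concrete, here is a small sketch of one way to correct for it: measure what unrelated image pairs score, treat their average as the floor, and rescale raw scores so the floor maps to 0 and a perfect match maps to 1. The function names and the sample numbers are hypothetical, and the write-up doesn't say this is the exact correction GhostFrame uses.

```typescript
// Hypothetical: estimate the floor as the mean score of image pairs
// that are known to be unrelated.
function estimateFloor(unrelatedScores: readonly number[]): number {
  if (unrelatedScores.length === 0) {
    throw new Error("need at least one unrelated-pair score");
  }
  const sum = unrelatedScores.reduce((acc, s) => acc + s, 0);
  return sum / unrelatedScores.length;
}

// Hypothetical: stretch the usable range [floor, 1] onto [0, 1].
// Scores at or below the floor count as "no match" (0).
function rescaleAgainstFloor(raw: number, floor: number): number {
  if (floor >= 1) {
    return 0; // no usable range left above the floor
  }
  const scaled = (raw - floor) / (1 - floor);
  return Math.min(1, Math.max(0, scaled));
}

// Example with made-up numbers: unrelated pairs average 0.75,
// so a raw 0.875 is only halfway between "unrelated" and "identical".
const floor = estimateFloor([0.7, 0.8]);
const adjusted = rescaleAgainstFloor(0.875, floor);
```

In this example a raw score of 0.875 looks high on its own, but lands at about 0.5 after rescaling, which is the kind of gap between "doesn't match at all" and "pretty close" that the raw number hides.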

What's next for GhostFrame

Depth-aware distance estimation for more precise "move closer/farther" guidance, preset templates for common recreation scenarios, a burst mode with before/after export, and cloud sync so a reference set up on one device is ready on another.

