Inspiration
Taking a high-quality solo photo of yourself shouldn't require a second person or a "dumb" 3-second timer that fires while you're still running into frame. I wanted to solve the "solo selfie struggle" by building a camera that doesn't just wait: it sees. I asked myself: why press a button when a neural network can work out the exact moment you're ready?
What it does
Aperture AI is an autonomous camera agent that takes human error out of self-photography:
- Autonomous Shutter: using browser-based computer vision, it tracks 33 skeletal landmarks in real time.
- Stability Detection: it mathematically measures your "stillness" and triggers the shutter only once you've held a stable pose.
- AI Photography Coach: after the capture, Google Gemini 1.5 Flash analyzes your composition, lighting, and posture and returns instant coaching tips.
- Privacy-First Architecture: the vision model runs locally and gallery data is filtered, so your biometric data stays in your hands.
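The description doesn't spell out how "stable pose detected" becomes exactly one shutter press. A minimal sketch of one way to do it, assuming a per-frame stability flag is already computed: a hypothetical `ShutterTrigger` that fires only after the pose has been held for `holdMs`, then waits `cooldownMs` before it can fire again. The class name and both default durations are illustrative assumptions, not values from Aperture AI.

```typescript
// Hypothetical helper: turns a per-frame "is the pose stable?" signal into
// a single shutter event. Names and default durations are illustrative.
class ShutterTrigger {
  private readonly holdMs: number;
  private readonly cooldownMs: number;
  private stableSince: number | null = null;
  private lastFiredAt = Number.NEGATIVE_INFINITY;

  constructor(holdMs = 1500, cooldownMs = 3000) {
    this.holdMs = holdMs; // how long the pose must be held before firing
    this.cooldownMs = cooldownMs; // minimum gap between two shots
  }

  /** Call once per video frame; returns true only on the frame the shutter should fire. */
  update(isStable: boolean, nowMs: number): boolean {
    if (!isStable) {
      this.stableSince = null; // any movement restarts the hold timer
      return false;
    }
    if (this.stableSince === null) this.stableSince = nowMs;

    const heldLongEnough = nowMs - this.stableSince >= this.holdMs;
    const cooledDown = nowMs - this.lastFiredAt >= this.cooldownMs;
    if (heldLongEnough && cooledDown) {
      this.lastFiredAt = nowMs;
      this.stableSince = null; // require a fresh hold before the next shot
      return true;
    }
    return false;
  }
}
```

Calling `update()` from the same animation loop that runs pose detection keeps the trigger in step with the video. Resetting the hold timer on any movement is what keeps it from firing while you're still walking into frame.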
How we built it
As a solo developer, I built a high-performance, local-first stack:
- Frontend & Hosting: Next.js 14, deployed on Vercel.
- Motion Capture Model: the Google MediaPipe Pose Landmarker, integrated via the @mediapipe/tasks-vision library.
- Edge Computing: the model runs client-side via WebAssembly (WASM), enabling tracking at 30+ FPS.
- Security: the Gemini and Cloudinary API keys are stored as Vercel Environment Variables and accessed within Server Actions.
- Storage: Cloudinary hosts the images, paired with custom logic that auto-hides photos after 5 minutes to maintain digital hygiene.
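The 5-minute auto-hide rule can be expressed as a pure function that a gallery view or Server Action might call. This is a sketch under assumptions: the `GalleryPhoto` shape and `partitionByAge` name are hypothetical, and `createdAtMs` is assumed to be converted to epoch milliseconds from whatever timestamp the storage layer reports.

```typescript
// Hypothetical gallery entry; a real Cloudinary resource carries many more fields.
interface GalleryPhoto {
  publicId: string;
  createdAtMs: number; // creation time in Unix epoch milliseconds (assumed)
}

// The 5-minute window described above.
const HIDE_AFTER_MS = 5 * 60 * 1000;

/**
 * Splits photos into those still inside the window (shown) and those past it
 * (hidden, and candidates for deletion).
 */
function partitionByAge(
  photos: GalleryPhoto[],
  nowMs: number,
  ttlMs: number = HIDE_AFTER_MS,
): { visible: GalleryPhoto[]; expired: GalleryPhoto[] } {
  const visible: GalleryPhoto[] = [];
  const expired: GalleryPhoto[] = [];
  for (const photo of photos) {
    const ageMs = nowMs - photo.createdAtMs;
    (ageMs < ttlMs ? visible : expired).push(photo);
  }
  return { visible, expired };
}
```

Keeping the rule pure (the current time is passed in) makes it easy to test, and the same function can drive both the UI filter and a cleanup pass over `expired`.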
Challenges we ran into
- The "Ghost Line" Bug: managing the lifecycle of the MediaPipe canvas in React was a battle. I solved it with a "Nuclear Option": changing the canvas's React key on each mode switch, which forces React to destroy and recreate the DOM node and guarantees a clean UI.
- Serverless Constraints: Vercel's 4.5 MB request-body limit for Server Actions meant compressing images on the client before upload.
- Timeout Management: to stay within Vercel's 10-second function timeout, I raced the Gemini API calls against a timer with Promise.race so the app stays responsive.
- Execution Lifecycle: I had to explicitly await the Cloudinary deletion calls; otherwise Vercel could end the function before they finished.
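The Promise.race approach to the timeout can be sketched as a small generic wrapper. `withTimeout` is a hypothetical helper name, and the timer-clearing `finally` is one reasonable design choice rather than the project's confirmed code.

```typescript
/**
 * Resolves or rejects with `promise`, unless `ms` elapses first, in which case
 * it rejects with a timeout error. The timer is always cleared afterwards.
 */
async function withTimeout<T>(promise: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // The executor runs synchronously, so `timer` is set before the race starts.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // don't leave a pending timer behind
  }
}
```

A Server Action could wrap the Gemini request with a budget below the 10-second limit and catch the rejection to return a generic tip instead of hanging. Note that Promise.race only stops *waiting*: the underlying request keeps running unless the client library also accepts an abort signal.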
Accomplishments that we're proud of
I'm proud of successfully overengineering a utility app into a high-performance autonomous agent. Getting the software to "feel" when the user is ready, by computing Euclidean distances and vector variance across joints in real time, is a major technical win for a solo project.
What we learned
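One way to turn "Euclidean distance and variance between joints" into a stillness score is sketched below: for each landmark, take its centroid over a short window of frames, then average the squared Euclidean distance of each sample from that centroid (its positional variance), and average across all landmarks. The `Landmark` shape, `stillnessScore`, `isStable`, and the `1e-5` threshold are hypothetical; only x and y are used, in the normalized [0, 1] image coordinates that MediaPipe reports, and z and visibility are ignored for simplicity.

```typescript
// Hypothetical landmark shape: normalized image coordinates in [0, 1].
interface Landmark {
  x: number;
  y: number;
}

// One frame of pose output, e.g. 33 landmarks from the Pose Landmarker.
type PoseFrame = Landmark[];

/**
 * Mean positional variance over a window of frames: for each landmark, the
 * average squared Euclidean distance from its centroid, averaged over landmarks.
 * Lower means stiller. Assumes every frame has the same number of landmarks.
 */
function stillnessScore(frames: PoseFrame[]): number {
  if (frames.length < 2 || frames[0].length === 0) return Infinity; // not enough data
  const n = frames.length;
  const count = frames[0].length;
  let total = 0;
  for (let j = 0; j < count; j++) {
    // Centroid of landmark j across the window.
    let cx = 0;
    let cy = 0;
    for (const frame of frames) {
      cx += frame[j].x;
      cy += frame[j].y;
    }
    cx /= n;
    cy /= n;
    // Mean squared Euclidean distance from the centroid.
    let sq = 0;
    for (const frame of frames) {
      const dx = frame[j].x - cx;
      const dy = frame[j].y - cy;
      sq += dx * dx + dy * dy;
    }
    total += sq / n;
  }
  return total / count;
}

/** Hypothetical threshold; would need tuning against real camera noise. */
function isStable(frames: PoseFrame[], threshold = 1e-5): boolean {
  return stillnessScore(frames) < threshold;
}
```

The caller would keep a rolling window of recent frames (for example, roughly a second's worth at 30 FPS) and feed `isStable(window)` into the shutter logic. Using variance over a window rather than frame-to-frame distance makes the score less sensitive to a single jittery detection.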
I deepened my understanding of WASM-based computer vision and of managing complex side effects in React components that drive hardware such as the webcam. I also learned how to work within Vercel's strict serverless constraints to keep a heavy AI application responsive.
What's next for Aperture - AI Photography Coach
I plan to keep developing this project for my own use, to perfect my self-photography and support my personal creative projects. My goal is to refine the "AI Coach" so it recognizes more diverse stylistic choices, and to keep the tool a reliable, privacy-focused part of my own workflow.
Built With
- cloudinary
- google-gemini-1.5-flash
- google-mediapipe-(pose-landmarker)
- javascript
- lucide-react
- next.js-14
- typescript
- vercel
- wasm
- webassembly