Inspiration

Taking a high-quality solo photo shouldn't require a second person or a "dumb" 3-second timer that snaps the picture while you're still running into frame. I wanted to solve the "solo selfie struggle" by building a camera that doesn't just wait; it sees. I asked myself: why simply press a button when a neural network can calculate the exact moment you're ready?

What it does

Aperture AI is an autonomous camera agent that removes human error from photography:

Autonomous Shutter: Using browser-based computer vision, it tracks 33 skeletal landmarks in real time.

Stability Detection: It mathematically calculates your "stillness" and triggers the shutter only once you've struck a stable pose.

AI Photography Coach: After capturing the moment, Google Gemini 1.5 Flash analyzes your composition, lighting, and posture to provide instant coaching tips.

Privacy-First Architecture: By running vision models locally and filtering gallery data, the app ensures your biometric data remains yours.
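The Stability Detection idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the app's actual code: it uses only the mean Euclidean displacement between consecutive frames (the real app may combine this with other measures), and the `Landmark` shape, the `StillnessDetector` class, the 0.01 threshold, and the 15-frame hold are all hypothetical choices made for the example.

```typescript
// Hypothetical sketch: names and thresholds are illustrative assumptions.
interface Landmark {
  x: number; // normalized horizontal position in [0, 1]
  y: number; // normalized vertical position in [0, 1]
}

// Mean Euclidean distance between matching landmarks of two frames.
function meanDisplacement(prev: Landmark[], curr: Landmark[]): number {
  const n = Math.min(prev.length, curr.length);
  if (n === 0) return Infinity; // no landmarks means no evidence of stillness
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += Math.hypot(curr[i].x - prev[i].x, curr[i].y - prev[i].y);
  }
  return total / n;
}

class StillnessDetector {
  private prev: Landmark[] | null = null;
  private stableFrames = 0;
  private readonly threshold: number;
  private readonly holdFrames: number;

  // threshold: assumed max mean movement per frame (normalized units)
  // holdFrames: assumed number of consecutive still frames (~0.5 s at 30 FPS)
  constructor(threshold: number = 0.01, holdFrames: number = 15) {
    this.threshold = threshold;
    this.holdFrames = holdFrames;
  }

  // Feed one frame of landmarks; returns true when the shutter should fire.
  update(curr: Landmark[]): boolean {
    if (this.prev !== null && meanDisplacement(this.prev, curr) < this.threshold) {
      this.stableFrames += 1;
    } else {
      this.stableFrames = 0; // movement (or the very first frame) resets the streak
    }
    this.prev = curr;
    return this.stableFrames >= this.holdFrames;
  }
}
```

Requiring several consecutive still frames, rather than a single one, is one way to avoid firing on a momentary pause while the subject is still moving into position.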

How we built it

As a solo developer, I built a high-performance, local-first stack:

Frontend & Hosting: Built with Next.js 14 and deployed on Vercel.

Motion Capture Model: I integrated the Google MediaPipe Pose Landmarker via the @mediapipe/tasks-vision library.

Edge Computing: The model runs locally on the client side via WebAssembly (WASM), allowing for 30+ FPS tracking.

Security: I used Vercel Environment Variables to securely manage my Gemini and Cloudinary API keys within Server Actions.

Storage: Cloudinary manages the images, paired with custom logic that auto-hides photos after 5 minutes to maintain digital hygiene.
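The five-minute auto-hide rule could be expressed as a pure filter over photo metadata. The sketch below assumes each gallery record carries a creation timestamp; the `Photo` type, its field names, and `visiblePhotos` are hypothetical and are not taken from the Cloudinary API or from the app's code.

```typescript
// Hypothetical sketch: the Photo shape and function name are assumptions.
interface Photo {
  id: string;
  createdAtMs: number; // upload time as Unix epoch milliseconds
}

const MAX_AGE_MS = 5 * 60 * 1000; // five minutes, per the description above

// Keep only photos younger than the cutoff; older ones are hidden from the gallery.
function visiblePhotos(photos: Photo[], nowMs: number = Date.now()): Photo[] {
  return photos.filter((photo) => nowMs - photo.createdAtMs < MAX_AGE_MS);
}
```

Passing the current time as a parameter keeps the function deterministic, which makes the cutoff behavior easy to check without waiting five minutes.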

Challenges we ran into

The "Ghost Line" Bug: Managing the lifecycle of the MediaPipe canvas in React was a battle. I solved it with a "Nuclear Option": using React keys to force-destroy and recreate the DOM node during mode switches, which guarantees a clean UI.

Serverless Constraints: I had to work around Vercel's 4.5MB payload limit for Server Actions by implementing client-side image compression.

Timeout Management: I stayed within Vercel's 10-second timeout by wrapping the Gemini API calls in a Promise race, which keeps the application responsive.

Execution Lifecycle: I had to explicitly await Cloudinary deletion calls so that Vercel didn't kill the process before they finished.
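The Promise race used for timeout management might look something like the sketch below. It is an assumption-laden illustration rather than the app's code: `withTimeout` is a hypothetical helper, and the fallback-value behavior (resolving with a default instead of rejecting) is just one possible design.

```typescript
// Hypothetical sketch: helper name and fallback strategy are assumptions.
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // This promise resolves with the fallback once the time budget is spent.
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    // Whichever promise settles first decides the result.
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}
```

A call site could then pass the Gemini request together with a budget safely below the platform limit, for example `withTimeout(requestTips(image), 8000, "Tips unavailable")`, where `requestTips` and the 8-second budget are placeholders. Note that losing the race does not cancel the underlying request; it only stops the caller from waiting on it.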

Accomplishments that we're proud of

I am proud of successfully overengineering a utility app into a high-performance autonomous agent. Reaching a state where the software "feels" the user's readiness by calculating Euclidean distance and vector variance between joints in real time is a major technical win for a solo project.

What we learned

I deepened my understanding of WASM-based computer vision and how to manage complex side effects in React-based hardware interfaces. I also learned how to navigate strict serverless infrastructure constraints on Vercel to keep a heavy AI application responsive.

What's next for Aperture - AI Photography Coach

I plan to keep developing this project for my own use, to improve my self-photography and personal creative projects. My goal is to refine the "AI Coach" so it recognizes more diverse stylistic choices, and to keep the tool a reliable, privacy-focused part of my own workflow.
