Inspiration
We were inspired by the common challenge of trying to recreate a cool photo you've seen online or in your camera roll. It's often difficult to get the framing, perspective, and subject placement just right. We wanted to create an intelligent tool that acts as a personal photography guide, helping anyone capture professional-looking compositions with ease.
What it does
ECHO is an iOS camera app that helps you replicate the composition of any photo. You start by uploading a reference image you like. The app sends this image to our AI-powered server, which analyzes its core compositional elements: the outline of the person, the main horizon line (even if tilted), and the key perspective lines that converge on a vanishing point.
The server then generates a clean, semi-transparent "template" of this composition and sends it back to the app. This template is overlaid on your live camera feed, allowing you to perfectly align your subject and the scene to match the original shot. The overlay is fully interactive—you can move, scale, and rotate it to fit your environment.
How we built it
ECHO is a full-stack application with an iOS frontend and a Python backend.
iOS App (Frontend): Built natively in Xcode using SwiftUI. It provides full manual camera controls (zoom, exposure), gesture controls for manipulating the template overlay, and a networking layer to communicate with our server.
AI Server (Backend): We built a web server using Flask and deployed it on Hugging Face Spaces. This allowed us to use powerful Python libraries for the heavy lifting.
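As a minimal sketch, the server's entry point might look like the following. The `/analyze` route name and the `analyze_composition` helper are illustrative placeholders, not the actual ECHO code: the app would POST the reference image and receive the rendered PNG template back.

```python
import io
from flask import Flask, request, send_file

app = Flask(__name__)

def analyze_composition(image_bytes: bytes) -> bytes:
    # Placeholder for the CV pipeline: segmentation, perspective,
    # and horizon analysis would run here and render a PNG template.
    return image_bytes

@app.route("/analyze", methods=["POST"])
def analyze():
    # Read the uploaded reference image and return the template PNG.
    image_bytes = request.files["image"].read()
    png = analyze_composition(image_bytes)
    return send_file(io.BytesIO(png), mimetype="image/png")
```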
Computer Vision Pipeline: The core analysis is a multi-stage process.
Person Segmentation: We use Google's MediaPipe Selfie Segmentation model to get a fast and accurate mask of the person in the photo.
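MediaPipe's selfie segmentation outputs a per-pixel confidence map rather than a hard mask, so one post-processing step is binarization. A small numpy sketch of that step (the 0.5 threshold and function name are illustrative assumptions, not the exact ECHO code):

```python
import numpy as np

def mask_from_confidence(confidence: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a soft segmentation map (values in [0, 1], like
    MediaPipe's segmentation_mask output) into a person mask."""
    return (confidence > threshold).astype(np.uint8) * 255
```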
Perspective Analysis: We developed a custom OpenCV algorithm that runs the Line Segment Detector (LSD) and then applies MeanShift clustering to the line intersections, robustly identifying the primary vanishing point (even when it falls off-screen) and the key lines that define a one-point perspective.
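The geometric core of this step can be sketched in numpy: each pair of detected segments votes with the intersection of their infinite lines (computed via homogeneous cross products), and the densest cluster of votes marks the vanishing point. For brevity this sketch replaces the MeanShift mode with a plain mean; function names are illustrative, not the actual ECHO code:

```python
import numpy as np

def intersect(seg_a, seg_b):
    """Intersection of the infinite lines through two segments,
    using homogeneous coordinates; None for (near-)parallel lines."""
    def line(seg):
        (x1, y1), (x2, y2) = seg
        return np.cross([x1, y1, 1.0], [x2, y2, 1.0])
    p = np.cross(line(seg_a), line(seg_b))
    if abs(p[2]) < 1e-9:
        return None  # parallel lines meet at infinity
    return p[:2] / p[2]

def vanishing_point(segments):
    """Crude estimate: mean of all pairwise intersections. The real
    pipeline clusters the votes (e.g. with MeanShift) and takes the
    densest mode, which tolerates outlier segments."""
    pts = [intersect(a, b)
           for i, a in enumerate(segments) for b in segments[i + 1:]]
    pts = [p for p in pts if p is not None]
    return np.mean(pts, axis=0) if pts else None
```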
Horizon Detection: A separate function analyzes the image's gradients to find the dominant tilted horizon line.
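A simplified gradient-based horizon estimate can be sketched as follows: strong edge pixels vote for the orientation perpendicular to their gradient, and a weighted circular mean (with the angle-doubling trick to handle the 180° ambiguity) picks the dominant tilt. This is an assumed simplification of the server's detector, not its exact code:

```python
import numpy as np

def horizon_angle(gray: np.ndarray) -> float:
    """Estimate the dominant horizon tilt in degrees from image
    gradients. gray is a 2D grayscale array."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    # An edge's orientation is perpendicular to its gradient direction.
    theta = np.arctan2(gy, gx) - np.pi / 2
    # Weighted circular mean over strong edges; doubling the angles
    # folds out the 180-degree orientation ambiguity.
    strong = mag > mag.mean()
    doubled = np.angle(np.sum(mag[strong] * np.exp(2j * theta[strong])))
    return np.degrees(doubled / 2)
```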
Template Generation: The server combines these elements into a final, aesthetically pleasing PNG with a transparent background, drawing smoothed outlines and compositional lines before sending it back to the app.
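The compositing step above amounts to drawing guide lines onto a fully transparent RGBA canvas. A numpy stand-in for the server's anti-aliased PNG drawing (function names and the default color are illustrative assumptions):

```python
import numpy as np

def blank_template(h: int, w: int) -> np.ndarray:
    """Fully transparent RGBA canvas (alpha channel starts at 0)."""
    return np.zeros((h, w, 4), dtype=np.uint8)

def draw_line(canvas, p0, p1, color=(255, 255, 255, 200)):
    """Rasterize a compositional guide line by sampling points along
    it; points are (x, y). A crude stand-in for anti-aliased drawing."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    xs = np.linspace(p0[0], p1[0], n).round().astype(int)
    ys = np.linspace(p0[1], p1[1], n).round().astype(int)
    canvas[ys, xs] = color  # semi-transparent so the camera feed shows through
    return canvas
```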
Challenges we ran into
Our biggest challenge was the classic "on-device vs. server" decision. We initially attempted to run the AI models directly on the iPhone using Core ML. However, we ran into significant performance inconsistencies and subtle bugs related to model conversion and image pre-processing. Pivoting to a server-based architecture solved these issues but introduced new challenges in deploying a complex Python environment. We had to debug several dependency issues in the minimal Docker container on Hugging Face, specifically missing system-level graphics libraries required by OpenCV and MediaPipe. Iterating on the vanishing point detection algorithm to make it robust was also a significant challenge that required a lot of testing and refinement.
Accomplishments that we're proud of
We are incredibly proud of building a complete, end-to-end application that solves a real-world problem. Successfully deploying a sophisticated multi-stage computer vision pipeline to a live server is a major accomplishment. We're also proud of the final UI, which is clean, intuitive, and features a custom animated loading indicator and icons, providing a polished user experience.
What we learned
This project was a deep dive into the practicalities of building an AI-powered application. We learned the critical importance of keeping the data pre-processing pipeline identical between testing and deployment. We gained valuable experience in server deployment with Docker and Flask on a platform like Hugging Face. Most importantly, we learned that building a robust computer vision system is an iterative process of identifying weaknesses and systematically improving the algorithm with more advanced techniques.
What's next for ECHO
The current version of ECHO is a powerful proof-of-concept, and we're excited about its future. The next major step is to expand the perspective analysis to reliably detect two- and three-point perspectives for more complex architectural and landscape shots. We also plan to implement the template-saving feature, allowing users to build a personal library of their favorite compositional guides.