Inspiration

The idea for this product came from a problem that my sister, who is a photographer, and I kept running into: managing large photo collections and searching them for specific types of photos. When I looked beyond our own home, I found that many other people face the same problem.

What it does

This project is an intelligent image management and search system that helps users organize and find photos based on people’s faces and contextual details. Users can:

  1. Search images by uploading one or multiple face photos, finding pictures containing those specific people.
  2. Combine face search with text queries (like “wearing blue shirt”) to refine results.
  3. Group photos automatically by who appears in them or by visual similarity (like similar backgrounds or clothing).
  4. Prioritize important people (e.g., bride, groom, parents) to create personalized albums with a flexible percentage of images for each.
  5. Perform batch editing operations on many images at once, such as resizing, adding watermarks, background blurring, or format conversion.
  6. Use AI-powered services to label image quality or extract meaningful tags from images to improve search and categorization.
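As an illustration of item 4, the sketch below shows one way a per-person percentage could be turned into an album. It is only a sketch under stated assumptions: the function name `build_album`, the `shares` mapping, and the fixed random seed are hypothetical and not taken from the project's code.

```python
import random

def build_album(photos_by_person, shares, album_size, seed=0):
    """Pick photos for an album according to each person's share.

    photos_by_person: dict mapping a person to the photo ids they appear in
                      (hypothetical structure, assumed for this sketch).
    shares:           dict mapping a person to a fraction of the album,
                      e.g. {"bride": 0.5}; fractions should sum to at most 1.
    """
    rng = random.Random(seed)  # seeded so the selection is repeatable
    album, chosen = [], set()
    # Serve the highest-priority people first.
    for person, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        quota = round(album_size * share)
        # Skip photos already picked for a higher-priority person.
        candidates = [p for p in photos_by_person.get(person, []) if p not in chosen]
        picked = rng.sample(candidates, min(quota, len(candidates)))
        chosen.update(picked)
        album.extend(picked)
    return album

photos = {
    "bride": [f"b{i}" for i in range(10)],
    "groom": [f"g{i}" for i in range(10)],
    "parents": [f"p{i}" for i in range(10)],
}
album = build_album(photos, {"bride": 0.5, "groom": 0.3, "parents": 0.2}, album_size=10)
```

If a person has fewer photos than their quota, the album simply comes out shorter; a real system would need to decide how to redistribute the unused slots.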

How we built it

I started by implementing the core search features: extracting face embeddings to enable face-based and multi-face group searches, using CLIP embeddings for contextual search by text and image, and combining the two for hybrid queries (face + descriptive text).

Then I built smart grouping: clustering images by faces and by visual context using embedding-based clustering, and allowing user-driven grouping based on the importance of people, using randomized selection thresholds.

For image enhancement, I added batch operations such as resizing, watermarking, and background blurring using open-source AI models (rembg for segmentation).
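The hybrid face + text query mentioned above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the image record layout (`faces`, `clip`), the threshold of 0.6, and the 0.7 face weight are all assumptions, and the toy 2-dimensional vectors stand in for real face and CLIP embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(images, query_face, query_text,
                  face_threshold=0.6, face_weight=0.7, top_k=5):
    """Rank images by a weighted mix of face similarity and text similarity."""
    results = []
    for img in images:
        # Best match between the query face and any face found in the image.
        face_sim = max((cosine(query_face, f) for f in img["faces"]), default=0.0)
        if face_sim < face_threshold:
            continue  # the person is probably not in this image
        text_sim = cosine(query_text, img["clip"])
        score = face_weight * face_sim + (1 - face_weight) * text_sim
        results.append((score, img["id"]))
    results.sort(reverse=True)
    return [img_id for _, img_id in results[:top_k]]

# Toy data: "a" matches face and text, "b" matches the face only, "c" neither face.
images = [
    {"id": "a", "faces": [[1.0, 0.0]], "clip": [1.0, 0.0]},
    {"id": "b", "faces": [[1.0, 0.0]], "clip": [0.0, 1.0]},
    {"id": "c", "faces": [[0.0, 1.0]], "clip": [1.0, 0.0]},
]
ranked = hybrid_search(images, query_face=[1.0, 0.0], query_text=[1.0, 0.0])
```

Filtering on the face score first and only then blending in the text score keeps images of the wrong person out of the results, no matter how well they match the description.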

Finally, I integrated Google’s Vision API for image labeling and planned the infrastructure for AutoML-based image quality classification.

Challenges we ran into

One major challenge was balancing search precision and recall when combining face and text queries: tuning similarity thresholds and score weights to return the most relevant results without losing good matches. The other recurring challenge was debugging. Because we were new to MongoDB, it was a nightmare to resolve the issues that came up while connecting and integrating the different models and tools.
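To make the precision/recall trade-off concrete, here is a small sketch of how a similarity threshold could be swept against a hand-labeled set of relevant images. The scores, the labels, and the helper names are made up for illustration; they are not measurements from the project.

```python
def precision_recall(scored, relevant, threshold):
    """Precision and recall when every image scoring >= threshold is returned.

    scored:   dict mapping image id to similarity score
    relevant: set of image ids a human marked as correct matches
    """
    retrieved = {img for img, score in scored.items() if score >= threshold}
    hits = len(retrieved & relevant)
    # Convention assumed here: an empty result set counts as precision 1.0.
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall

def sweep(scored, relevant, thresholds):
    """Evaluate several thresholds so the trade-off can be compared side by side."""
    return [(t, *precision_recall(scored, relevant, t)) for t in thresholds]

scored = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.4}  # hypothetical scores
relevant = {"a", "c"}                               # hypothetical ground truth
table = sweep(scored, relevant, [0.85, 0.7, 0.5])
```

Lowering the threshold from 0.85 to 0.5 in this toy data raises recall from 0.5 to 1.0 while precision drops from 1.0 to about 0.67, which is the kind of tension described above.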

Accomplishments that we're proud of

We are proud to have solved some of the core problems in this industry, which will definitely help photographers, big and small, fast-track their work.

What we learned

We learned MongoDB and its advanced vector search capabilities. We also learned how to integrate advanced pre-trained models and tools that I had never used before. I also explored clustering algorithms (DBSCAN, HDBSCAN) to intelligently group similar photos.
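For readers unfamiliar with DBSCAN, the following is a deliberately small, hand-written version of the algorithm using cosine distance, shown only to illustrate how density-based grouping of embeddings works. It is not the project's code; in practice a library implementation would be the likely choice, and the `eps` and `min_samples` values below are arbitrary assumptions.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; treated as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Includes i itself, so min_samples counts the point being tested.
        return [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point, keep growing
    return labels

# Toy 2-D "embeddings": two tight groups and one outlier.
points = [[1, 0], [0.99, 0.05], [0.98, 0.1], [0, 1], [0.05, 0.99], [-1, -1]]
labels = dbscan(points, eps=0.05, min_samples=2)
```

Unlike k-means, this approach does not need the number of groups in advance and leaves outliers unassigned, which suits photo collections where the number of people is unknown.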

What's next for Photo-vision

Many advanced features are yet to be added, and the UI design still needs improvement because the project ran behind schedule. We look forward to launching the app once we feel it is ready and that it solves a core problem in this field.
