Inspiration
The AI-powered search in Google Photos.
What it does
This application allows users to search for images using contextual, natural language descriptions. Instead of relying on filenames or tags, a user can simply describe what they are looking for to retrieve the most relevant images from their collection.
How I built it
I used Gemini, Google's multimodal model on Vertex AI, to generate vector embeddings for each image. These embeddings were then stored in a Qdrant vector database. To find matching images, I implemented semantic search, which retrieves results by comparing the user's query to the stored image embeddings.
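The retrieval step described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's actual code: the three-dimensional vectors are made-up stand-ins for real image embeddings, a plain dictionary stands in for the Qdrant collection, and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical. Cosine similarity is assumed as the distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=3):
    """Return the k (image_id, score) pairs most similar to the query."""
    scored = [
        (image_id, cosine_similarity(query_vector, vector))
        for image_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from the embedding model
# and have far more dimensions.
toy_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "dog_park.jpg": [0.1, 0.9, 0.2],
    "sunset.jpg": [0.8, 0.2, 0.1],
}

# Hypothetical embedding of the user's text query.
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

In the real pipeline a vector database performs this ranking with an approximate nearest-neighbour index, which is what makes it practical for large collections.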
Challenges I ran into
Initially, the model struggled with queries longer than a few words and failed to return relevant results for complex descriptions.
To overcome this, I implemented a two-level search system. First, I used an image captioning API to generate a detailed description for each image, providing richer text to search against. Second, the search now combines a top-k vector similarity search, which captures broad contextual relevance, with a strict keyword match against the captions, which refines the results. This dual approach keeps results accurate even for long and specific queries.
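One way the two levels could fit together is sketched below. It is an illustration under stated assumptions, not the project's implementation: "strict keyword match" is interpreted here as requiring every supplied keyword to appear as a whole word in the caption, the vectors and captions are invented, and `two_level_search`, `tokenize`, and `records` are hypothetical names.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def tokenize(text):
    """Lowercase a caption and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def two_level_search(query_vector, keywords, records, k=3):
    """Level 1: keep the top-k records by vector similarity.
    Level 2: keep only candidates whose caption contains every keyword."""
    ranked = sorted(
        records,
        key=lambda record: cosine(query_vector, record["vector"]),
        reverse=True,
    )
    candidates = ranked[:k]
    required = {word.lower() for word in keywords}
    return [r["id"] for r in candidates if required <= tokenize(r["caption"])]


# Hypothetical records: an embedding plus a generated caption per image.
records = [
    {"id": "beach.jpg", "vector": [0.9, 0.1, 0.0],
     "caption": "A child building a sandcastle on a sunny beach"},
    {"id": "sunset.jpg", "vector": [0.8, 0.2, 0.1],
     "caption": "Sunset over the ocean with a sailboat"},
    {"id": "dog_park.jpg", "vector": [0.1, 0.9, 0.2],
     "caption": "A dog catching a frisbee on a beach"},
]

hits = two_level_search([1.0, 0.0, 0.0], ["beach"], records, k=2)
print(hits)
```

Here `dog_park.jpg` mentions a beach in its caption but never reaches the keyword stage because it falls outside the top-k vector results, while `sunset.jpg` passes the vector stage but is dropped by the keyword filter. Running the stages in the opposite order would trade recall for precision differently.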
Accomplishments
I developed a system that returns accurate results and can interpret long, descriptive user queries. This overcomes the initial limitations of the model and makes the search robust and user-friendly.
My learnings
This project was a great learning experience. I gained hands-on skills in:
Generating and utilizing vector embeddings for images.
Leveraging the Gemini API to extract both image vectors and text descriptions.
Implementing an effective semantic search pipeline using a vector database like Qdrant.
What's next for Contextual Image Album
Expanding the functionality to include video search.
Integrating user accounts to allow for personal, private image albums.
Improving the user interface (UI) for a more seamless experience.
Built With
Python
Gemini on Vertex AI
Qdrant