Inspiration
One day, I was looking through my gallery of about 6,000 photos, trying to find a picture of myself on a beach wearing a blue shirt. It took forever, and that's when I realized I needed a better way to search my photos. That's how the idea for Imace was born: an app that helps people find their photos easily using natural language.
What it does
Imace lets you find your photos using natural language, just like you'd describe them to a friend. Want to find that picture of your "cat sleeping on a beach"? Just type it in, and Imace will show you the most relevant results.
This is possible thanks to CLIP, a powerful AI model that understands both images and text. CLIP encodes your photos and your search queries into a shared embedding space, so a text query can be matched against every photo by similarity, even for complex descriptions.
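Concretely, the matching step reduces to ranking photos by how close their embeddings are to the query embedding. The sketch below is illustrative rather than Imace's actual code: it uses tiny stand-in vectors in place of real CLIP embeddings (which would come from CLIP's image and text encoders) and ranks photos by cosine similarity.

```python
import numpy as np

def rank_by_cosine(query, photos):
    # Normalize the query and each photo embedding, then rank photos
    # by cosine similarity to the query (best match first).
    q = query / np.linalg.norm(query)
    p = photos / np.linalg.norm(photos, axis=1, keepdims=True)
    scores = p @ q
    return np.argsort(scores)[::-1]

# Stand-in embeddings; in Imace these would come from CLIP's
# image encoder (one vector per photo) and text encoder (query).
photo_embeddings = np.array([
    [0.9, 0.1, 0.0],   # e.g. "cat sleeping on a beach"
    [0.0, 0.8, 0.2],   # e.g. "city skyline at night"
    [0.7, 0.3, 0.1],   # e.g. "dog running on a beach"
])
query_embedding = np.array([1.0, 0.2, 0.0])

ranking = rank_by_cosine(query_embedding, photo_embeddings)
# Both beach photos rank above the skyline, with the closest match first.
```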
But that's not all. Imace also presents your photos in an interactive 3D space powered by Three.js.
And the best part? Everything happens locally on your computer. Your images and data never leave your device, ensuring your privacy is protected.
How we built it
Frontend:
- Next.js: React-based framework that provided the foundation for a performant, user-friendly web application.
- Tailwind CSS: utility-first CSS framework that streamlined styling and allowed rapid UI development.
- Three.js (react-three-fiber): powered the interactive 3D visualization of the image collection.
Backend and Processing:
- FastAPI with Uvicorn: This combination provided a high-performance API for handling image uploads, processing, and embedding generation.
- PyTorch: This popular deep learning library was used to integrate and leverage the CLIP model for image and text encoding.
- SQLite: A lightweight database was chosen for efficient local storage of image data and embeddings.
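To give a feel for the storage layer, here is a minimal sketch of one common pattern for keeping embeddings in SQLite (an assumed illustration, not Imace's actual schema): each vector is serialized to bytes for a BLOB column alongside the image path, then deserialized back into a matrix for in-memory similarity search.

```python
import sqlite3
import numpy as np

conn = sqlite3.connect(":memory:")  # a real app would use a file on disk
conn.execute("""
    CREATE TABLE images (
        path TEXT PRIMARY KEY,
        embedding BLOB NOT NULL
    )
""")

def save_embedding(path, vec):
    # Serialize the float32 vector to raw bytes for the BLOB column.
    conn.execute(
        "INSERT OR REPLACE INTO images VALUES (?, ?)",
        (path, np.asarray(vec, dtype=np.float32).tobytes()),
    )

def load_embeddings():
    # Rebuild (paths, matrix) for in-memory similarity search.
    rows = conn.execute("SELECT path, embedding FROM images").fetchall()
    paths = [r[0] for r in rows]
    matrix = np.vstack([np.frombuffer(r[1], dtype=np.float32) for r in rows])
    return paths, matrix

save_embedding("beach_cat.jpg", [0.9, 0.1, 0.0])
save_embedding("skyline.jpg", [0.0, 0.8, 0.2])
paths, matrix = load_embeddings()
```

Storing raw bytes keeps writes cheap and avoids an extra dependency; since everything stays in a local SQLite file, this also fits the app's privacy promise that data never leaves the device.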