The Journey of Building a Hybrid Fashion Recommender System

Inspiration

This project was inspired by the desire to create a more intuitive and holistic shopping experience. We noticed that traditional recommendation systems often rely heavily on either text data or collaborative filtering alone. We wanted to see what would happen if we combined those methods with image-based insights to recommend products that truly match a user's style on multiple levels.

What We Learned

We learned a ton about integrating different types of data. In particular, we discovered that combining textual product descriptions, user behavior data, and image embeddings (using models like CLIP) can significantly improve recommendation accuracy. It was fascinating to see how each data type contributed a unique layer of understanding to what a user might like next.

How We Built It

We started by preprocessing our dataset, ensuring that each product had text descriptions, user interaction data, and high-quality image URLs. We then used a model like CLIP to generate numerical representations, or "embeddings," for each product image. These were combined with scores from our text analysis and collaborative filtering models. In mathematical terms, you could say we optimized a scoring function that blended these components:
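For illustration, the image-embedding step might look like the minimal sketch below, assuming the Hugging Face transformers implementation of CLIP. The checkpoint name and the embed_image helper are our own illustrative choices, not details confirmed by the project:

```python
# Minimal sketch: generating CLIP image embeddings with Hugging Face
# transformers. The checkpoint and helper name are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> torch.Tensor:
    """Return a unit-normalized CLIP embedding for one product image."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    # Normalize so that dot products between embeddings act as cosine similarities.
    return features / features.norm(dim=-1, keepdim=True)
```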

score = α × text_sim + β × collab_sim + γ × image_sim

where α, β, and γ are weights we tuned to balance the influence of each component on the final recommendation.
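As a concrete illustration, the blend could be computed as in the sketch below. The weight values shown are placeholders rather than the tuned values from the project, and the component similarities are assumed to already be normalized to a comparable range:

```python
# Hypothetical blended scorer; weights are placeholder defaults, not the
# project's tuned values. Inputs are assumed to be pre-normalized to [0, 1].
def blended_score(text_sim: float, collab_sim: float, image_sim: float,
                  alpha: float = 0.4, beta: float = 0.3,
                  gamma: float = 0.3) -> float:
    """Weighted combination of text, collaborative, and image similarities."""
    return alpha * text_sim + beta * collab_sim + gamma * image_sim
```

In a setup like this, the weights would typically be tuned on held-out interaction data, nudging each one up or down depending on which modality proves most predictive.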

Challenges Faced

One of our biggest challenges was aligning the different data modalities. Images, text, and user interactions all have different digital representations and scales, so normalizing and balancing their contributions was tricky. We also faced some performance challenges when dealing with the large number of images in our catalog, but implementing a system to cache the image embeddings helped speed things up considerably.
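An embedding cache like the one described could be sketched as follows. The cache directory, the URL-hash key scheme, and the embed_image helper (from the earlier sketch) are illustrative assumptions, not the project's actual implementation:

```python
# Sketch of a simple on-disk embedding cache keyed by a hash of the image
# URL; embed_image is the illustrative CLIP helper from the earlier sketch.
import hashlib
from pathlib import Path

import numpy as np

CACHE_DIR = Path("embedding_cache")  # illustrative location
CACHE_DIR.mkdir(exist_ok=True)

def cached_embedding(image_url: str, image_path: str) -> np.ndarray:
    """Load a cached embedding if present, otherwise compute and store it."""
    key = hashlib.sha256(image_url.encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.npy"
    if cache_file.exists():
        return np.load(cache_file)
    embedding = embed_image(image_path).squeeze(0).numpy()
    np.save(cache_file, embedding)
    return embedding
```

Keying on a hash of the image URL keeps the cache stable across runs, so the expensive CLIP forward pass only happens the first time each product image is seen.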
