Inspiration

Have you ever felt like your favorite pair of pants needs a cool shirt or jacket that you just don't have yet? We've got you covered. Just grab those pants, take a quick pic, and let us scour the web to return a carefully curated list of products to complete your new fit.

What it does

Once we have an image of the item you want to complement, we scan your favorite clothing brands (from Abercrombie to Zodiac), and return a carefully compiled, AI-matched list of clothes pairings from across the internet. Our app considers some of the most popular styles and pairings so that you don't have to spend hours looking at every brand’s website to shop for new clothes.

How we built it

We used requests_html to collect links to products on different brands' catalogue pages. We then used Selenium to visit these links and capture each product's image and URL, along with images of the items the retailer recommends as styling well with the selected product. This gives us ground-truth data for what can be considered 'fashionable pairings'.
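Our actual scraper drives requests_html and Selenium against live brand pages, but the link-extraction step can be sketched offline with the standard library. The `product-card` class name and the sample markup below are illustrative assumptions, not any real retailer's page:

```python
from html.parser import HTMLParser

class ProductLinkParser(HTMLParser):
    """Collects hrefs of anchors whose class marks them as product tiles."""
    def __init__(self, product_class="product-card"):
        super().__init__()
        self.product_class = product_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # keep only anchors tagged with the (assumed) product-card class
        if tag == "a" and self.product_class in (attrs.get("class") or "").split():
            if "href" in attrs:
                self.links.append(attrs["href"])

# Sample markup standing in for a brand's category page.
sample_page = """
<ul>
  <li><a class="product-card" href="/p/denim-jacket">Denim Jacket</a></li>
  <li><a class="nav-link" href="/about">About</a></li>
  <li><a class="product-card" href="/p/linen-shirt">Linen Shirt</a></li>
</ul>
"""

parser = ProductLinkParser()
parser.feed(sample_page)
print(parser.links)  # ['/p/denim-jacket', '/p/linen-shirt']
```

In the real pipeline these links are then handed to Selenium, which can render the JavaScript-heavy product pages that a plain HTTP fetch would miss.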

After this, all images were fed into the GPT-4 Vision API to get a detailed text description of each clothing item. We then converted these descriptions into text vector embeddings, with each vector representing one item of clothing. A mapping from every vector to its 'recommended item' vectors is stored in Google Firebase, and all vectors are also stored in a Pinecone vector database so we can run vector-similarity searches.
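Conceptually, this step boils down to three structures: descriptions, vectors, and a pairing map. A minimal sketch follows, with a hash-based stand-in for the real embedding model and plain dicts standing in for Firebase and Pinecone; all item IDs and descriptions are made up for illustration:

```python
import hashlib

def toy_embedding(text: str, dim: int = 8) -> list:
    """Stand-in for a real text-embedding model (the actual app embeds
    the GPT-4 Vision description of each item)."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

# Descriptions as GPT-4 Vision might return them (illustrative text).
catalog = {
    "jacket-01": "olive utility jacket with patch pockets",
    "shirt-07": "white oxford button-down shirt",
    "pants-03": "slim dark-wash denim jeans",
}

# Vector store: item id -> embedding (Pinecone in the real system).
vectors = {item_id: toy_embedding(desc) for item_id, desc in catalog.items()}

# Each item mapped to its retailer-recommended pairings (Firebase in the real system).
recommended = {
    "pants-03": ["jacket-01", "shirt-07"],
}

print(len(vectors["pants-03"]))  # 8
```

Keeping the pairing map separate from the vector index mirrors the split described above: Firebase answers "what goes with this item", Pinecone answers "what looks like this item".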

Now, when a user uploads a photo, we use the Vision API to describe the clothing item in the image, embed that description, find the five most similar vectors in the Pinecone database, and return two recommended items for each match. It doesn't stop there: we then use those recommended items' embeddings as new queries against the vector database, which lets us find top matches across multiple clothing brands. The five most relevant pieces are retrieved in under a second, and we display their product images along with links to each item in its store. If the user goes on to upload a different image, we keep the previous recommendations in the app for future reference.
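The two-hop query above can be sketched with plain cosine similarity over an in-memory store (Pinecone performs the same ranking at scale); the toy two-brand catalogue and its vectors are illustrative, not real embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k(query, store, k, exclude=()):
    """Pinecone-style similarity search over an in-memory store."""
    ranked = sorted(
        (i for i in store if i not in exclude),
        key=lambda i: cosine(query, store[i]),
        reverse=True,
    )
    return ranked[:k]

def recommend(query_vec, store, pairings, k=5, per_match=2):
    """Hop 1: nearest catalogue items to the uploaded photo's embedding.
    Hop 2: each item's stored pairings become new queries themselves,
    surfacing similar pieces across brands."""
    results = []
    for item in top_k(query_vec, store, k):
        for rec in pairings.get(item, [])[:per_match]:
            # second hop: the pairing's embedding is re-queried against the store
            results.extend(top_k(store[rec], store, 1, exclude={rec}))
    # keep rank order, drop duplicates
    seen = set()
    return [r for r in results if not (r in seen or seen.add(r))]

# Toy two-brand catalogue.
store = {
    "brandA/pants": [1.0, 0.0],
    "brandA/shirt": [0.0, 1.0],
    "brandB/tee":   [0.1, 0.99],
}
pairings = {"brandA/pants": ["brandA/shirt"]}

print(recommend([0.98, 0.05], store, pairings, k=1))  # ['brandB/tee']
```

Note how the second hop turns brand A's recommended shirt into brand B's similar tee: that cross-brand jump is exactly what re-querying with the pairing's embedding buys over just returning the stored pairing.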

We built a web app with ReactJS to host the entire product and provide an easy-to-use interface. It works in mobile browsers as well, letting users upload an image from their gallery or capture one directly with their camera.

Challenges we ran into

Our first challenge was data collection. No existing dataset described clothing items, their best pairings, and where a user could buy them; moreover, a ground truth for what counts as fashionable was difficult to formalize. To overcome this, we scraped retailer websites for product links and used their recommendation images to generate pairings.

Secondly, OpenAI's Vision API is rate-limited to 100 requests a day! We optimized our querying to stay within the cap, but it still limited the size of our dataset.
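Our exact optimization isn't detailed here, but one simple way to stretch a 100-requests-per-day cap is to cache descriptions per image and refuse calls once the budget is spent, so duplicate images never cost quota twice. A minimal sketch (`describe_image` is a stand-in, not the real API call):

```python
import functools

DAILY_LIMIT = 100  # GPT-4 Vision's per-day request cap at the time

class QuotaExceeded(RuntimeError):
    pass

def quota_limited(limit):
    """Decorator: memoizes results per argument and refuses calls
    past `limit`, so repeated inputs never spend quota twice."""
    def wrap(fn):
        cache = {}
        calls = {"n": 0}
        @functools.wraps(fn)
        def inner(arg):
            if arg in cache:
                return cache[arg]          # cache hit: free
            if calls["n"] >= limit:
                raise QuotaExceeded(f"daily limit of {limit} reached")
            calls["n"] += 1
            cache[arg] = fn(arg)
            return cache[arg]
        return inner
    return wrap

@quota_limited(DAILY_LIMIT)
def describe_image(image_id):
    # stand-in for the real GPT-4 Vision request
    return f"description of {image_id}"

print(describe_image("pants.jpg"))
print(describe_image("pants.jpg"))  # cached, costs no quota
```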

Accomplishments that we're proud of

We achieved very accurate product matches, and the project creates immediate value just by auto-searching clothing websites, before even counting the pairing recommendations. On that front, we generated novel clothing combinations, since vector similarity can match pairings across a dataset larger than any single store's page or recommended styles. Lastly, we are proud of having built a very scalable product: were it not for the OpenAI API restriction and the cost of building the vector database (about $10 for 500 clothing-item vectors), we could describe far more products, improve the number and quality of recommendations, and assemble a much broader data corpus with many more style options.

What we learned

We learnt to use Selenium for web scraping, retrieving both images and URLs. We learnt about text vector embeddings and the astonishing capabilities of GPT-4 Vision, and how to index and run vector-similarity searches with Pinecone while wrangling the Vision API's output. We also learnt to use vector embeddings to compare text descriptions effectively and objectively, identifying patterns in matching outfits.

What's next for Styles.compare

Given the budget to use the OpenAI GPT-4 Vision API, we could build a larger dataset, increasing the clothing options available and bolstering the accuracy and robustness of our system. Another approach is to become independent of the OpenAI API by fine-tuning comparable open vision-language models such as LLaVA: in the future, we would use high-quality clothing descriptions to fine-tune a LLaVA model. Not only could this improve accuracy, it would avoid API limits and make costs a much smaller burden.

Another addition we could make is a style-assistant chatbot that takes a user's prompts and provides recommendations. We believe this is an achievable goal, since our model already produces the recommendations needed.

We also envision a version where a user-written prompt alone returns clothing recommendations from across the web, assembling the most cost-effective and snazzy outfit.

Thanks for taking the time to read about our project!
