dewy | Devpost


Inspiration

All the members of our team are interested in skincare and care about using good products with clean ingredients, so we wanted to build a system that makes it easier to find other products with similar ingredients.

What it does

This website takes in a product that the user already likes and returns the 5 most similar products in our dataset. Similarity is based on how closely the ingredients of the user's product match the ingredients of each product in our dataset, and is computed using ChemBERTa, a large pre-trained model for molecular property prediction.

How we built it

Front End: HTML, CSS, JavaScript

Back End: Flask, Selenium, SQL (almost)

First, we take the user's input (the URL of their favorite product on either Ulta or INCIDecoder) and use Selenium to scrape the product's ingredients list. We then feed this ingredients list into the ChemBERTa model, which calculates the similarity between the ingredients of the user's product and the ingredients of each product in our dataset (https://www.kaggle.com/datasets/eward96/skincare-products-clean-dataset). The model takes in a list of compounds in SMILES (Simplified Molecular Input Line Entry System) notation, which is a standard notation for chemical compounds in chemoinformatics, and returns the top 5 most similar products. Then, we take the output of the model and display the names of the products on the webpage.
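The ranking step can be sketched roughly as follows. This is a minimal illustration, not our actual code: it assumes each product has already been turned into a numeric vector (for example, an embedding produced by the model), and it assumes cosine similarity as the comparison. The names `cosine_similarity`, `top_k_similar`, and `product_vecs` are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, product_vecs, k=5):
    """Return the k product names whose vectors are closest to query_vec.

    product_vecs maps a product name to its (hypothetical) embedding.
    """
    scored = ((cosine_similarity(query_vec, vec), name)
              for name, vec in product_vecs.items())
    # nlargest keeps only the k best (score, name) pairs, best first.
    return [name for _, name in heapq.nlargest(k, scored)]
```

With `k=5` this returns the five product names to show on the page, ordered from most to least similar.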

Given more time, we would have finished connecting our SQL database to our front end to display the images of the top 5 most similar products. We filled this database with the name of each product and its corresponding image, which was scraped from a Google Images search using Playwright.
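The missing lookup could look something like the sketch below. It is only an illustration under assumptions: it uses SQLite from the Python standard library (we have not said which SQL engine we used), stores each image as a URL, and the table name `product_images` and both function names are made up for this example.

```python
import sqlite3


def create_image_table(conn):
    """Create the (hypothetical) name -> image table if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product_images ("
        "name TEXT PRIMARY KEY, image_url TEXT NOT NULL)"
    )


def images_for(conn, product_names):
    """Return {name: image_url} for the names that have a stored image."""
    names = list(product_names)
    if not names:
        return {}
    # One "?" per name keeps the query parameterized (no string injection).
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        "SELECT name, image_url FROM product_images "
        f"WHERE name IN ({placeholders})",
        names,
    )
    return dict(rows.fetchall())
```

The back end would call `images_for` with the top 5 product names and pass the resulting mapping to the page template.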

Challenges we ran into

At the beginning, we had issues defining our approach to our skincare recommendation system. A significant portion of our early time was spent tweaking our idea and deciding which features we would use to produce our output. We originally planned to tackle a computer vision problem: take images of people's skin, use a pre-trained model to identify types of acne, need for SPF, oiliness, etc., and then use those results to determine which products were best based on the ingredients of the products in our dataset. However, this was not feasible due to:

1) the lack of a properly labelled dataset for types of acne, need for SPF, and oiliness
2) the lack of a properly labelled dataset for how each ingredient interacts with the skin (unless we manually researched every ingredient in the dataset and made it ourselves)
3) the 36-hour time constraint

Because of these constraints, we had to backtrack and change our idea.

One key challenge was converting all of the ingredients of each product in our dataset into SMILES notation. Originally, we used an online compound-name-to-SMILES converter, but this method failed because of the number of requests that we had to make to the website. We eventually found the cirpy package, which has a function that converts compound names into SMILES notation, and were then able to convert our ingredients into SMILES format easily.
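A rough sketch of this conversion step is below. It is not our exact code. The one real library call is `cirpy.resolve(name, "smiles")`, which needs the cirpy package and network access. The cache and the helper names (`make_smiles_converter`, `ingredients_to_smiles`, `cirpy_resolver`) are assumptions added for illustration, since many products share ingredients like water or glycerin and caching avoids repeating the same lookup.

```python
def make_smiles_converter(resolve_name):
    """Wrap a name -> SMILES resolver with a simple in-memory cache.

    resolve_name is any callable that returns a SMILES string or None.
    """
    cache = {}

    def to_smiles(ingredient):
        key = ingredient.strip().lower()
        if key not in cache:
            cache[key] = resolve_name(key)  # only one lookup per ingredient
        return cache[key]

    return to_smiles


def ingredients_to_smiles(ingredients, to_smiles):
    """Convert ingredient names, skipping any that cannot be resolved."""
    converted = (to_smiles(name) for name in ingredients)
    return [smiles for smiles in converted if smiles]


def cirpy_resolver(name):
    """Resolver backed by cirpy; requires the package and a connection."""
    import cirpy  # imported lazily so the rest of the sketch runs without it
    return cirpy.resolve(name, "smiles")
```

Usage would be along the lines of `to_smiles = make_smiles_converter(cirpy_resolver)` followed by `ingredients_to_smiles(product_ingredients, to_smiles)` for each product.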

Following that, we had trouble determining the best approach for model evaluation, since we did not have a chemist on hand to verify whether or not our results were in fact similar. We decided to use the Tanimoto index as our evaluation metric, since it is regarded as one of the best similarity metrics for comparing the Morgan fingerprint forms of SMILES strings (source: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-015-0069-3). Morgan fingerprints are built by applying the Morgan algorithm to a set of user-supplied atom invariants. You can read more about it here!
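For reference, the Tanimoto index of two fingerprints is the number of bits set in both divided by the number of bits set in either one. The sketch below shows only that calculation, representing each fingerprint as the set of its "on" bit positions. In practice the fingerprints would come from a cheminformatics toolkit such as RDKit, which is not shown here, and the function name `tanimoto` is just for this example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto index of two fingerprints given as collections of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define the similarity as 0
    return len(a & b) / union
```

The result ranges from 0 (no bits in common) to 1 (identical fingerprints), so a higher score for a recommended product means its compounds look more like those in the user's product.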

Additionally, this was the first time any of us had connected a front end and a back end together, so we had a lot of difficulty with that. In particular, we had issues connecting the search bar query to the web scraping script that extracts the ingredients from the product the user entered.
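One piece of that connection is deciding what to do with the text the user typed into the search bar. Below is a minimal sketch of that step, assuming the back end receives the raw URL string. The host names in `SUPPORTED_SITES` and the function name `pick_scraper` are assumptions for illustration, not our actual code.

```python
from urllib.parse import urlparse

# Hypothetical mapping from a site's host name to the scraper to run.
SUPPORTED_SITES = {
    "ulta.com": "ulta",
    "incidecoder.com": "incidecoder",
}


def pick_scraper(product_url):
    """Return the scraper key for a product URL, or None if unsupported."""
    parsed = urlparse(product_url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return None  # not a usable web address
    host = parsed.hostname.lower()
    if host.startswith("www."):
        host = host[4:]
    return SUPPORTED_SITES.get(host)
```

A Flask route could call `pick_scraper` on the submitted query, show an error message when it returns `None`, and otherwise hand the URL to the matching Selenium scraper.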