I see that when I select a product on the Tchibo web store (eg there are no recommendations for similar products. Showing similar products might increase the time spent on the website and also might increase the revenue. It might help the consumer to find the products they wanted.

What it does

Currently, my algorithm calculates the similarity of products using their 3 features. Category, price, and description. Category and price are used for pre-selection. In preselection, it simply calculates similarity of the product to all other products and filters the most similar ones.

After pre-selection, it also calculates similarities of descriptions. This is simply finding how is similar is a paragraph to another. Here I used spacy to calculate similarities. Modern deep learning solutions usually map words to vectors. To do that I used pre-trained german modal

After that, it calculates a "recommendation score" by simply getting a weighted average of similarities of 3 features.

How we built it

Firstly, it needs all the product data. I fetched them using API then saved them as JSON files page by page. Then I saved them to a single file. After I get the data, I used 2 features to make pre-selection and one other feature to produce a final recommendation score.

I used anaconda python 3.7, spacy, flask and tchibo API

Challenges we ran into

  • Fetching all the data was a little hacky.
  • Ideally, I should calculate the similarity of descriptions for each product but calculating the similarity of 2 descriptions is time-consuming. For this reason, I implemented a fast pre-selection mechanism that considers all the possible pairs for a product.
  • To fine-tune the configuration, I implemented a simple flask web server. Since I never did before, it was a little hard.

Accomplishments that we're proud of

  • An algorithm that shows convincingly similar products to a selected product
  • A dynamic architecture to play with all the configuration variables. For example, one might consider the price of a product should be a lot more important than its category. You can also set the pre-selection size.

What we learned

  • An easy way to find similarity scores between 2 texts using pre-trained deep learning models and word vectors.
  • Developing a simple web server with flask
  • How to work with relatively big data sets.

What's next for Recommender

  • There are lots of features that can be used to include in the similarity score. For example gender, age, title, size, color, etc... Some of these are categorical variables so custom similarity functions can be defined for each categorical feature.
  • The new trend in data science is not to select any features but give them directly to a deep learning model. This is harder to debug and it might need training the deep learning model with large computing resources. Also, in practice, rule-based systems might be convenient but combining best of the both worlds is could be the best solution (a rule-based or manual feature selecting recommendation calculation system with a deep learning model).
  • Ideally seeing the history of purchases will give us lots of information. We can make recommendations to a user based on other users who are similar to the current user. Basically, we can say "user x is similar to user y, so let's recommend user y the previous purchases of user x"
  • I used weighted averages in calculating recommendation scores and in pre-selection. Here more complex models can be used.

Built With

Share this project: