Inspiration

We have a strong interest in data analysis and information retrieval, and the challenge gave us an opportunity to put that expertise into practice. We took up the Walmart challenge to build a search engine that improves the relevance of its search results, which in turn improves the user's experience of the search engine.

What it does

Given the keywords entered in the query box of our UI, the engine displays the ten most relevant product results hosted on the Walmart e-commerce site for the user to navigate and shop!

How I built it

We built the project in four main steps.

  1. First, we crawled Walmart's website to collect product web pages across different categories of products.
  2. We pre-processed the data using tf-idf followed by LSA to get the vectors for our documents and queries.
  3. To reduce complexity, we kept the top 100 features after LSA. Our initial query results are the web pages closest to the query in terms of cosine similarity.
  4. We then used clustering to improve the results: we cluster the initial relevance results, find the most relevant cluster, and display it as our search results.
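
The steps above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not our actual code: the function names `build_index` and `search`, the candidate pool size, the use of k-means, and the rule "most relevant cluster = highest mean cosine similarity" are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_index(docs, n_components=100):
    """Fit tf-idf and LSA on the page texts; return both models and the document vectors."""
    tfidf = TfidfVectorizer(stop_words="english")
    term_matrix = tfidf.fit_transform(docs)
    # Cap the LSA dimension so the sketch also runs on a tiny toy corpus.
    k = max(1, min(n_components, min(term_matrix.shape) - 1))
    lsa = TruncatedSVD(n_components=k, random_state=0)
    doc_vecs = lsa.fit_transform(term_matrix)
    return tfidf, lsa, doc_vecs


def search(query, docs, tfidf, lsa, doc_vecs, top_k=10, pool=30, n_clusters=3):
    """Rank pages by cosine similarity, then keep the most relevant cluster."""
    # Project the query into the same LSA space as the documents.
    query_vec = lsa.transform(tfidf.transform([query]))
    sims = cosine_similarity(query_vec, doc_vecs)[0]

    # Initial results: the `pool` pages closest to the query (hypothetical size).
    candidates = np.argsort(-sims)[:pool]

    # Cluster only the initial results (k-means is an assumed choice).
    k = min(n_clusters, len(candidates))
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        doc_vecs[candidates]
    )

    # Assumption: the "most relevant" cluster has the highest mean similarity.
    best = max(set(labels), key=lambda c: sims[candidates[labels == c]].mean())

    # Candidates are already sorted by similarity, so the order is preserved.
    chosen = candidates[labels == best][:top_k]
    return [(docs[i], float(sims[i])) for i in chosen]
```

With the page texts in a list `docs`, one would call `build_index(docs)` once and then `search("some query", docs, tfidf, lsa, doc_vecs)` per query. Clustering only the initial results keeps that step cheap, at the cost of depending on how good the first cosine ranking is.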

Challenges I ran into

The crawler had to be tweaked to work around HTTP 403 Forbidden errors, and crawling overall took up a significant amount of time. We tried different feature sets for our documents, such as word2vec, but they didn't perform as well as tf-idf. Overall, the complexity of the task was challenging.

Accomplishments that I'm proud of

Getting it up and running in a day! yay

What I learned

That iteration is the best way to improve your models.

What's next for Walmart Product Search Engine

We could use PageRank to improve on the current ranking model. We could also try multimodal clustering to get better clusters. Working through established search best practices would also be a decent optimization.
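
To illustrate the PageRank idea, here is a minimal power-iteration sketch in plain Python. It assumes a hypothetical link graph between the crawled product pages (for example, "related product" links), which we have not built; the function name `pagerank` and its parameters are illustrative only.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """Compute PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical graph over the crawled product pages).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its rank on in equal shares to the pages it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break

    return rank
```

The resulting scores could then be blended with the cosine-similarity score when ordering results; how to weight the two is an open design choice.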
