Elasticsearch has been in use at Wayfair for a very long time. As part of the ELK stack (Elasticsearch, Logstash, Kibana), it powers nearly all of the syslogging for our entire web service. Wayfair's heartbeat is effectively captured by these little bar graphs and line charts, which our Frontline team monitors 24/7 to anticipate outages and react quickly and effectively. This technology is time-tested and worth its weight in gold.
Its underlying engine, Lucene, however, can do a whole lot more than just logging. Using its inverted index, it can power analytics and data modeling, and even generate recommendations! We set out to prove this alternative use case for Elasticsearch and are very pleased with the results.
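For anyone who hasn't run into the term, here is a toy sketch of what an inverted index is, in plain Python. This is not how Lucene implements it (Lucene adds analyzers, relevance scoring, and compressed on-disk segments), and the product titles and function names are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Start from the first token's postings, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical product titles, invented for this example.
docs = {
    1: "blue velvet sofa",
    2: "blue area rug",
    3: "velvet accent chair",
}
index = build_inverted_index(docs)
matches = search(index, "blue velvet")
```

Because the lookup goes from term to documents instead of scanning every document, the same structure that answers "which log lines mention this error" can also answer "which orders mention this SKU".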
What it does
The cluster is loaded with order data, search data, and customer data in order to produce heatmaps of SKU popularity, graphs of the most popular search terms, recommendations based on other customers' searches, search clickthrough rates, real-time fraud mapping, and anything else you can think of!
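As a rough idea of what sits behind a "most popular terms" graph, here is a minimal sketch of an Elasticsearch terms aggregation request body built in Python. The field name `search_term`, the aggregation name `popular_terms`, and the helper functions are assumptions for illustration, not our actual mapping or code, and the field would need to be indexed as a keyword for this to work.

```python
import json


def top_terms_query(field, size=10):
    """Build a search body that returns only a terms aggregation."""
    return {
        "size": 0,  # skip individual hits; we only want the buckets
        "aggs": {
            "popular_terms": {  # hypothetical aggregation name
                "terms": {"field": field, "size": size}
            }
        },
    }


def buckets_to_pairs(response):
    """Flatten the aggregation buckets of a response into (term, count) pairs."""
    buckets = response["aggregations"]["popular_terms"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# "search_term" is an assumed field name, not taken from our real index.
body = top_terms_query("search_term", size=5)
print(json.dumps(body, indent=2))
```

Kibana builds queries of this shape for you when you pick a terms bucket in a visualization, so in practice most of the dashboards needed no hand-written queries at all.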
How we built it
We started off with a simple Elasticsearch cluster that was built out for the labs group. We wrote some sample queries against our orders and customers data, then loaded the results through Kafka into Logstash, and from there into Elasticsearch.
We also wanted to prove out the pipeline for moving data from our big data HDFS clusters into Elasticsearch. We built a system that could geo-tag search IPs, track the products those searchers clicked on, and build up rudimentary search analysis, and then loaded the data directly into Elasticsearch using Python and the Elasticsearch-Hadoop plugin.
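To give a feel for the Python side of a load like this, here is a minimal sketch that turns geo-tagged click events into the newline-delimited JSON body that the Elasticsearch bulk API expects. The index name, field names, and sample events are all hypothetical, and this shows only the request format, not our actual loader or the Elasticsearch-Hadoop job.

```python
import json


def to_bulk_payload(index_name, docs, id_field=None):
    """Serialize documents as bulk-API NDJSON: an action line, then the document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name}}
        if id_field is not None:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Made-up click events; "location" assumes a geo_point mapping on the index.
events = [
    {"event_id": "e1", "search_term": "sofa", "clicked_sku": "SKU123",
     "location": {"lat": 42.35, "lon": -71.07}},
    {"event_id": "e2", "search_term": "rug", "clicked_sku": "SKU456",
     "location": {"lat": 40.71, "lon": -74.01}},
]
payload = to_bulk_payload("search-clicks", events, id_field="event_id")
```

The resulting string would be sent to the cluster's `_bulk` endpoint with the `application/x-ndjson` content type. Setting an explicit `_id` means re-running a load overwrites documents instead of duplicating them.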
Challenges we ran into
Time series data is notoriously picky, and we ran into that issue a lot. Because we were using customer click data, the dataset was absolutely enormous, and we were under quite a time crunch to produce Hive SQL that ran efficiently enough to load the index in time. Nick also ran into a lot of issues configuring Logstash to cooperate with his order queries directly from MSSQL.
Accomplishments that we're proud of
It is honestly one of the coolest projects I've been a part of at Wayfair. The business intelligence we can extract from these graphs and data correlations is still coming in as we write this (we just found out we 'accidentally' built a recommendations engine too). The data still needs to be sanitized a bit more before we present to the judges, but we're hoping they'll see the value in this and be just as excited as we are to make more use of Elasticsearch in Wayfair's analytics stack.
What we learned
There are so many gotchas with Elasticsearch, Hive, Logstash, and Kibana configurations... Because this is a fairly new concept for Wayfair, we had quite a learning curve to get through before we could actually load the data into Elasticsearch, but thankfully there's a lot of very well-written documentation about companies like ours that have also taken the plunge.
What's next for Project Forbidden Marriage
More data! More dashboards! More data sources!