Inspiration

We're two engineers on Product Media, so we have firsthand experience working with image data for the products on our site. Since images are often the primary connection between a customer and her purchase, we want to show customers the best images we can. With all the data we can gather on our imagery, including how specific customers interact with it, it makes sense to use that data to customize each customer's experience on Wayfair.

What it does

The idea is to segment our customer base by preference for certain types of images (bright, low contrast, saturated, etc.) and provide a customized browsing experience in which each customer is served images that suit her preferences, when available.

How we built it

First, we built SQL queries to match customer orders with lead image data for the products they purchased. We wrote Python scripts using NumPy and Pillow to process the images and gather statistics such as saturation, contrast, white space percentage, and brightness. Also in Python, we compared the lead image statistics of each customer's purchases to the average lead image statistics across all purchased products to determine whether that customer was an "outlier" in various image aesthetic categories.
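To make the pipeline above concrete, here is a minimal sketch of how such statistics and outlier flags could be computed with Pillow and NumPy. It is an illustration, not the code from the hackathon: the function names `image_stats` and `outlier_flags`, the white-space threshold of 245, and the z-score cutoff of 2.0 are all assumptions, and a z-score is only one possible definition of an "outlier".

```python
import numpy as np
from PIL import Image


def image_stats(img, white_threshold=245):
    """Return simple aesthetic statistics for a PIL image, each scaled to 0..1.

    white_threshold is an assumed cutoff: a pixel counts as white space
    when all three RGB channels are at or above it.
    """
    rgb_img = img.convert("RGB")
    rgb = np.asarray(rgb_img, dtype=np.float64)
    hsv = np.asarray(rgb_img.convert("HSV"), dtype=np.float64)
    gray = np.asarray(rgb_img.convert("L"), dtype=np.float64)
    return {
        "brightness": float(gray.mean() / 255.0),
        # Contrast here is the standard deviation of the grayscale values.
        "contrast": float(gray.std() / 255.0),
        "saturation": float(hsv[..., 1].mean() / 255.0),
        "white_space": float((rgb.min(axis=-1) >= white_threshold).mean()),
    }


def outlier_flags(customer_stats, population_stats, z_cutoff=2.0):
    """Flag statistics where a customer's average is far from the population.

    customer_stats: list of stat dicts for one customer's purchases.
    population_stats: list of stat dicts for all purchased products.
    A z-score test is an assumed choice of outlier definition.
    """
    flags = {}
    for key in population_stats[0]:
        population = np.array([s[key] for s in population_stats])
        customer_mean = np.mean([s[key] for s in customer_stats])
        std = population.std()
        z = 0.0 if std == 0 else (customer_mean - population.mean()) / std
        flags[key] = bool(abs(z) > z_cutoff)
    return flags
```

In use, `image_stats(Image.open(path))` would be called once per lead image, and `outlier_flags` would then be called once per customer with the stats of that customer's purchases and the stats of all purchases.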

Challenges we ran into

The dataset was very large, and our computers often struggled to handle our queries. SQL proved very slow, so much of the filtering was done in Python. We were often limited by the RAM of our machines and by the computation time of our data-crunching algorithms.
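One general way to keep memory bounded in a situation like this is to accumulate the mean and standard deviation chunk by chunk instead of loading every value at once. The sketch below shows that technique under stated assumptions: the `RunningStats` class is a hypothetical name, the chunks are assumed to be arrays of one statistic (for example brightness), and this is not necessarily the approach used during the hackathon.

```python
import numpy as np


class RunningStats:
    """Accumulate mean and standard deviation over chunks of values.

    Only three numbers are kept in memory, so the full dataset never
    needs to be loaded at once.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, chunk):
        chunk = np.asarray(chunk, dtype=np.float64)
        n = chunk.size
        if n == 0:
            return
        chunk_mean = chunk.mean()
        chunk_m2 = ((chunk - chunk_mean) ** 2).sum()
        delta = chunk_mean - self.mean
        total = self.count + n
        # Merge the chunk's statistics into the running totals.
        self.m2 += chunk_m2 + delta ** 2 * self.count * n / total
        self.mean += delta * n / total
        self.count = total

    @property
    def std(self):
        """Population standard deviation of everything seen so far."""
        return float(np.sqrt(self.m2 / self.count)) if self.count else 0.0
```

Each chunk of query results would be passed to `update`, after which `mean` and `std` describe the whole dataset and can serve as the population baseline for outlier checks.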

Accomplishments that we're proud of

Optimizing various queries to perform tasks with our data in ways that worked around our RAM and CPU limitations.

What we learned

There is a lot of image data out there that can be collected and analyzed! We also built our skills coding in languages and using libraries we don't typically work with on Product Media.

What's next for Customer Segmentation On Image Aesthetics

Clickstream data will allow us to apply our results to more of the customer base. In addition, we started to look at image resolution/compression as a statistic, and we believe this could be explored further.
