
Project Relasta Documentation
User Story
As an employee of the pricing team
I want an AI decision-maker
So that I can adjust product prices for real-time demand.
Executive Summary
The current pricing system is based on price–quantity relationship estimations and algorithms that use features important to product demand (e.g. competitor prices, product similarity). This system is slow to adjust to recent demand due to delays in estimation, and it does not scale because a new algorithm must be built for every feature. We propose a decision-maker that uses artificial intelligence and reinforcement learning to adjust product prices for real-time demand and to determine which features are important for pricing a product. During Engineering Day, we developed an environment where we can train a machine to produce optimal prices in a near-realistic economy, and we integrated our training framework into the pricing team’s current services. Our simulations suggest that our machine is learning profitable pricing strategies at a rapid rate. In the future, we hope to provide a more robust learning framework so our machine can make even better decisions, and we plan to test our learner in the actual economy with a pricing experiment.
Full Documentation
Definitions
Due to the rather dense vocabulary present on the Pricing & Profitability Team, we felt that we should introduce some terms in order to streamline the understanding of this project.
Elasticity: A number that defines the theoretical relationship between price and quantity demanded for a product. For example, an elasticity of 2 means that a 1% decrease in price is predicted, in theory, to increase quantity demanded by 2%. Once we know the elasticity of a good, we can write a formula that estimates the profit-optimal price, assuming the elasticity is correct.
PLP: The proportion of the profit-optimal price at which a product currently sits, given its elasticity. For example, if PLP is 0.8, the product is 20% below the profit-optimal price according to elasticity. The point of PLP is that we do not always have the correct elasticity, since there are unpredictable variations in demand; this can warrant adjusting PLP to a number other than 1.
RPI: Relative Price Index. This value represents where our product prices sit relative to competitor prices. For example, an RPI of 0.2 means we are on average 20% above the price of our competitors.
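To make the Elasticity and PLP definitions concrete: under a constant-elasticity demand model, the profit-optimal price follows the standard markup rule p* = cost · e/(e − 1). The sketch below is ours, for illustration only; the function names and numbers are not part of the actual pricing system.

```python
def optimal_price(cost, elasticity):
    """Profit-optimal price under constant-elasticity demand.

    With quantity q(p) = A * p**(-e) for e > 1, profit (p - cost) * q(p)
    is maximized at p* = cost * e / (e - 1), the standard markup rule.
    """
    if elasticity <= 1:
        raise ValueError("elasticity must exceed 1 for a finite optimum")
    return cost * elasticity / (elasticity - 1)


def plp(current_price, cost, elasticity):
    """Proportion of the profit-optimal price the current price sits at."""
    return current_price / optimal_price(cost, elasticity)


# Example: a $50 cost and elasticity of 2 imply a $100 optimal price,
# so a current price of $80 sits at a PLP of 0.8 (20% below optimal).
price = optimal_price(50.0, 2.0)   # 100.0
level = plp(80.0, 50.0, 2.0)       # 0.8
```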
Current Pricing System
Figure 1: A summary of our current pricing system.
The current pricing framework is owned by the Pricing & Profitability Team and is developed by the Pricing Algorithms subgroup and our partner QuantCo. This framework is named Elasta, and it takes the following steps to produce prices:
We take in a set of product inputs relevant to identifying the relationship between price and quantity demanded. These inputs can include cost information, revenue adjustment ratios, and historical data on prices and demand.
We then use a model to predict the relationship between price and quantity demanded for a product. This relationship is summarized with a number called the Elasticity of a product.
Once we know the Elasticity, we can use a program to find the theoretical optimal price, assuming the Elasticity is correct. Based on our definitions, this is the price at a PLP of 1.
Because we do not always get Elasticity correct due to unexplained variations in demand, we use a set of feature algorithms to make adjustments to the PLP of a product. These feature algorithms are based on variables that were not included in the estimation of Elasticity. For example, the algorithm riptide adjusts the PLP of a product based on competitor price and a particular target RPI.
Once we have made our PLP adjustments, we then get our predicted optimal price which is then sent to the website.
This framework is summarized in Figure 1. Our pricing system has been extended to certain brand catalogs such as Wayfair.com, Wayfair Canada, Joss & Main, and Birch Lane.
There are two key flaws to the current pricing framework:
The model for estimating Elasticity is retuned every 3 months. This adjustment is relatively slow, since severe demand variation can occur within each three-month window that the model cannot predict. These demand shocks could include natural disasters, unexpected competitor price moves, or surprise macroeconomic trends.
Our feature algorithm development does not scale to the growing number of features that are relevant to pricing. The current system requires us to build a new algorithm for every feature we find important. When the number of relevant factors to product pricing scales to the thousands, it’s not sustainable to build a new algorithm for every single factor.
We will address these two flaws with our artificial intelligence (AI) solution.
Our Proposed Framework
Figure 2: A summary of the reinforcement learning framework for a given product.
Our solution is a machine that learns to price products using a technique called reinforcement learning. Reinforcement learning is a strategy in which a decision-maker learns to make optimal decisions to reach a particular goal. By engineering reinforcement learning into Elasta, we felt it was reasonable to name the project Relasta.
Reinforcement learning is based on an interface between a decision-maker and its environment. The decision-maker performs an action, which has some effect on the environment around it. Given this action, the environment sends back a reward and a state that signals the situation the decision-maker is in. The goal of the decision-maker is to maximize this reward, so it makes an informed action based on the state it is in. The environment then uses this informed action to send back another reward and state signal, and the process loops indefinitely. The “reinforcement” part is that the decision-maker receives feedback (via a state and reward) for every action it makes. The “learning” part is that the decision-maker learns to reach its goal by discovering the importance of factors in its current state and then making optimal actions based on those factors.
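The loop described above can be sketched in a few lines of Python. The toy environment and epsilon-greedy agent here are illustrative stand-ins we wrote for this writeup, not our actual implementation.

```python
import random

random.seed(0)

class Environment:
    """Toy environment that rewards actions close to a hidden target."""
    def __init__(self, target=7):
        self.target = target

    def step(self, action):
        reward = -abs(action - self.target)   # feedback on the action
        state = action                        # next state signal
        return state, reward

class Agent:
    """Epsilon-greedy decision-maker that learns a value for each action."""
    def __init__(self, n_actions=10, epsilon=0.1, lr=0.5):
        self.values = [0.0] * n_actions
        self.epsilon, self.lr = epsilon, lr

    def act(self, state):
        if random.random() < self.epsilon:    # occasionally explore
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def learn(self, action, reward):
        # Nudge the action's estimated value toward the observed reward.
        self.values[action] += self.lr * (reward - self.values[action])

env, agent = Environment(), Agent()
state = 0
for _ in range(500):                          # action -> reward/state -> repeat
    action = agent.act(state)
    state, reward = env.step(action)
    agent.learn(action, reward)
```

After a few hundred iterations the agent's highest-valued action is the hidden target, because that is the only action whose feedback is never negative.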
Our pricing framework can fit this interface. Say we are looking to price a given product.
The environment in this context is the consumer economy based on this product. The consumer economy generates some demand for this product over a given time period based on many factors (e.g. competitor price, seasonality, consumer confidence). Over this time period, the economy generates profit for the product. Profit can be considered the reward in this context, since as a company we are looking to maximize profit generated. The consumer economy also sends back a state signal, which includes the factors that could be relevant to how profit was generated for the product. These factors could include quantity demanded over the time period for the product, competitor price, seasonality, and macroeconomic trends.
Our AI (the decision-maker) sees the profit (the reward) generated for the product over a given time period and a set of features (the state) that may have been important for creating this profit. Given profit, our AI learns to weight the importance of the state features it sees. For example, perhaps competitor price is more important to profit generation than the current month; our AI should then learn to weight competitor price more heavily than the current month when making a decision.
Then, given this weighting of features and the features themselves, our AI makes an informed price move on the product (action) with the goal of maximizing profit in the next time period. This action is then seen by the consumer economy, which will then restart the process of generating profit and factors relevant to profit in the next time period. This interaction between profit and price moves will continue until the business chooses to halt the process.
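One simple way an agent can learn such feature weightings is a linear value model per price move, updated toward observed profit. This is a simplification we wrote for illustration; the feature vector, price moves, and learning rule are assumptions, not the team's actual design.

```python
import random

PRICE_MOVES = [-0.05, 0.0, 0.05]   # cut 5%, hold, raise 5% (illustrative)

class LinearPricingAgent:
    """Scores each price move as a weighted sum of state features,
    then nudges the chosen move's weights toward the observed profit."""
    def __init__(self, n_features, lr=0.1, epsilon=0.1):
        self.w = [[0.0] * n_features for _ in PRICE_MOVES]
        self.lr, self.epsilon = lr, epsilon

    def predict(self, move, state):
        return sum(wi * si for wi, si in zip(self.w[move], state))

    def act(self, state):
        if random.random() < self.epsilon:            # explore
            return random.randrange(len(PRICE_MOVES))
        scores = [self.predict(m, state) for m in range(len(PRICE_MOVES))]
        return scores.index(max(scores))              # exploit best-scoring move

    def learn(self, move, state, profit):
        # Gradient-style update: features that predict profit gain weight.
        error = profit - self.predict(move, state)
        for i, si in enumerate(state):
            self.w[move][i] += self.lr * error * si
```

A feature that consistently predicts profit (say, the gap to a competitor's price) accumulates a large weight, while an uninformative one (say, the current month) stays near zero, which is the weighting behaviour described above.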
We summarize this process for a given product in Figure 2. Our reinforcement learning technique addresses the current pricing system’s flaws in the following ways:
The time period for demand generation is arbitrary. Thus, our AI can learn to make price moves over days rather than waiting 3 months to estimate optimal price moves. This means that our AI can react to real-time information about profit and quantity demanded for products given the price moves it has made. In this sense, our AI can react to variations in demand much more quickly than our Elasticity estimator.
The number of factors in the state signal is arbitrary. Thus, our AI can learn the importance of as many profit factors as we like. This scales much better with the number of pricing features than our current system, because we do not need to write a new feature algorithm for every feature we deem important for pricing. Instead, we can simply add the feature to the state signal, and our AI will learn over time how to price based on it.
Having conceptualized a solution to these two issues, we then implemented our new system during Engineering Day.
Our Work on Engineering Day
We realized that from a business perspective, our AI shouldn’t be unleashed immediately in our current pricing framework. Since our AI needs to learn how to make good decisions, we cannot be sure that it will immediately make good decisions in the real world without any previous experience. Thus, we decided to create a simulated environment for our AI to learn on before interacting with the actual economy.
This simulated environment is currently designed to be as realistic as possible in order to prepare our AI to make decisions in the actual environment. This meant creating an environment that generates demand and profit for products based on historical data and real parameters estimated by QuantCo and our Pricing Team. A few of these parameters are based on seasonality estimates, product sort behavior, competitor price information, and surprise demand shocks. Using this approximation of the real world, the simulated environment provides an “obstacle course” for our AI to learn good decision-making over a number of virtual days.
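A heavily simplified sketch of such a simulator combines constant-elasticity demand with a seasonal cycle and random demand shocks. Every parameter below is invented for illustration; the real environment uses parameters estimated by QuantCo and the Pricing Team.

```python
import math
import random

class SimulatedMarket:
    """Toy demand simulator: constant-elasticity demand, a yearly
    seasonal cycle, and lognormal surprise demand shocks."""
    def __init__(self, base_demand=100.0, elasticity=2.0, cost=50.0, seed=0):
        self.base, self.e, self.cost = base_demand, elasticity, cost
        self.rng = random.Random(seed)
        self.day = 0

    def step(self, price):
        season = 1.0 + 0.2 * math.sin(2 * math.pi * self.day / 365)
        shock = self.rng.lognormvariate(0.0, 0.1)     # surprise demand swings
        quantity = self.base * season * shock * (price / self.cost) ** (-self.e)
        profit = (price - self.cost) * quantity       # the reward signal
        self.day += 1
        state = (quantity, season, self.day)          # the state signal
        return state, profit

# One virtual day: price at $100 against a $50 cost
market = SimulatedMarket()
state, profit = market.step(100.0)
```

Running `step` once per virtual day for 90 days gives the agent the same kind of state/reward feedback it would eventually receive from the real economy.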
We originally prototyped our AI and our simulated environment in the Python programming language. We chose Python for its readability and its familiarity throughout the developer community, which makes our codebase easy to port to other languages or tools at Wayfair and ensures that our solution can be readily integrated for deployment.
We then chose to transfer our prototype to the .NET programming framework. This meant converting our codebase from Python to C# and F#. We did this because almost all of the Pricing Team’s current services are implemented in the .NET framework, and so this would allow our AI and our simulated environment to be quickly integrated within Pricing & Profitability.
Model Performance
To test the performance of our simulation, we simulated the AI–environment interaction over 90 days with about 200 products. We chose 90 days because this is a benchmark set by pricing managers to prove that a pricing strategy works. These 200 products account for 1% of US revenue generated in the last 12 months, and thus they represent a meaningful proportion of revenue on the Wayfair.com catalog. We initially ran thousands of simulations to find the optimal parameters for our AI, judging parameters by how much profit they allowed our AI to produce over the 90 virtual days. Once we found our optimal parameters, we ran 2000 simulations fixed to those parameters. We present the results of these 2000 simulations below.
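The parameter search can be as simple as a grid sweep that ranks candidate settings by total simulated 90-day profit. The sketch below fakes the profit surface with a stand-in function, since a self-contained example cannot run the full agent–environment loop; the parameter names and values are invented for illustration.

```python
import itertools

def simulated_profit(epsilon, lr, days=90):
    """Stand-in for one full 90-day simulation run.  In the real setup
    this would run the agent-environment loop and sum daily profit;
    here a smooth fake surface keeps the sketch self-contained."""
    return 1.6e6 - 5e5 * (epsilon - 0.1) ** 2 - 2e6 * (lr - 0.05) ** 2

epsilons = [0.05, 0.1, 0.2]        # candidate exploration rates
learning_rates = [0.01, 0.05, 0.1]  # candidate learning rates
best = max(itertools.product(epsilons, learning_rates),
           key=lambda params: simulated_profit(*params))
# `best` holds the (epsilon, lr) pair with the highest simulated profit
```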
Figure 3: The distribution of profit generated over 90 days under optimal parameters. This distribution is generated over 2000 simulations.
Figure 3 shows the distribution of profit generated over the 2000 optimal-parameter simulations. On average, our AI makes $1.6M in profit over the 90-day period, which we would argue is a nontrivial sum for the business to make over 3 months. The standard deviation of this distribution is around $7K, meaning most of our simulations generate between $1.593M and $1.607M over the 90 days. Since this standard deviation is small relative to the mean, we would argue that our AI is relatively stable in its decision-making: it makes around the same profit in each simulation.
Figure 4: Average Profit over time. These profits are averaged over 2000 simulations.
Figure 4 displays average profit over time in simulation. On average, our AI generates more profit as the days progress, which suggests it is learning to make better decisions (i.e. better price moves) as time goes on. By day 60, we are on average producing almost $2K more in daily profit than when our simulations started. This suggests we are making sizable gains in profit in as little as two months.
Next Steps
We have several avenues of future work:
We can add more features to the state signal of our simulated environment. In particular, it may be useful to consider how macroeconomic trends fit into the broader picture of the economy that Wayfair interacts with.
We can expand our AI’s decisions to cover a larger portion of the US catalog. Since our current results only account for 1% of revenue generated, we have room to grow if we want this AI to have more impact on Wayfair’s pricing strategy as a whole.
In a few months, we hope to deploy Relasta in the actual economic environment via a pricing experiment. We hope that our algorithm has learned enough in the simulated environment to make optimal price moves in the real world.