As a Goldman Sachs employee, I would like to propose the best solution to my client regarding all the factors revolving around opening a restaurant involving tacos and burritos.
We input the critical data points to arrive at various deciding factors such as: -
- Which city to choose as a business hotspot.
- Which month and day to launch a new product.
- Which day in the week the restaurant has the highest virtual traffic.
- Which ingredients to incorporate in the new product to enhance profits/sales.
- Which city is most profitable for a price selected.
Dealing with missing/noisy data: -
- Only 50% of the data was given for columns “amount.max” and “amount.min”. We tried to generate the rest 50% of the values using the Random Forest Method and faced few problems at the end of execution. So, we have considered the 50% of the data to calculate average prices of tacos and burritos.
- The column “menupageURL” was a good trigger point to identify third party traffic but 88% data was missing. So, we chose another column “date. seen” from which we were able to generate value predictions.
- We have observed an abnormal increase in a few tacos/burrito prices. We have taken the average price of tacos/burrito with outliers and without outliers and the average price was 8.2 and 8.7 respectively. As the difference was not significant, we went with 8.2$ as an average cost.
- The price range (min and max) for 5$(taco/burrito) and 499$(taco/burrito) was “0-25”. From this, we have understood that price range values were noisy and hence were not used for data analysis/interpretation.
- The “menu description” column has ingredients (high unique strings observed) and it was difficult because every restaurant had its own product terminology. We have researched and identified “20 most popular ingredients in tacos”.
Questions asked by the team: -
How do you derive insights, statistics, and meaning which support your project's story? • We have prepared our story in Tableau. We will explain to you in-depth.
Which methods do you plan on using while mining, manipulating, and enhancing the data? • Pandas • Numpy • Matplotlib • Sklearn • Excel filters and data separation techniques • Tableau
What will the final product look like? • Our model helps to understand “location preferences” of opening a taco/burrito restaurant.
How well does your model handle new datasets and evolve to encourage future use? • Location preferences: Top 10 locations to open your taco/burrito place. • Seasonality: Which month observes the highest degree in virtual search of taco/burrito. • Trend: How is the increase/decrease of taco/burrito over the past few years. • Location-based pricing: The manager of the food chain can understand where to open in order to earn “x” profits. • Ingredients: The taco/burrito’s that were highly searched have repeated ingredients. All the ingredients were counted to launch a new product with similar ingredients to likely increase revenue.