- Currently, because of COVID-19, it is extremely difficult for many restaurants to stay afloat
- After the pandemic, local restaurants will still be recovering from the financial burdens of COVID-19
- The mother of one of our team members runs a small business from home and has seen firsthand how COVID-19 has hurt business, which gives us a personal connection to how these groups are being affected
What it does
food4thought is an immediate response to the pandemic and a look ahead to its aftermath. Our online platform is a restaurant recommender system that filters through user reviews to give you the best dining experience while supporting small businesses in Houston. Through a simple form, we gather the user’s preferences and location, and our algorithm outputs a series of relevant local restaurants to try!
How we built it
- We acquired datasets containing large amounts of reviews from Google Maps. Because the original datasets covered businesses all over the United States, we pre-processed the data down to a subset of 2,000 reviews covering only Houston businesses.
- With these trimmed datasets, we used a latent factor collaborative filtering strategy to narrow down results by the requested search term. Latent factor collaborative filtering, as opposed to content-based filtering, seeks to match the search term with the text contents of many user reviews rather than the restaurant properties directly.
- After using TF-IDF vectorization to isolate the most relevant words in the text content, we grouped the vectors by submitting user and by business ID into two matrices that can be multiplied together to estimate the rating our user would give. We return the top five recommendations based on this rating and use the Google Places API to fetch the more detailed information that we display.
- Because we use latent factor collaborative filtering (LFCF), our recommendations are based directly on the text of previous user reviews: the algorithm compares the search string to the vector of the most important words for each review. This means that our recommendation engine is not only effective on simple, direct queries like "chinese food" or "tacos," but can also handle much vaguer terms like "date night" with decent accuracy. For example, when we search "date night," the top two results are Fratelli's Ristorante and Mai's Restaurant, both of which are higher-end locations that are perfect for a date.
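The text-matching step described above can be sketched roughly as follows. This is a simplified, standard-library stand-in rather than our actual pipeline: it builds TF-IDF vectors by hand and ranks businesses by cosine similarity to the query, without the latent factor step. The business names, review text, and function names (`tfidf_vectors`, `rank`) are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    # Smoothed IDF: rarer words get larger weights, common words stay positive.
    idf = {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}
    vectors = [{w: c * idf[w] for w, c in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, reviews_by_business, top_n=5):
    """Return up to top_n business names, best match for the query first."""
    names = list(reviews_by_business)
    vectors, idf = tfidf_vectors([reviews_by_business[n] for n in names])
    # Query words never seen in any review carry no information, so drop them.
    q_counts = Counter(w for w in tokenize(query) if w in idf)
    q_vec = {w: c * idf[w] for w, c in q_counts.items()}
    scored = sorted(((cosine(q_vec, v), n) for n, v in zip(names, vectors)),
                    reverse=True)
    return [name for _, name in scored[:top_n]]


# Made-up reviews for made-up businesses (assumption, not real data).
reviews = {
    "Bistro A": "romantic candlelit spot, perfect for a date night with wine",
    "Taco Stand B": "cheap tacos and fast service, great for lunch",
    "Noodle House C": "solid chinese food, big portions of noodles",
}
best_for_date = rank("date night", reviews)[0]
```

Matching on review text rather than on a fixed cuisine tag is what lets a vague phrase like "date night" find a result: it only needs to appear in what previous customers wrote.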
Challenges we ran into
None of us had experience working with a dataset this large. The aggregate review dataset was a 5 GB JSON file that couldn't be loaded directly by pandas or even displayed in VSCode. To get around this, we streamed the file, reading the dataset one line at a time. This of course made pre-processing our datasets much slower than anticipated.
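A minimal sketch of that line-by-line approach is shown below. It assumes the file holds one JSON object per line and that each record has an `address` field naming the city; both the field names and the sample records are hypothetical, and the helper names (`iter_matching_records`, `take`) are ours for illustration only.

```python
import json
from itertools import islice


def iter_matching_records(lines, city="Houston", field="address"):
    """Yield parsed records whose `field` mentions `city`, one line at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the whole pass
        if not isinstance(record, dict):
            continue
        if city.lower() in str(record.get(field, "")).lower():
            yield record


def take(records, limit):
    """Collect at most `limit` records from a lazy iterator."""
    return list(islice(records, limit))


# Tiny in-memory stand-in for the real file (made-up records).
sample_lines = [
    '{"business_id": "b1", "address": "123 Main St, Houston, TX", "text": "great tacos"}',
    '{"business_id": "b2", "address": "9 Elm St, Austin, TX", "text": "good coffee"}',
    'not json',
    '{"business_id": "b3", "address": "77 Oak Ave, Houston, TX", "text": "nice patio"}',
]
houston = take(iter_matching_records(sample_lines), limit=2000)
```

Because a file object is itself an iterable of lines, the same generator can be fed `open(path, encoding="utf-8")` directly, so only one line is held in memory at a time and the pass stops as soon as the limit is reached.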
Implementing the recommendation system was also a challenge. Once again, none of us had experience with writing even simple ML recommenders, so writing the recommendation engine from scratch was difficult from a cold start. We also had some trouble with the initial implementation of gradient descent to increase the effectiveness of our model. We knew conceptually that gradient descent was supposed to minimize the error between the estimated and actual rating, but we struggled to turn that idea into a working algorithm.
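For reference, the idea we were aiming for can be sketched as a textbook stochastic gradient descent loop over a latent factor model. This is not our submitted code: the toy ratings, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the function names are all assumptions chosen for illustration.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
              epochs=1000, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred  # actual rating minus estimated rating
            for f in range(k):
                p, q = P[u][f], Q[i][f]
                # Step down the gradient of squared error plus an L2 penalty.
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, u, i):
    """Estimated rating: dot product of a user row and an item row."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


# Made-up ratings: (user index, business index, stars).
toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = factorize(toy, 3, 3, epochs=0)  # untrained starting point
P, Q = factorize(toy, 3, 3)              # same seed, after training
before, after = rmse(toy, P0, Q0), rmse(toy, P, Q)
```

Comparing `before` and `after` is a quick sanity check that each pass is actually reducing the gap between estimated and actual ratings.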
Accomplishments that we're proud of
- We were able to create not only a recommender system but also a website with a simple yet effective UI for users in under 24 hours!
- Processing the text content of the review, using a TF-IDF Vectorizer to identify the most important remaining words, and accounting for all types of inputs for our recommender system!
- Learning more about data science, especially latent factor collaborative filtering!
What we learned
- How to work with extremely large datasets, and process them for our specific use case
- How to build a recommender system
What's next for food4thought
On the technical side, we would like to use gradient descent to calculate and minimize error. We would also like to expand our platform past Houston and make this an accessible tool for users nationwide and maybe internationally. Lastly, we hope to include other local businesses rather than limiting the platform to restaurants!