Inspiration
As avid enjoyers of good food and gatekept, niche restaurants, we’re tired of being recommended the same restaurants over and over again. That’s why we created Datadash, an algorithm meant to help you find new hidden gems using Yelp’s Open Dataset.
What it does and How we built it
Using the Yelp dataset filtered to California restaurants, we built a pipeline to identify “hidden gem” restaurants by separating food quality from popularity. First, we used the Gemini API with few-shot prompting to run sentiment analysis on Yelp reviews and extract structured signals about food quality. We then aggregated these signals into a Gem Score, which captures consistent, high-quality dining experiences using metrics such as average sentiment, consistency, trends over time, and usefulness-weighted reviews. Next, we constructed a separate Popularity Score based on engagement and visibility metrics, including review volume and the time distribution of reviews. By keeping quality and popularity as independent normalized scores (0–1), we were able to plot restaurants in a 2-D space and apply statistical thresholding to cluster and identify restaurants that have high quality but relatively low popularity—our definition of hidden gems.
Challenges we ran into
The Yelp reviews dataset is incredibly large, containing over 190k lines of data on user reviews. Thus, processing this data took an unexpectedly long time as our computers did not have the compute power to process such a large dataset at once, so we had to break our calculations into batches.
What's next for DataDash
For the future, we hope to apply our algorithm to all cities and states to find and recommend hidden gem restaurants all across the world, not just in California.
Built With
- jupyter
- python
Log in or sign up for Devpost to join the conversation.