Inspiration
When making decisions as consumers, whether deciding what to eat, what to watch, or something else entirely, many of us use review sites to weigh our options. However, our team often noticed that rating scales are not at all standardized; for example, "4 stars" may mean two entirely different things to two different reviewers. We believe the problem is that we rely on reviewers to quantify their own qualitative experiences, and this quantification differs heavily from person to person. Luckily, many reviewers also leave written reviews that describe their experiences beyond the score alone; thus, when we were introduced to co:here during CalHacks 9.0, we saw the perfect opportunity to try something new.
What it does
Rotten-Apples is a model that takes written reviews from a review site and quantifies each one using the star system (or whatever rating system that particular site uses). In doing so, it aims to standardize ratings so that they reflect the qualitative content of each review. We chose MyAnimeList, a forum and review site for anime, as a proof of concept that our idea could work for anime reviews. The name Rotten-Apples is a play on the review site "Rotten Tomatoes" combined with "Bad Apple," a very popular anime music video/song.
How we built it
Using mainly Python for general scripting, we first used the beautiful-soup library to scrape a large number of reviews off of MyAnimeList. We chose a list of 22 popular anime of various genres and eras, in the hopes of obtaining a large quantity and variety of reviews. We then used the co:here API to access their NLP model, fine-tuned it on our niche data set, and used the trained model to generate a rating from a written review. Finally, we stored our data with SQL and used a combination of CSS, Svelte, JavaScript, and TypeScript to create a clean web UI for the project.
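The hand-off from scraping to fine-tuning can be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's actual code: it assumes each scraped review is available as a (text, score) pair and that the fine-tuning data is a two-column CSV of review text and rating label. The function name `write_training_csv` and the sample reviews are hypothetical.

```python
import csv
import io


def write_training_csv(reviews, out):
    """Write (text, score) pairs as text,label rows and return the row count.

    The two-column layout is an assumed format for the fine-tuning data.
    """
    writer = csv.writer(out)
    count = 0
    for text, score in reviews:
        # Collapse newlines and runs of spaces so each review fits on one row.
        text = " ".join(text.split())
        if not text:
            continue  # skip reviews that have a score but no written body
        writer.writerow([text, str(score)])
        count += 1
    return count


# Hypothetical sample data standing in for scraped reviews.
sample = [
    ("A slow start,\nbut the ending is worth it.", 8),
    ("   ", 5),
    ("Great animation, weak plot.", 6),
]

buffer = io.StringIO()
rows_written = write_training_csv(sample, buffer)
print(rows_written)
print(buffer.getvalue())
```

Writing to any file-like object keeps the sketch easy to test in memory; pointing `out` at a file opened with `newline=""` would produce the same rows on disk.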
Challenges we ran into
One of the largest challenges came from our initial plan to fetch reviews through Jikan, a RESTful API for MyAnimeList. A few hours into the hackathon, however, we found that the API request for fetching reviews was nonfunctional. We were thus forced to pivot to scraping the reviews directly off the website with beautiful-soup4, which took longer than expected. Part of the difficulty was our inexperience with web scraping, but a large part was also that the data needed to be cleaned to be compatible with the co:here model. This cleaning process also took a good chunk of time.
The last major difficulty we faced was that we had to fine-tune our language model because of the very niche nature of our data set; in particular, anime reviews often use a lot of community slang and jargon. Fine-tuning also turned out to be fairly time-intensive (though it was interesting to see the model progress over time).
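To give a sense of what cleaning scraped text involves, here is a minimal sketch using only the Python standard library. The specific steps (stripping leftover tags, decoding HTML entities, normalizing Unicode, collapsing whitespace) are assumptions about what typical scraped review text needs, and `clean_review` is a hypothetical helper rather than the function used in the project.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def clean_review(raw):
    """Return review text with markup, entities, and stray whitespace removed."""
    # Strip tags first, so that an entity such as "&lt;3" is decoded
    # afterwards and is not mistaken for the start of a tag.
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)                  # "&amp;" becomes "&"
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space becomes a plain space
    text = WHITESPACE_RE.sub(" ", text)         # collapse newlines, tabs, repeated spaces
    return text.strip()


example = "<p>Best&nbsp;show  ever!<br>10/10 &amp; would rewatch.</p>"
print(clean_review(example))
```

A regular expression is only a rough way to drop leftover tags; when the full page is available, extracting the text through the HTML parser itself is generally more robust.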
Accomplishments that we're proud of
We're very proud of having used co:here's NLP models to create an interesting project; NLP had always fascinated us, but we had never dived into it, and we were able to put the co:here API to good use. We're also proud of our workflow through this hackathon as a whole. We were able to divide the workload effectively, and despite a variety of challenges along the way, we worked through them together to reach an interesting end product.
What we learned
Through using the co:here API and attending the co:here workshop, we learned a lot more about what an NLP model is, how it works, and what it's useful for. Besides this, we also learned how to scrape data from a webpage, among other hackathon-specific skills.
What's next for Rotten-Apples
There is much to be done with Rotten-Apples moving forward, besides potentially coming up with a better name. Two obvious improvements would be, first, to process more reviews from more anime using the model and, second, to add more reviews to the training data to get a better model. In particular, we are thinking of automating the process of adding reviews to the training set, which could be done through the web UI (for example, we could have a box where someone can submit a review with a rating, which gets fed directly into the training set).
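If submissions were fed straight into the training set, some basic checks would probably be needed first. The sketch below shows one possible shape for them; the 1 to 10 rating range, the minimum word count, and the name `validate_submission` are all assumptions made for illustration, not part of the existing project.

```python
MIN_RATING, MAX_RATING = 1, 10  # assumed rating scale
MIN_WORDS = 5                   # hypothetical threshold for a usable review


def validate_submission(text, rating, seen_texts):
    """Return None if the submission is acceptable, else a short error message.

    seen_texts is a set of normalized reviews already accepted; it is
    updated in place when a submission passes every check.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, int):
        return "rating must be an integer"
    if not MIN_RATING <= rating <= MAX_RATING:
        return "rating out of range"
    normalized = " ".join(text.lower().split())
    if len(normalized.split()) < MIN_WORDS:
        return "review is too short"
    if normalized in seen_texts:
        return "duplicate review"
    seen_texts.add(normalized)
    return None


seen = set()
first = validate_submission("The pacing drags but the finale delivers.", 7, seen)
repeat = validate_submission("The pacing  drags but the FINALE delivers.", 7, seen)
short = validate_submission("Good.", 9, seen)
bad_rating = validate_submission("The pacing drags but the finale lands well.", 11, seen)
print(first, repeat, short, bad_rating)
```

Returning a message instead of raising an exception keeps it simple for a web form to show the reason a submission was rejected.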
In general, we could expand our analytics considerably. We could begin to evaluate whether our newly generated ratings actually provide a better system of reviews than the existing one, which simply lets reviewers choose their ratings arbitrarily. If the model proves worthwhile, we could begin using it in recommendation systems for users in a larger, practical setting. Finally, the last step would be to generalize this project beyond anime into other categories such as food, movies, etc.
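One simple starting point for such an evaluation is to measure how far the model's ratings fall from the scores reviewers gave themselves. The sketch below computes the mean absolute error and the share of predictions within one point; the sample numbers are made up for illustration, and the function names are hypothetical. Agreement with reviewer scores is only a baseline, since those scores are exactly what the project considers inconsistent.

```python
def _check_lengths(predicted, actual):
    """Raise ValueError unless both lists are non-empty and equally long."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and the same length")


def mean_absolute_error(predicted, actual):
    """Average absolute gap between model ratings and reviewer ratings."""
    _check_lengths(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def share_within_one(predicted, actual):
    """Fraction of model ratings that land within one point of the reviewer's."""
    _check_lengths(predicted, actual)
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= 1)
    return hits / len(predicted)


# Made-up ratings, for illustration only.
model_ratings = [8, 6, 9, 3]
reviewer_ratings = [7, 6, 10, 6]

print(mean_absolute_error(model_ratings, reviewer_ratings))  # 1.25
print(share_within_one(model_ratings, reviewer_ratings))     # 0.75
```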
Built With
- beautiful-soup
- co:here
- css
- csv
- javascript
- pandas
- python
- sql
- svelte
- typescript