Penn Course Review is a great resource for course selection: it provides numeric ratings for many courses so that students can compare them. However, it leaves out students' written comments. Comments are typically more elaborate than numbers and therefore carry more detailed information.
What it does
The purpose of the Review Hub project is to create a course review system that aggregates students' comments from different sources and presents them through a consistent and attractive interface. With the system, students no longer have to search various sources by hand, filter out the relevant comments, and repeat the process every semester.
How I built it
Backend: Collects students' course comments from various data sources, passes them through an NLP processor (tokenize, remove stop words, lemmatize, run part-of-speech analysis, and sort words by frequency), and stores the results in a database. It also exposes an API that lets the frontend fetch the list of all courses and, for a specific course, its word occurrences and comment information.
Frontend: Supports course search with auto-filtering. Displays a word cloud built from the word-occurrence data with the AnyChart library, alongside a list of comments. Provides links to Penn Course Review and seas.upenn.edu for further information.
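The NLP step described above can be sketched in plain Python. This is a minimal illustration, not the project's code: it swaps NLTK for the standard library, uses a tiny hypothetical stop-word list, replaces lemmatization with naive suffix stripping, and skips part-of-speech analysis entirely. The helper names `tokenize`, `naive_lemmatize`, and `word_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the project uses NLTK's instead).
STOP_WORDS = {"a", "an", "and", "are", "be", "but", "i", "is", "it",
              "of", "the", "this", "to", "very", "was", "were"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeping contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def naive_lemmatize(word):
    """Crude suffix stripping; a stand-in for a real lemmatizer such as NLTK's WordNet one."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def word_frequencies(comments):
    """Return (word, count) pairs across all comments, most frequent first."""
    counts = Counter()
    for comment in comments:
        for token in tokenize(comment):
            if token in STOP_WORDS:
                continue
            counts[naive_lemmatize(token)] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For the two comments "The homework was hard but the lectures were great." and "Great professor, hard exams, fair homework.", `word_frequencies` ranks `great`, `hard`, and `homework` first with a count of 2 each, followed by the single-occurrence words. A sorted list of (word, count) pairs like this is the kind of word-occurrence data a word cloud can be built from.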
React and Node.js are used to build the website, which is hosted on Amazon EC2. MongoDB stores the comment data, and the NLTK library is used to parse comments.
Challenges and next steps for Review Hub
The main challenge we face is integrating real-world data in diverse formats into a uniform format that works with the existing pipeline. For example, Facebook, Slack, and RateMyProfessor data all come in different formats. Therefore, we need to standardize the data format at intermediate steps. As a result, integrating a new data source only requires building a source-specific crawler. This design also allows data in different formats to enter the pipeline at different stages.
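One way to standardize formats at an intermediate step is to convert every source into a single record type before NLP processing. The sketch below assumes a hypothetical `Comment` record, hypothetical adapter functions, and hypothetical raw field names for Slack and RateMyProfessor data; the real crawlers' output may look quite different.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """Hypothetical uniform record that every data source is converted into."""
    course_id: str  # normalized, e.g. "CIS 120"
    text: str
    source: str


def normalize_course_id(raw):
    """Turn variants such as 'cis120', 'CIS-120', or 'Cis 120' into 'CIS 120'."""
    match = re.match(r"\s*([A-Za-z]+)[\s\-]*(\d+)", raw)
    if not match:
        raise ValueError(f"unrecognized course id: {raw!r}")
    return f"{match.group(1).upper()} {match.group(2)}"


def from_slack(message):
    # Hypothetical raw shape: {"channel": "cis-120", "text": "..."}
    return Comment(normalize_course_id(message["channel"]),
                   message["text"].strip(), "slack")


def from_ratemyprofessor(rating):
    # Hypothetical raw shape: {"class": "CIS120", "comment": "..."}
    return Comment(normalize_course_id(rating["class"]),
                   rating["comment"].strip(), "ratemyprofessor")


# Supporting a new source means writing one adapter and registering it here.
ADAPTERS = {"slack": from_slack, "ratemyprofessor": from_ratemyprofessor}


def ingest(source, raw_records):
    """Convert raw records from one source into uniform Comment objects."""
    adapter = ADAPTERS[source]
    return [adapter(record) for record in raw_records]
```

Everything downstream of `ingest` (the NLP processor, the database, the API) then only has to handle `Comment` objects, regardless of where the data came from.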