Inspiration

I moderate r/utdallas, a student-run subreddit with over 20,000 subscribers. Students use it to ask questions and raise concerns, but those posts can feel like they're thrown into the void, and university administrators often cannot tell which issues matter most to students. Furthermore, my university, like other universities, has a student government that sometimes struggles to find the best ways of supporting students and identifying projects and initiatives to improve the student community.

Why not source that data from the one online forum where students can anonymously raise their concerns?

What it does

Horizon tries to answer the question, "What's important to students at any given point in time?" Inspired by Google Trends, Horizon takes a data-driven approach to finding out what students are talking about and when they talk about those topics. This tool uses machine learning to aggregate information from posts on r/utdallas into a timeline.

Users are also able to browse posts by topics such as "professors" or "ECS advising." Horizon provides charts that show users when certain topics have been popular or are trending.
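As a rough illustration of the timeline idea, the sketch below counts how many posts mention a given topic in each month. The Post record, its field names, and the monthly_counts helper are hypothetical and are not Horizon's actual schema or code; it also assumes topics have already been extracted from each post.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Post:
    """Hypothetical post record; the field names are assumptions."""
    title: str
    created_utc: float  # Unix timestamp in seconds
    topics: tuple       # topic labels already extracted from the post


def monthly_counts(posts, topic):
    """Count posts tagged with `topic`, keyed by "YYYY-MM" in month order."""
    counts = Counter()
    for post in posts:
        if topic in post.topics:
            created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
            counts[created.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


def ts(year, month, day):
    """Build a UTC Unix timestamp for the made-up sample data below."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


sample_posts = [
    Post("Best professors for calculus?", ts(2021, 9, 3), ("professors",)),
    Post("Advising wait times", ts(2021, 9, 10), ("ECS advising",)),
    Post("Professor recommendations", ts(2021, 9, 21), ("professors",)),
    Post("Is this professor fair?", ts(2021, 10, 2), ("professors",)),
]

print(monthly_counts(sample_posts, "professors"))
```

A series shaped like this (month mapped to post count) is the kind of data a trend chart could plot directly.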

How I built it

The client application for Horizon was built using Next.js because of its simple routing and deployed on Vercel because of its easy-to-use continuous deployment.

With the exception of a few components from the MUI library, components in this web app were custom-built using Tailwind CSS.

To recognize entities, I implemented a custom pipeline using the Google Cloud Natural Language API and spaCy, a well-known library for natural language processing.

Challenges I ran into and what I learned

I underestimated how difficult entity parsing would be. It is simple for a human to connect language to its semantic meaning, but many different variations of words or phrases can map to a single idea, such as "professor," "professors," and "instructors." While the Natural Language API provided a basis for key functionality during prototyping, I learned that it alone was inadequate for extracting many entities from text. Fortunately, I was able to come up with a stop-gap solution that provided the basis for organizing entities.
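To make the variation problem concrete, here is a minimal sketch of one possible stop-gap: a hand-written alias table that maps surface forms to a single canonical topic. This is not necessarily the approach Horizon uses. The ALIASES table and the canonical_topics function are hypothetical, and a real pipeline would likely rely on lemmatization or entity linking instead of exact phrase matching.

```python
import re

# Hypothetical alias table: each lowercase surface form maps to one topic.
ALIASES = {
    "professor": "professors",
    "professors": "professors",
    "instructor": "professors",
    "instructors": "professors",
    "ecs advising": "ECS advising",
    "ecs advisor": "ECS advising",
    "ecs advisors": "ECS advising",
}


def canonical_topics(text):
    """Return the set of canonical topics whose aliases appear in `text`."""
    # Lowercase and collapse punctuation to spaces so matching is word-based.
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    padded = f" {normalized} "
    # Space padding ensures aliases match whole words or phrases only.
    return {topic for alias, topic in ALIASES.items() if f" {alias} " in padded}


print(canonical_topics("Which instructors are best for CS 1337?"))
```

The table is easy to extend by hand, but it only catches phrasings someone has already thought of, which is why it is a stop-gap and not a full solution.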

Accomplishments that I'm proud of

I'm proud of designing and building a relatively complex UI in less than 12 hours. Sure, it needs some refinement, but I'm satisfied with how quickly Tailwind CSS let me prototype this project.

What's next for Horizon - Collecting Comets' Concerns

This tool could easily be expanded to support more visualizations of students' concerns on Reddit. It could also benefit from natural language understanding techniques that take existing knowledge graphs into account. In the future, I'd like to build out the set of topics that this tool has in its database to better visualize the knowledge graph implicit in discussions.
