• We have always been curious about how exactly data mining can be used to derive meaningful results, so we decided to build an application that analyses data gathered from Reddit.

What it does

  • It extracts the topics of Reddit submissions posted to a chosen subreddit within a chosen period of time
  • Then, it performs a frequency distribution analysis on the extracted keywords
  • Finally, it displays the frequency distribution of the top 50 keywords as a word cloud
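
The keyword-counting step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses only the Python standard library (`collections.Counter`) in place of nltk, and the `top_keywords` function, the tiny `STOPWORDS` set, and the sample titles are all hypothetical names and data made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
# A real run would likely use a fuller list (for example the one shipped with nltk).
STOPWORDS = {
    "a", "an", "the", "is", "are", "to", "of", "in", "on",
    "for", "and", "or", "my", "how", "why", "what", "with",
}


def top_keywords(titles, limit=50):
    """Count keyword occurrences across submission titles.

    Returns up to `limit` (keyword, count) pairs, most frequent first.
    """
    counts = Counter()
    for title in titles:
        # Lowercase the title and split it into simple word tokens.
        tokens = re.findall(r"[a-z0-9']+", title.lower())
        # Skip stopwords and single-character tokens.
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 1)
    return counts.most_common(limit)


# Made-up sample titles standing in for submissions fetched from a subreddit.
sample_titles = [
    "How to learn Python for data mining",
    "Python tips for data analysis",
    "Why is data mining useful?",
]

top = top_keywords(sample_titles, limit=3)
for word, count in top:
    print(f"{word}: {count}")
```

The resulting (keyword, count) pairs are the kind of input a word cloud needs, with each word sized according to its count.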

How I built it

  • Frontend: HTML, JS, ZingChart
  • Backend: Flask, PSAW, NLTK

Challenges I ran into

  • We were new to data mining, so we spent a lot of time researching suitable resources to get started with.
  • The efficiency of the data extraction process depends heavily on the popularity of the subreddit being searched.

Accomplishments that I'm proud of

  • The application can identify the top keywords for any subreddit over a chosen period of time

What I learned

  • Basic NLP techniques
  • Extracting large sets of data

What's next for RedditSays

  • Visualise the movement of trends across time
  • Plot geospatial heatmaps showing which parts of the world (roughly) discuss certain topics
  • Add more features to make the frontend interactive