Comment Analyzer

Inspiration

When a subreddit gets banned, the people on it migrate to other subreddits. Our idea was to create a data model and web app that identify harassment in comments and find subreddits with similar comments, making it possible to recognize similarity between different subreddits.

What it does

It presents ratings created by a data model based on comments, and displays harassment likelihood in a grid view that groups similar communities.

How we built it

The data model is created in R, and the comments are stored in DynamoDB. The web app is built using Python and Flask, and the data visualization uses d3.js.

Challenges we ran into

A very large number of Reddit comments are posted every month, and processing all of them and moving them to DynamoDB would take a long time. We could've provisioned more servers to process the data, but none of us could use the AWS credit provided. It would've been much better to use only one machine, so the data didn't have to be moved to and from a database for processing.
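
As a rough illustration of the single-machine idea, the sketch below streams comments one line at a time and aggregates them in memory, without a database round trip. It assumes the comments are available as one JSON object per line with a `subreddit` field; that layout, the function name, and the file name in the usage note are assumptions for illustration, not what we actually ran.

```python
import json
from collections import Counter


def count_comments_by_subreddit(lines):
    """Count comments per subreddit from an iterable of JSON lines.

    Assumes each line is a JSON object with a "subreddit" field
    (an assumed layout, not confirmed by the project).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            comment = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the run
        if not isinstance(comment, dict):
            continue  # skip JSON values that are not objects
        subreddit = comment.get("subreddit")
        if subreddit:
            counts[subreddit] += 1
    return counts


# Small in-memory sample standing in for a real dump file.
sample_lines = [
    '{"subreddit": "aww", "body": "cute"}',
    '{"subreddit": "news", "body": "hm"}',
    '{"subreddit": "aww", "body": "so cute"}',
]
sample_counts = count_comments_by_subreddit(sample_lines)
```

Because an open file is itself an iterable of lines, the same function could be called as `count_comments_by_subreddit(open("comments.jsonl"))` (hypothetical file name) and would hold only one line in memory at a time.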

Accomplishments that we're proud of

The ratings table and map done in d3.js look good.
The concept seems good and feasible with more time.

What we learned

Knowing what AWS is is good, but experience with it and smart use of it are better.
Learned how to use Flask as a framework.
A single Boto3 scan call doesn't return all of a DynamoDB table; the results are paginated.
Large datasets are tough to process.
Got some AWS experience.
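
On the Boto3 point: a DynamoDB `scan` call returns one page of results, and the response includes a `LastEvaluatedKey` when more data remains. A minimal sketch of following those pages is below. The `scan_all` helper and the `FakePagedTable` stand-in are hypothetical names written for this example; the stand-in only mimics the paging behavior so the sketch runs without AWS access.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a table by following scan pagination.

    `table` is expected to behave like a Boto3 DynamoDB Table resource:
    scan() returns a dict with "Items" and, if more pages remain,
    a "LastEvaluatedKey" to pass back as ExclusiveStartKey.
    """
    kwargs = dict(scan_kwargs)
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key


class FakePagedTable:
    """Hypothetical stand-in that returns items one page at a time."""

    def __init__(self, items, page_size):
        self.items = items
        self.page_size = page_size

    def scan(self, **kwargs):
        start = kwargs.get("ExclusiveStartKey", {}).get("offset", 0)
        end = start + self.page_size
        page = {"Items": self.items[start:end]}
        if end < len(self.items):
            page["LastEvaluatedKey"] = {"offset": end}
        return page


fake_table = FakePagedTable([{"id": i} for i in range(7)], page_size=3)
first_page_only = fake_table.scan()["Items"]   # what a single call returns
all_items = list(scan_all(fake_table))          # what following the pages returns
```

With a real table, the same helper would be called on something like `boto3.resource("dynamodb").Table("comments")`, where the table name is an assumption.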