Doxxit

Inspiration

This application was created because, while browsing Reddit (instead of thinking of an idea for this hackathon), we were shocked at how much personal information users make public.

What it does

The user can enter his or her own Reddit username (if one exists) or the username of any other user to find out all sorts of information about that user. Doxxit counts how often each word or phrase is used, with the aim of displaying those counts as a word cloud. It also displays the user’s feelings toward the topics of his or her comments, along with any other personal information that can be obtained.

How we built it

Doxxit allows the user to enter the username of any Reddit user and, after loading briefly, see how that user "feels" about the topics he or she has commented on. First, if a valid username was entered, a web scraper collects the contents of all of that user’s comments. Each comment found on the user's profile is written to a comma-separated values (CSV) file, which acts as a rudimentary database. For each comment in the CSV, the words are parsed and the frequency of each is calculated for later use. A natural language processing (NLP) API is used to judge the meaning of sentences. The same NLP API is also used to find as much information about the user as possible.
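
The CSV and word-frequency steps described above can be sketched roughly as follows. This is a minimal illustration rather than our actual code: the function names, the "comment" column name, and the sample comments are all made up for the example, and an in-memory buffer stands in for the CSV file on disk.

```python
import csv
import io
import re
from collections import Counter


def save_comments(comments, csv_file):
    """Write one comment per row to an open text file object."""
    writer = csv.writer(csv_file)
    writer.writerow(["comment"])  # header row; the column name is hypothetical
    for text in comments:
        writer.writerow([text])


def word_frequencies(csv_file):
    """Count lowercase words across every comment stored in the CSV."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Treat runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", row["comment"].lower()))
    return counts


# Stand-in comments; real ones would come from the scraper.
sample_comments = ["I love Python, and I love hackathons.", "Python is fun!"]

buffer = io.StringIO()  # in-memory stand-in for a CSV file on disk
save_comments(sample_comments, buffer)
buffer.seek(0)  # rewind so the comments can be read back
frequencies = word_frequencies(buffer)
print(frequencies.most_common(3))
```

A Counter keeps the counting step short, and most_common gives the most frequently used words directly, which is the kind of data a word cloud would need.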

Challenges we ran into

We originally tried to use an SQL database, but connecting it to our Python script was very troublesome, so we switched to CSV files for simplicity. We also needed to count the frequency of every word in the user's comments. One of our team members originally wrote a Java program to do this, but to keep the codebase in one language we rewrote it in Python. We also disagreed about how to implement the front end: some of us argued for an online Node.js implementation, others for the offline GUI we currently have.

Accomplishments that we're proud of

Most of our team comes from a Java background. However, we learned a lot of Python and finished our hack in a language we had little experience with.

What we learned

We learned Python more than anything else. We wanted to use Java, but our natural language processing library was written in Python. Because of this, we had to compromise and write the whole program, including the GUI, in Python. We also learned about natural language processing and how to use the NLTK library.

What's next for Doxxit

We hope to keep working on Doxxit. We were unable to finish two features: a word cloud representing the frequencies of words appearing in comments, and an executable version of the program.

Built With

nltk
praw
python
tkinter

Submitted to

HackGT 2016
- Winner: Collector's Edition of Watch Dogs 2

Created by

I came up with the idea originally and led the development. I created the Reddit scraper and GUI.

Rishi Raj
I worked on calculating word frequencies in comments and displaying those on the GUI. I also worked on the SQL database/CSV file system of keeping track of the data.

Nikola Istvanic
I worked on the word cloud and the word analysis, using Python and pyspark thunder to visualize that data and predict the sentiment of the user.

Dharshan Rammohan