We created this application because, while browsing Reddit (instead of thinking of an idea for this hackathon), we were shocked at how much personal information users make public.
What it does
The user can enter his or her own Reddit username (if one exists) or the username of any other user to find out all sorts of information about that account. A word cloud is intended to show how frequently each word or phrase is used. The app also displays the user's feelings toward the topics of his or her comments, along with any other personal information that can be obtained.
How we built it
Doxxit allows the user to enter the username of any Reddit user and, after loading briefly, see how that user "feels" about the topics he or she has commented on. First, a web scraper collects the contents of all of the entered user's comments (if a valid username was entered). Each comment found on the user's profile is written to a comma-separated values (CSV) file, which acts as a rudimentary database. For each comment in the CSV, the words are parsed and the frequency of each is calculated for later use. A natural language processing (NLP) API is used to judge the meaning of sentences and to find as much information about the user as possible.
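The CSV-as-database step can be sketched roughly as follows. This is only an illustration, not our actual code: the column names (comment_id, body), the helper names save_comments and load_comments, and the sample comments are all assumptions made for the example, and an in-memory buffer stands in for the file on disk.

```python
import csv
import io


def save_comments(comments, fileobj):
    """Write (comment_id, body) pairs as CSV rows under a header line."""
    writer = csv.writer(fileobj)
    writer.writerow(["comment_id", "body"])  # hypothetical column names
    for comment_id, body in comments:
        writer.writerow([comment_id, body])


def load_comments(fileobj):
    """Read the comment bodies back out of the CSV, in order."""
    reader = csv.DictReader(fileobj)
    return [row["body"] for row in reader]


# An in-memory buffer stands in for a real file here. With a real file,
# open it with newline="" as the csv module documentation recommends.
buffer = io.StringIO(newline="")
sample = [
    ("c1", "I love hiking, really."),      # contains a comma
    ("c2", 'He said "no"\nand left.'),     # contains quotes and a newline
]
save_comments(sample, buffer)

buffer.seek(0)
loaded = load_comments(buffer)
```

Using the csv module rather than joining strings by hand matters here because Reddit comments routinely contain commas, quotes, and line breaks, all of which the module quotes and unquotes for us.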
Challenges we ran into
We originally tried to use an SQL database, but connecting it to our Python script was very troublesome, so we switched to CSV files for simplicity. We also needed to count the frequency of every word in the user's comments. One team member originally wrote this in Java, but to unify the codebase we had to rewrite it in Python. We also disagreed about how to implement the front end: some of us argued for an online Node.js implementation, others for the offline GUI we currently have.
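A word-frequency count of this kind takes only a few lines of Python. The sketch below is illustrative rather than our actual script: the function name word_frequencies and the tokenization rule (lowercase everything, keep runs of letters and apostrophes) are assumptions chosen for the example.

```python
import re
from collections import Counter


def word_frequencies(comments):
    """Count how often each word appears across all comments."""
    counts = Counter()
    for comment in comments:
        # Assumed tokenization: lowercase the text, then keep runs of
        # letters and apostrophes. Digits and punctuation are dropped.
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(words)
    return counts


freqs = word_frequencies([
    "I love hiking",
    "Hiking is what I love most",
])

# The most frequent words, e.g. as input for a word cloud.
top_words = freqs.most_common(3)
```

A Counter behaves like a dictionary that returns 0 for words it has not seen, so the counts can be queried directly, for example freqs["hiking"].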
Accomplishments that we're proud of
Most of our team comes from a Java background. Even so, we learned a lot of Python and finished our hack in a language we had little experience with.
What we learned
We learned Python more than anything else. We wanted to use Java, but our natural language processor was in Python, so we compromised and wrote the whole program, even the GUI, in Python. We also learned about natural language processing and how to use the NLTK library.
What's next for Doxxit
We hope to keep working on Doxxit. Two features were left unfinished: a word cloud representing the frequencies of words appearing in comments, and an executable version of the program.