Inspiration
Every day, millions of people express themselves by writing on social media (e.g., Facebook, Twitter, and blogs). Through simple text messages, people freely share their thoughts and emotions with their circle of friends, a larger group of acquaintances, or even the entire online world. The written language accumulating on social media is a massive source of rich psychological data with unrealized scientific potential. If we can translate this language into novel measurement methods, they stand to substantially increase the scale and scope of psychological research.
What it does
The system takes any text message from social media as input and predicts the author's age and gender from that text alone, with no prior knowledge of the author.
How I built it
Database: Data is taken from the Blog Authorship Corpus (http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm). The corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. It incorporates a total of 681,288 posts and over 140 million words, or approximately 35 posts and 7,250 words per person.
Algorithm: NLP, open-vocabulary analysis, tokenization (n-grams), LDA (MALLET), Facebook topics, pointwise mutual information, correlation analysis, word clouds (visualization)
Technology: NLP, Python, Flask, MySQL, JavaScript, HTML, CSS
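To make two of the listed steps concrete, here is a minimal sketch of n-gram tokenization and pointwise mutual information (PMI) for bigrams. It is not the project's actual code: the function names (tokenize, ngrams, bigram_pmi) and the simple whitespace tokenizer are assumptions made for illustration. PMI here is log2 of p(w1, w2) / (p(w1) * p(w2)), so word pairs that co-occur more often than chance score higher.

```python
import math
from collections import Counter

# Hypothetical helper names; a sketch, not the project's implementation.

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'()[]")
        if word:
            tokens.append(word)
    return tokens

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bigram_pmi(messages):
    """Score every observed bigram by PMI, in bits."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for message in messages:
        tokens = tokenize(message)
        unigram_counts.update(tokens)
        # Bigrams are built per message so they never span two messages.
        bigram_counts.update(ngrams(tokens, 2))

    total_unigrams = sum(unigram_counts.values())
    total_bigrams = sum(bigram_counts.values())

    scores = {}
    for (w1, w2), count in bigram_counts.items():
        p_joint = count / total_bigrams
        p_w1 = unigram_counts[w1] / total_unigrams
        p_w2 = unigram_counts[w2] / total_unigrams
        scores[(w1, w2)] = math.log2(p_joint / (p_w1 * p_w2))
    return scores

# Made-up example messages, for illustration only.
messages = ["Happy birthday to you!", "happy birthday, mom", "to you and you"]
scores = bigram_pmi(messages)
```

In this toy input, ("happy", "birthday") scores higher than ("birthday", "to") because the first pair always appears together. High-PMI pairs like this are candidates for treating as a single phrase feature.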
Challenges I ran into
It took us a while to figure out how to host our NLP model as a service in a web application, and to find the right feature set for predicting age and gender.
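One simple way to approach the feature-set question is correlation analysis: compute each word's relative frequency per author and correlate it with age. The sketch below illustrates that idea under stated assumptions. The function names, the whitespace tokenizer, and the toy texts and ages are all hypothetical and are not taken from the project or the corpus.

```python
import math
from collections import Counter

# Hypothetical helper names; a sketch, not the project's implementation.

def relative_frequencies(text, vocabulary):
    """Fraction of the text's tokens accounted for by each vocabulary word."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in vocabulary}

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_features(texts, ages, vocabulary):
    """Rank words by the absolute correlation of their frequency with age."""
    freqs = [relative_frequencies(t, vocabulary) for t in texts]
    scored = [(w, pearson([f[w] for f in freqs], ages)) for w in vocabulary]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Made-up data: one text and one age per author, for illustration only.
texts = ["school school homework", "school work meeting", "work work meeting"]
ages = [15, 25, 40]
ranked = rank_features(texts, ages, ["school", "work", "the"])
```

With this toy data, "school" correlates negatively with age, "work" correlates positively, and "the" ranks last because it never occurs. On real data, words with correlations near zero would be the first candidates to drop from the feature set.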
Accomplishments that I'm proud of
In a short span of 48 hours, we gathered first-hand research and built an end-to-end system for predicting age and gender. I am very happy to have built a robust training feature set, a model with fair accuracy, and a web application that gives the end user a real-time service for unearthing psychological richness in language without any prior knowledge.
What I learned
We provided evidence that the language in social media can be harnessed to create a valid and reliable measure of personality. This approach is just one example of how social media can extend assessment to many more people quickly, cheaply, and with low participant burden. Moreover, this illustrates how computational techniques can reveal new layers of psychological richness in language.
What's next for Personality Assessment Through Social Media Language
Using these techniques to study the words and phrases through which people express themselves, as well as their change over time, may provide us with a clearer portrait of their unfolding mental life.


