When someone speaks over the radio, their gender is almost always obvious from their pitch, tone, pauses, and so on. When someone writes, however, it is much less obvious. Are there actually differences in how male and female authors write?
What it does
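AuthorWiz predicts an author's gender from their writing, using keywords, phrasing, and parts of speech as indicators.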
How I built it
First, I used web scraping to gather data from various news sites, including The Guardian, 538, The Atlantic, and The New Yorker. Next, I analyzed phrasing, keywords, and parts of speech to find gender indicators. Once I had the data, I compared several machine learning algorithms to see which yielded the best results: gradient boosting, random forest, logistic regression, and SVMs.
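To give a rough idea of how keywords can become numbers a model can use, here is a minimal sketch using only the Python standard library. The keyword list, the function names, and the sample sentence are all made up for illustration; they are not the indicators or code AuthorWiz actually uses.

```python
import re
from collections import Counter

# Hypothetical keyword list, chosen only for illustration.
KEYWORDS = ["the", "and", "with", "she", "he", "very"]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_features(text, keywords=KEYWORDS):
    """Return each keyword's relative frequency (count / total tokens)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in keywords]

# Made-up sample text, not taken from the scraped corpus.
sample = "She walked with the dog and he stayed with the cat."
features = keyword_features(sample)
print(features)
```

Each article becomes one fixed-length list of numbers, which is the kind of input a classifier such as logistic regression or a random forest expects.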
Challenges I ran into
Finding a corpus to analyze was very challenging, as it could be neither too diverse nor too narrowly focused, and it had to be large enough. The models also took an extensive amount of time to build and cross-validate.
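Part of the reason cross-validation is slow is that every model has to be trained once per fold. The sketch below shows the general idea of k-fold cross-validation in plain Python. It assumes k is no larger than the number of samples, and it uses a trivial "always guess the most common label" model and toy data as stand-ins; none of the names or data here come from the actual project.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(X, y, train_fn, predict_fn, k=5):
    """Train k times, each time holding out one fold, and return mean accuracy."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

# Trivial stand-in model: always predict the most common training label.
def train_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

# Toy data for illustration only.
X = [[i] for i in range(10)]
y = ["a"] * 10
accuracy = cross_validate(X, y, train_majority, predict_majority, k=5)
print(accuracy)
```

With k folds and several candidate algorithms, the number of training runs is k times the number of algorithms, which adds up quickly on a large corpus.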
Accomplishments that I'm proud of
Achieving 70% accuracy, and being able to quantify language as numbers and data.
What I learned
There is indeed a difference between male and female writing. Also, it is possible to break down something as complicated as language into numbers for machine learning analysis.
What's next for AuthorWiz
Maybe age and other demographics can also be predicted through the author's choice of words and phrasing.