Inspiration

When someone speaks over the radio, it is almost always obvious what gender they are from their pitch, tone, pauses, and so on. When someone writes, however, it is much less obvious. Are there actually differences in how male and female authors write?

What it does

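It uses keywords, phrasing, and parts of speech in a piece of writing to predict whether the author is male or female.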

How I built it

First, I used web scraping to gather data from various news sites, including The Guardian, 538, The Atlantic, and The New Yorker. Next, I analyzed phrasing, keywords, and parts of speech to find indicators of author gender. Once I had the data, I compared several machine learning algorithms to see which yielded the best results: gradient boosting, random forests, logistic regression, and SVMs.
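As a rough illustration of the feature step, the minimal sketch below turns one article into numbers using word frequencies and a few phrasing statistics. The marker word list, the feature names, and the function `extract_features` are hypothetical; the project's actual keywords and features are not described here, and part-of-speech features (which would need a tagger such as NLTK's) are left out.

```python
import re
from collections import Counter

# Hypothetical marker words chosen only for illustration; the project's real
# keyword list is not given in the write-up.
MARKER_WORDS = ["i", "you", "we", "the", "of", "and", "not", "very"]


def extract_features(text: str) -> dict[str, float]:
    """Turn one article into a fixed set of numeric features."""
    # Rough sentence split on terminal punctuation; drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text

    # Keyword features: relative frequency of each marker word.
    features = {f"freq_{w}": counts[w] / total for w in MARKER_WORDS}

    # Phrasing-level features.
    features["avg_sentence_len"] = len(words) / (len(sentences) or 1)
    features["avg_word_len"] = sum(len(w) for w in words) / total
    features["type_token_ratio"] = len(counts) / total  # vocabulary richness
    return features
```

Each article becomes a dictionary with the same keys, so a list of them can be stacked into a feature matrix and passed to whichever classifiers are being compared.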

Challenges I ran into

Finding a corpus to analyze was very challenging: it could be neither too diverse nor too narrowly focused, and it had to be large enough. The models also took a long time to cross-validate and train.
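Part of why this step is slow: a grid search with k-fold cross-validation refits the model once per fold for every hyperparameter combination. The sketch below, using hypothetical helpers `k_fold_indices` and `total_fits` and a made-up parameter grid, shows the splitting and the arithmetic; it is not the project's actual setup.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # shuffle once, reproducibly
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]  # every index not in this fold
        yield train, test
        start += size


def total_fits(param_grid: dict[str, list], k: int) -> int:
    """Number of model fits a full grid search with k-fold CV performs."""
    n_combos = 1
    for values in param_grid.values():
        n_combos *= len(values)
    return n_combos * k
```

For example, `total_fits({"n_estimators": [100, 200, 500], "max_depth": [2, 3, 5], "learning_rate": [0.01, 0.1]}, k=5)` returns 90 (18 combinations times 5 folds) for a single algorithm, and comparing four algorithms multiplies the cost again.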

Accomplishments that I'm proud of

Achieving 70% accuracy, and finding a way to turn language into numbers and data that a model can learn from.

What I learned

There is indeed a difference between male and female writing. Also, it is possible to break down something as complicated as language into numbers for machine learning analysis.

What's next for AuthorWiz

Age and other demographics might also be predictable from an author's choice of words and phrasing.
