In other professional contexts, we noticed that women seemed to apologize more often in conversation. This piqued our curiosity and inspired us to take a closer look at whether data would show a trend.
What it does
Our project analyzes the Enron email corpus, using a variety of natural language processing techniques to generate features from each email body. It also uses gender_guesser to identify the likely gender of each email sender based on their name. The resulting dataframe is then fed into an ML model, which is trained to predict the sender's gender from those features.
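To make the pipeline above more concrete, here is a minimal sketch of how one row of such a dataframe could be built. It is not our exact code: the word list, the feature names, and the helper functions `body_features`, `guess_sender_gender`, and `build_row` are illustrative assumptions, and tokenization is done with a plain regular expression so the sketch runs without extra downloads. The only gender_guesser call used is `Detector.get_gender`, and the import is optional so the feature part still runs without the package installed.

```python
import re

try:
    # Third-party package named in the writeup; optional in this sketch.
    import gender_guesser.detector as gender
    _DETECTOR = gender.Detector(case_sensitive=False)
except ImportError:
    _DETECTOR = None

# Hypothetical word list, for illustration only.
APOLOGETIC_WORDS = {"sorry", "apologize", "apologies", "apology", "regret"}


def body_features(body):
    """Return a few simple numeric features for one email body."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", body.lower())
    apologetic = sum(1 for t in tokens if t in APOLOGETIC_WORDS)
    return {
        "n_tokens": len(tokens),
        "n_apologetic": apologetic,
        "apologetic_rate": apologetic / len(tokens) if tokens else 0.0,
    }


def guess_sender_gender(display_name):
    """Guess gender from the first word of a display name.

    Assumes the name looks like "First Last". Returns "unknown" when the
    name is empty or gender_guesser is not installed.
    """
    parts = display_name.split()
    if not parts or _DETECTOR is None:
        return "unknown"
    return _DETECTOR.get_gender(parts[0])


def build_row(display_name, body):
    """Combine body features and the guessed gender label into one row."""
    row = body_features(body)
    row["gender"] = guess_sender_gender(display_name)
    return row
```

A list of such rows can be passed to `pandas.DataFrame(rows)`, with the `gender` column as the label and the remaining columns as model inputs. Rows labeled "unknown" (or "andy", which gender_guesser returns for androgynous names) would likely need to be dropped before training.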
How we built it
We used several Jupyter notebooks to clean the data and prepare it for analysis. We then used nltk to extract features from the email bodies, such as the number of "apologetic" words,