GreatUniHack 2017 Hack
A program that helps TheHutGroup customer service agents by categorizing and prioritizing customer emails. A machine learning model trained on past emails sorts new emails into 10 categories and assigns each one a priority. The priority is based on the harshness of the language the customer uses.
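As a rough illustration of prioritizing by harshness, the sketch below orders emails by a sentiment score and maps the score to a priority label. The `Email` class, the thresholds, and the label names are hypothetical and not taken from the project; the score is assumed to range from -1.0 (very negative) to 1.0 (very positive).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Email:
    subject: str
    sentiment: float  # assumed range: -1.0 (very negative) to 1.0 (very positive)


def priority(email: Email) -> str:
    """Map a sentiment score to a priority label (thresholds are illustrative)."""
    if email.sentiment <= -0.5:
        return "high"
    if email.sentiment < 0.0:
        return "medium"
    return "low"


def sort_by_urgency(emails: List[Email]) -> List[Email]:
    """Return the emails with the most negative sentiment first."""
    return sorted(emails, key=lambda e: e.sentiment)


inbox = [
    Email("Thanks for the quick delivery", 0.8),
    Email("This is unacceptable, I want my money back", -0.9),
    Email("My order is a bit late", -0.2),
]
queue = sort_by_urgency(inbox)
```

With this ordering, an agent working through `queue` from the front sees the harshest messages first.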
We use tf-idf to turn each email into a vector of term weights, one per word in the vocabulary of the email set, and append the sentiment of the message to obtain the feature vector for that email. The feature vectors of the training emails are then passed to an SVM model. To calculate tf-idf we use scikit-learn, and to find the sentiment of a message we use the Google Cloud Language API.
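The feature construction described above might look roughly like the following sketch. It assumes scikit-learn's `TfidfVectorizer` and `LinearSVC` (the project's exact SVM variant is not stated), uses made-up training emails and category names, and replaces the Google Cloud Language API call with placeholder sentiment scores.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy training data; the texts and category names are hypothetical.
train_texts = [
    "my parcel has not arrived yet",
    "where is my parcel it is late",
    "i want a refund for this order",
    "please refund my money now",
]
train_labels = ["delivery", "delivery", "refund", "refund"]
# Placeholder sentiment scores; in the project these would come from
# the Google Cloud Language API.
train_sentiment = [-0.2, -0.5, -0.3, -0.8]


def build_features(vectorizer, texts, sentiment, fit=False):
    """Stack the tf-idf matrix with one extra column holding the sentiment."""
    tfidf = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    sentiment_col = csr_matrix(np.asarray(sentiment, dtype=float).reshape(-1, 1))
    return hstack([tfidf, sentiment_col]).tocsr()


vectorizer = TfidfVectorizer()
X_train = build_features(vectorizer, train_texts, train_sentiment, fit=True)

model = LinearSVC()
model.fit(X_train, train_labels)

# A new email is transformed with the already fitted vocabulary.
X_new = build_features(vectorizer, ["my parcel is late"], [-0.4])
predicted = model.predict(X_new)[0]
```

Keeping the features sparse avoids materializing a dense matrix when the vocabulary is large; the sentiment column simply becomes one more dimension next to the term weights.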
A Python client fetches new emails from Gmail, uses the algorithm to categorize them, and sends the emails, labeled with their category, back to the mailbox. Before classification, the program checks the emails for typos and corrects them so that the algorithm works with the words the customer intended.
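The typo correction step could be approximated as in the sketch below. It is an assumption, not the project's actual method: it uses the standard library's `difflib.get_close_matches` against a small hypothetical vocabulary, whereas a real system would use the vocabulary of the training emails or a dedicated spell checker.

```python
import difflib
import re

# Hypothetical vocabulary of known words.
VOCABULARY = ["order", "refund", "delivery", "parcel", "late", "my", "is", "please"]


def correct_typos(text, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary word, if one is close enough."""

    def fix(match):
        word = match.group(0)
        lower = word.lower()
        if lower in vocabulary:
            return word
        candidates = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
        # Leave the word unchanged when nothing in the vocabulary is similar.
        return candidates[0] if candidates else word

    return re.sub(r"[A-Za-z]+", fix, text)


cleaned = correct_typos("my parcle is late, please refnd")
```

The `cutoff` parameter controls how aggressive the correction is; a lower value fixes more typos but also risks rewriting valid words that are missing from the vocabulary.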