What inspired your project? When we began discussing what we wanted to investigate, the conversation quickly turned to computational linguistics. We started with Zipf's law and how it appears in many texts. From there, we decided to build something that drew on our group's mix of CS and humanities lovers, so that both fields would find it interesting. That led us to design an algorithm, and then a program, that measures the “uniqueness” of the word choice in a given text.
What problem does your project solve? This project helps students tailor their writing to a specific audience. For instance, in creative writing or poetry they might aim for a higher uniqueness score, while writing for a younger or more casual audience calls for more common word choices.
How does your project solve the problem? This project evaluates a piece of writing by comparing its words against a database of English words and their frequency of use, in order to determine how basic or creative the language is.
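The comparison against a frequency database can be sketched roughly as follows. This is a minimal illustration, not our actual scoring formula: the `WORD_FREQ` table, its values, the `UNSEEN_FREQ` fallback, and the choice of averaging a negative log frequency are all assumptions made for the example.

```python
import math
import statistics

# Hypothetical frequency table: word -> occurrences per million words.
# The values are made up for illustration only.
WORD_FREQ = {
    "the": 50000.0,
    "cat": 60.0,
    "sat": 30.0,
    "luminous": 2.0,
}

# Assumed frequency for words missing from the table (treated as rare).
UNSEEN_FREQ = 0.5


def uniqueness_score(words):
    """Return the mean rarity of the words.

    Rarity of a word is -log10 of its relative frequency, so rarer
    words contribute larger values. An empty input scores 0.0.
    """
    if not words:
        return 0.0
    rarities = []
    for word in words:
        freq_per_million = WORD_FREQ.get(word.lower(), UNSEEN_FREQ)
        rarities.append(-math.log10(freq_per_million / 1_000_000))
    return statistics.mean(rarities)
```

With a table like this, a text built from rarer words scores higher than one built from common words, which matches the idea of a higher score meaning more creative language.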
What technologies did your project use? Include programming languages, libraries, and any external tools. Our program is written in Python. We used the pandas DataFrame, the statistics module, and the NLTK library. We also used online resources to help us write the binary search function.
What challenges did you run into? The binary search was difficult to figure out because we were searching for strings rather than numerical values. Thankfully, Python's comparison operators (such as < and >) compare strings lexicographically, so they work much as they do for numbers. Another challenge was that articles are so common in English that they often skewed the uniqueness score, causing inaccurate or inconsistent results. We had to find a way to disregard such common words to make the score fairer. To do this, we removed every word in the “stopwords” list provided by the NLTK Python library from the input text.
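Both ideas can be sketched in a few lines. This is an illustrative version rather than our exact code: the function names are hypothetical, and the small `STOPWORDS` set is a stand-in for the NLTK stopword list so the example runs without downloading NLTK data.

```python
# Small stand-in for NLTK's English stopword list (an assumption for this
# sketch; the project itself used the list provided by NLTK).
STOPWORDS = {"the", "a", "an", "and", "of", "on"}


def binary_search(sorted_words, target):
    """Return the index of target in sorted_words, or -1 if it is absent.

    sorted_words must be sorted with Python's default string ordering.
    """
    low, high = 0, len(sorted_words) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_words[mid] == target:
            return mid
        if sorted_words[mid] < target:  # lexicographic string comparison
            low = mid + 1
        else:
            high = mid - 1
    return -1


def remove_stopwords(words):
    """Drop stopwords, ignoring case, and keep the remaining words in order."""
    return [w for w in words if w.lower() not in STOPWORDS]


words = sorted(["cat", "apple", "zebra", "mat"])
found = binary_search(words, "mat")
missing = binary_search(words, "dog")
filtered = remove_stopwords(["The", "cat", "sat", "on", "the", "mat"])
```

Python compares strings by character code, so uppercase letters sort before lowercase ones. The word list and the search target should therefore use the same casing, for example by lowercasing both.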