We came in wanting to work on something none of us had tried before, and each of us has an interest in machine learning. Initially we planned to build a simple plagiarism checker, but as we researched the available literature we found an author attribution system more interesting to make.

What it does

The program takes a text document (currently PDF or DOCX) and uses over 50 metrics to characterize the style of a user's writing. These metrics are fed into a support vector machine, which compares them against the metrics of an unknown test document. The machine then predicts which of the available authors is the best fit for that document. This is useful for measuring the progress and similarity of student writing, as well as for identifying unknown sources of writing from different time periods.
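To make the idea of style metrics concrete, here is a minimal sketch of how a handful of such measurements could be computed from raw text. It is only an illustration: the function name `style_features`, the `FUNCTION_WORDS` list, and the specific metrics chosen are assumptions made for this example, not the 50+ metrics the project actually uses.

```python
import re
from collections import Counter

# Hypothetical short list of function words; a real system would use many more.
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that", "but")


def style_features(text):
    """Return a small dictionary of illustrative stylometric measurements."""
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Rough sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    counts = Counter(words)
    features = {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(counts) / len(words),
        "comma_rate": text.count(",") / len(words),
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features[f"freq_{fw}"] = counts[fw] / len(words)
    return features


if __name__ == "__main__":
    sample = "The cat sat. The dog ran, and the bird flew."
    for name, value in style_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Each document becomes a fixed-length vector of numbers like these, which is the kind of input a support vector machine can be trained on with one label per known author.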

How we built it

PyQt5 was used for the GUI, NLTK was used for the support vector machine, and Python was used as the underlying language.

Challenges we ran into

Determining which characteristics are useful for attribution is not easy, and there are many conflicting ideas about which aspects of writing yield the most accurate predictions. This is also our first larger Python project, and none of us were very familiar with some aspects of the language that came up in the project's libraries.

Accomplishments that we're proud of

 The support vector machine using our metrics is able to correctly identify our samples 90% of the time.
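A figure like this is typically the fraction of held-out samples whose predicted author matches the true author. The sketch below shows that calculation; the helper name `accuracy` and the author labels are made up for illustration and are not the project's data or results.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted author equals the true author."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    true_authors = ["alice", "bob", "alice", "carol"]
    predicted_authors = ["alice", "bob", "carol", "carol"]
    print(accuracy(predicted_authors, true_authors))
```

Measuring this on samples that were not used for training gives a more honest estimate than scoring the documents the model has already seen.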

What we learned

What's next for Checkr

If we continue work on this project, we would build a web portal for users and begin assembling a much larger database of samples.