This project (educational track) lets educators submit their students' handwritten assignments for plagiarism detection. Existing online plagiarism checkers only work with documents created in an application. We were inspired by Google Classroom's plagiarism detection, which we then realized only works with Google Docs. It also compares a student's text only against billions of pages on the internet, never against other submissions in the same class.
What it does
The plagiarism detector takes two PNG/JPEG images and uses a text-recognition (OCR) machine learning library to convert them to digital text. The user receives the parsed text and two percentages, representing the similarity and the uniqueness of the two texts respectively.
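To illustrate the scoring step, here is a minimal sketch of how two parsed texts could be turned into the two percentages. It is not our actual scoring code: it uses Python's standard-library difflib as a stand-in, the function names are hypothetical, and it assumes uniqueness is simply 100 minus similarity.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR line breaks do not affect the score."""
    return " ".join(text.lower().split())


def similarity_report(text_a: str, text_b: str) -> dict:
    """Return similarity and uniqueness percentages for two parsed texts.

    Assumption: uniqueness is defined here as 100 minus similarity.
    """
    ratio = SequenceMatcher(None, normalize(text_a), normalize(text_b)).ratio()
    similarity = round(ratio * 100, 1)
    return {"similarity": similarity, "uniqueness": round(100 - similarity, 1)}
```

Normalizing first keeps differences in capitalization or line wrapping, which are common in OCR output, from lowering the similarity score.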
How we built it
We built it using the following tools and libraries:
- Bootstrap CSS
- Tesseract and OpenCV
Challenges we ran into
We ran into multiple challenges in finding the right settings for Tesseract to parse the images. These included preprocessing steps such as removing noise from the image or converting it to greyscale. After multiple hours of work and research, we settled on the settings that worked best for Tesseract on this website.
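To show what the two preprocessing steps do, here is a small pure-Python sketch of greyscale conversion and median-filter noise removal on a tiny image stored as nested lists. It is only a conceptual illustration with hypothetical function names, not our pipeline; in practice OpenCV offers optimized equivalents such as cv2.cvtColor and cv2.medianBlur.

```python
from statistics import median


def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def median_filter(gray_rows):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Pixels on the border use only the neighbours that exist.
    """
    height, width = len(gray_rows), len(gray_rows[0])
    result = []
    for y in range(height):
        new_row = []
        for x in range(width):
            window = [
                gray_rows[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # int() truncates the .5 that can appear for even-sized border windows.
            new_row.append(int(median(window)))
        result.append(new_row)
    return result
```

A median filter suits scanned handwriting because it removes isolated specks while keeping pen strokes sharper than an averaging blur would.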
Accomplishments that we're proud of
We are proud to have shipped a working website with the features our idea called for. We also successfully used two machine learning libraries and solved the challenge listed above.
What we learned
We learned more about web server development, such as using the Django web framework. We also learned more about HTML and passing backend data to the frontend. Furthermore, since our project relies on machine learning, we learned about the two libraries we used (Tesseract and PySimilar).
What's next for Evlav Detection
One direction this project could take is a fully functional website with teacher and student logins. We also planned a UI where a teacher could submit N students' work and the detector would cross-reference each student against the others and return a comprehensive report to the teacher. Another possible improvement for Evlav Detection is tuning pytesseract further, or using Google's OCR API for more precise image-to-text conversion.
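The planned cross-referencing of N submissions could be sketched roughly as below. This is a hypothetical outline, not something we have built: the function name, the 70 percent threshold, and the use of the standard-library difflib for scoring are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cross_reference(submissions: dict, threshold: float = 70.0) -> list:
    """Compare every pair of submissions and return pairs at or above the threshold.

    `submissions` maps a student name to the text parsed from their work.
    The threshold of 70 percent is an arbitrary illustrative default.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(submissions.items()), 2):
        score = round(SequenceMatcher(None, text_a, text_b).ratio() * 100, 1)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    # Most similar pairs first, so the strongest matches appear at the top of the report.
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

Comparing every pair means N students produce N(N-1)/2 comparisons, so a class of 30 would need 435 of them.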