Demonstration is an important part of teaching people to program for the first time. In a lecture, there are two ways to demonstrate code: typing it on a computer or handwriting it on a board. Typing ties the lecturer to their laptop, which reduces mobility, reduces eye contact and engagement with the class, and makes it awkward to point out things in the code. Handwriting code on the board solves all of these problems, but it introduces another: the correctness of a program written on a whiteboard cannot be verified, because there is no way to compile and run it to show that it works without errors. We want to create a system that converts this handwritten code into compilable code with the help of OCR (optical character recognition).
What it does
This system provides a service that lets users convert handwritten source code into compilable source code that a compiler or interpreter can process. With this service, users do not have to retype code that they have already written on paper.
How I built it
First, I preprocessed the handwritten text in MATLAB, removing noise and unwanted characters. I then sent this preprocessed text to Tesseract OCR to convert it into digital text. Finally, I post-processed the result in Python, adding all the necessary fields.
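The post-processing step is not described in detail, so the following is only a minimal sketch of what it might look like, assuming that "necessary fields" includes missing `#include` lines and that the OCR output contains typographic quotes. The names `OCR_FIXES`, `HEADER_FOR`, and `postprocess` are hypothetical and do not come from the original project.

```python
import re

# Hypothetical table of OCR confusions (not taken from the original project):
# typographic characters that a C compiler would reject.
OCR_FIXES = {
    "\u201c": '"',  # left curly double quote
    "\u201d": '"',  # right curly double quote
    "\u2018": "'",  # left curly single quote
    "\u2019": "'",  # right curly single quote
}

# Assumed mapping from a few standard C functions to the headers declaring them.
HEADER_FOR = {
    "printf": "stdio.h",
    "scanf": "stdio.h",
    "strlen": "string.h",
    "malloc": "stdlib.h",
}


def postprocess(ocr_text: str) -> str:
    """Clean up OCR output and prepend any standard headers the code needs."""
    # Replace characters that OCR commonly substitutes for ASCII ones.
    for wrong, right in OCR_FIXES.items():
        ocr_text = ocr_text.replace(wrong, right)

    # Drop surrounding blank space and trailing whitespace on every line.
    code = "\n".join(line.rstrip() for line in ocr_text.strip().splitlines())

    # Collect headers for functions that are called but whose header is absent.
    needed = []
    for func, header in HEADER_FOR.items():
        called = re.search(r"\b" + func + r"\s*\(", code)
        included = re.search(r"#\s*include\s*<" + re.escape(header) + r">", code)
        if called and not included and header not in needed:
            needed.append(header)

    includes = "".join(f"#include <{h}>\n" for h in needed)
    return includes + code + "\n"
```

For example, passing in recognized text that calls `printf` with curly quotes and no `#include` would return the same program with straight quotes and `#include <stdio.h>` on the first line. Running the function on its own output leaves the text unchanged.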
Challenges I ran into
Getting Tesseract OCR to work seamlessly was a challenge. Training the OCR engine took a lot of effort, as its accuracy was really poor at the start.
Accomplishments that I'm proud of
I got it working eventually. Tesseract's accuracy was 83% after training.
What I learned
How to use Tesseract OCR and how to code in MATLAB.
What's next for SmartCode
1) Increasing the number of programs supported. SmartCode is currently limited to the C programming language and only its built-in functions, so the next step is probably to add support for more languages.
2) Caching of correct OCR results. Due to varying lighting conditions, the OCR engine may recognize different text on each run. It is possible to use a feature detector like SURF or SIFT to detect already-OCRed text from image to image, potentially increasing accuracy and OCR speed.
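The caching idea can be illustrated with a rough sketch. It does not use SURF or SIFT, which would need a computer-vision library; instead it swaps in a much simpler average hash over grayscale pixels as a stand-in for recognizing an image that has already been OCRed. The names `average_hash`, `OcrCache`, and `recognize`, as well as the `max_distance` threshold, are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional

Image = List[List[int]]  # grayscale pixels (0-255), rows of equal length


def average_hash(image: Image, size: int = 8) -> int:
    """Block-average the image down to size x size, then threshold at the mean."""
    rows, cols = len(image), len(image[0])
    if rows < size or cols < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for r in range(size):
        r0, r1 = r * rows // size, (r + 1) * rows // size
        for c in range(size):
            c0, c1 = c * cols // size, (c + 1) * cols // size
            block = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if brighter than the overall mean.
        bits = (bits << 1) | int(value > mean)
    return bits


class OcrCache:
    """Maps image hashes to previously recognized text."""

    def __init__(self, max_distance: int = 5) -> None:
        # Assumed tolerance: hashes differing in at most this many bits match.
        self.max_distance = max_distance
        self._entries: Dict[int, str] = {}

    def lookup(self, image: Image) -> Optional[str]:
        h = average_hash(image)
        for known, text in self._entries.items():
            if bin(h ^ known).count("1") <= self.max_distance:
                return text
        return None

    def store(self, image: Image, text: str) -> None:
        self._entries[average_hash(image)] = text


def recognize(image: Image, cache: OcrCache, run_ocr: Callable[[Image], str]) -> str:
    """Return cached text for a similar image, else run OCR and cache the result."""
    cached = cache.lookup(image)
    if cached is not None:
        return cached
    text = run_ocr(image)
    cache.store(image, text)
    return text
```

Because the hash compares each region to the image's own mean brightness, a uniformly brighter or darker photo of the same board tends to produce the same hash, so the OCR engine is skipped on repeat frames. In practice, one would store only results that have been confirmed correct, for example after the code compiles, and a feature detector would be more robust to changes in viewpoint.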