Get-in-Line Image Parsing & Text Extraction

Inspiration

Extracting text from complicated images and performing complex operations on them motivated me to work on this project. It amazes me how systems are developed that can extract text from pictures in different file formats, read number plates from vehicles for traffic regulation checks, and so on.

What it does

Get-in-Line processes comic images to extract the narration of the conversation they depict, producing meaningful text in a dialogue format that preserves the order of the exchange.

How I built it

I used pytesseract, a Python wrapper for the Tesseract Optical Character Recognition (OCR) engine, and built a custom config to improve accuracy using the Page Segmentation Mode (PSM) and the default OCR Engine Mode (OEM). I iterated through the available comic images and defined the logic for grayscale conversion and noise removal to smooth each image, using the OpenCV (cv2) library for image processing. Once an image had passed through this initial processing and cleaning step, I extracted its text using pytesseract with the custom configuration, with the language set to English. I maintained a Python list into which the extracted text was inserted serially, and defined the logic to split the conversation lines/dialogue so that the narration is preserved in sequence.
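The preprocessing and extraction steps described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names, the median blur used for noise removal, and the default values `psm=6` and `oem=3` are all assumptions, since the write-up does not say which filter or mode values were used.

```python
def build_tesseract_config(psm: int = 6, oem: int = 3) -> str:
    """Build the Tesseract flags for OCR Engine Mode and Page Segmentation Mode.

    The defaults (psm=6, oem=3) are illustrative assumptions, not the
    values used in the project.
    """
    return f"--oem {oem} --psm {psm}"


def extract_text(image_path: str, psm: int = 6, oem: int = 3) -> str:
    """Clean a comic image and run OCR on it (hypothetical helper)."""
    # Imported here so the config helper above can be used without the
    # OCR dependencies. Requires opencv-python, pytesseract, and the
    # Tesseract binary to be installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Convert to grayscale, then smooth out speckle noise.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    denoised = cv2.medianBlur(gray, 3)  # assumed filter; the original may differ

    # Pull English text using the custom configuration.
    return pytesseract.image_to_string(
        denoised, lang="eng", config=build_tesseract_config(psm, oem)
    )
```

Keeping the config in its own small function makes it easy to try different PSM and OEM values against the same image and compare the output.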

Challenges I ran into

Initially, I worked on understanding the problem statement and breaking it into small pieces in order to design my solution. To start with, I extracted text without cleaning the image first, which resulted in a few characters not being pulled correctly. Once I figured out how to define a custom configuration, I was able to write the logic for splitting the lines to maintain the conversation sequence.
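As a rough idea of what the line-splitting step might look like, the sketch below takes the raw OCR output of each image, in order, and appends the non-empty lines to a single list. The function name `split_dialogue` and the whitespace clean-up rule are assumptions made for illustration; the project's own splitting logic is not shown in this write-up.

```python
from typing import Iterable, List


def split_dialogue(ocr_outputs: Iterable[str]) -> List[str]:
    """Flatten per-image OCR text into an ordered list of dialogue lines.

    Hypothetical helper: images are assumed to be supplied in reading order.
    """
    dialogue: List[str] = []
    for raw_text in ocr_outputs:
        for line in raw_text.splitlines():
            # Collapse stray whitespace left over from OCR.
            cleaned = " ".join(line.split())
            if cleaned:  # skip blank lines between speech bubbles
                dialogue.append(cleaned)
    return dialogue
```

Because lines are only ever appended, the resulting list keeps the same order as the images and the text within them.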

Accomplishments that I'm proud of

I'm happy to have spent my time working on this project at TAMU Datathon and learning more about the cool and exciting things that image processing can help us achieve, automating existing processes and making things simpler to consume.

What I learned

I researched the various tools available in Python for reading and recognizing textual data in images. I learned how to write the logic for cleaning and enhancing images to ease the task of extracting information. Moreover, by discussing the problem statement with a CBRE representative, I was able to understand the expectations in depth, which helped me prioritize the work needed to develop the solution. I also learned about potential projects that use image processing and computer vision, and about the scope and impact of these projects in daily business operations for industries and government organizations.

What's next for Get-in-Line Image Parsing & Text Extraction

To broaden the scope, I intend to train a machine learning model to rectify the extraction errors in Get-in-Line 1.0. I would also like to include regional languages to facilitate the extraction of non-English characters, and to generate a narration script/dialogue of the characters involved in the comic.

Built With

jupyter
python

Updates

Bansari Paresh Kothari started this project — Oct 09, 2022 01:53 AM EDT
