TECHY TEXT RECOGNIZER

Inspiration

Our app idea is inspired by Samsung's Bixby Vision. Samsung smartphones include a feature called Bixby Vision that can scan a document and recognize the text on it, identify the color of an object, and search for information related to what it sees. We are trying to build an app with additional features that could also help the police in certain case investigations by identifying and collecting information about a crime incident or a place of interest.

What it does

Our application detects the text in an image and separates the regions of interest (ROIs) that contain it. It then searches for information related to that text, collects results from various sources, and presents them to the user.

How we built it

For ROI detection we used EAST (Efficient and Accurate Scene Text detector), a pipeline composed of a Fully-Convolutional Network (FCN) and a locality-aware Non-Maximum Suppression (NMS) algorithm. It can detect lines of text without expensive traditional algorithms and can process images with a resolution of 1280x720 at around 16 frames per second (fps).

STEP 1 – IMAGE PREPARATION

The first step is to load the image and process it with OpenCV to get it into a form suitable for the EAST model. This involves resizing the image if its dimensions are not multiples of 32, while retaining the width:height ratio so that the results can be transformed back after extraction. We also shrink large images to a maximum of 2400 pixels in width and height, to avoid running out of memory during processing.

STEP 2 – INITIAL IMAGE PROCESSING

The next step is a forward pass of the image through a proven, pre-trained TensorFlow EAST model.

STEP 3 – REGION OF INTEREST IDENTIFICATION

The EAST model solves the challenging problem of computing the probability that a region contains text. The next step is to translate the model's output into the rotated bounding boxes of the ROIs in the original image.

STEP 4 – RATIONALISATION OF BOUNDING BOXES

The EAST pipeline paper suggests merging the intersecting geometries using the NMS algorithm. Our implementation uses a modified NMS algorithm, as recommended in the original paper. It assumes that nearby pixels tend to be highly correlated, so it can merge geometries row by row. This greatly improves speed compared to the original algorithm, as it reduces the complexity from O(n^2) to a best case of O(n).

IMAGE PRE-PROCESSING

From previous projects, we also know that a pre-processing step that simplifies the image improves the accuracy of the extracted results. In our implementation, we convert the image to grayscale and then to a binary image by applying Otsu's binarization.

TEXT EXTRACTION VIA OCR

We use Tesseract as our OCR engine. Tesseract has long been considered one of the most accurate open-source OCR engines available, and the fact that it had a new beta version using LSTM (Long Short-Term Memory) internally made it the obvious choice for OCR processing in our implementation. Our implementation passes the images to Tesseract to extract their text, then stores that text as metadata to improve the discoverability of the presentation documents.
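
Two of the steps above come down to small calculations: the sizing rule from Step 1 and the Otsu threshold used in pre-processing. The following is a minimal sketch in plain Python, not our actual code. The function names, the choice to round down to a multiple of 32, and the representation of a grayscale image as a flat list of 0..255 integers are assumptions made for illustration. In practice OpenCV would do the resizing and thresholding.

```python
def east_input_size(width, height, max_side=2400, multiple=32):
    """Return (new_w, new_h, ratio_w, ratio_h) for an image of the given size.

    Hypothetical helper: shrinks the image so neither side exceeds max_side,
    then rounds each side down to a multiple of 32 as EAST requires.
    """
    # Only shrink, never enlarge.
    scale = min(1.0, max_side / max(width, height))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    # The ratios let detected boxes be mapped back onto the original image.
    return new_w, new_h, width / new_w, height / new_h


def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximises between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Map a grayscale pixel list to a binary (0 or 255) pixel list."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    print(east_input_size(1280, 720))
    print(binarize([10, 12, 200, 210]))
```

Rounding down rather than up keeps the resized image within the 2400-pixel limit. Rounding up would be an equally valid choice as long as the same ratios are used to map the boxes back.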

Challenges we ran into

We had to learn several concepts specifically to develop this project. To extract text from images, we used a sophisticated, pre-trained EAST neural network to identify possible text within each image. We then cropped and pre-processed each image to simplify the job of the OCR processor, and used the Tesseract OCR engine to extract the text it contains.
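
To make the row-by-row merging in the NMS step from "How we built it" more concrete, here is a minimal sketch of locality-aware NMS in plain Python. It is a simplification under stated assumptions, not our implementation: EAST works with rotated boxes, while this sketch uses axis-aligned boxes given as (x1, y1, x2, y2, score) with positive scores. The function names and the 0.3 overlap threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2, score)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(a, b):
    """Merge two boxes: coordinates are score-weighted averages, scores add up."""
    score = a[4] + b[4]
    coords = [(a[i] * a[4] + b[i] * b[4]) / score for i in range(4)]
    return (*coords, score)


def standard_nms(boxes, thresh):
    """Classic greedy NMS: keep the best box, drop boxes that overlap it."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= thresh for k in kept):
            kept.append(box)
    return kept


def locality_aware_nms(boxes, thresh=0.3):
    """Merge neighbouring boxes in one pass, then run standard NMS on the rest.

    Assumes `boxes` arrive in row-first order, so boxes predicted from
    nearby pixels are adjacent in the list.
    """
    merged = []
    prev = None
    for box in boxes:
        if prev is not None and iou(box, prev) > thresh:
            # Each box is compared only with the running merged box.
            prev = weighted_merge(box, prev)
        else:
            if prev is not None:
                merged.append(prev)
            prev = box
    if prev is not None:
        merged.append(prev)
    return standard_nms(merged, thresh)


if __name__ == "__main__":
    candidates = [
        (0, 0, 10, 10, 0.5),
        (1, 0, 11, 10, 0.5),
        (100, 100, 110, 110, 0.9),
    ]
    print(locality_aware_nms(candidates))
```

The first pass compares each box only with the previous merged box, which is where the O(n) best case comes from. The final standard NMS runs on far fewer boxes than the raw model output.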