I wanted to reduce the manual work of reading and transcribing text from digital media. Programming and machine learning can automate this tedious process and make it easy.

What it does

Extracts text and other information from image files of paystubs.
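As a rough illustration of what "extracting information" can mean once the text has been read, here is a minimal sketch that pulls two amounts out of OCR output with regular expressions. The field labels ("Gross Pay", "Net Pay"), the `FIELD_PATTERNS` table, and the `parse_paystub` helper are assumptions made for this example, not the project's actual code; real paystubs vary widely in layout.

```python
import re

# Hypothetical field labels: real paystubs use many different wordings.
FIELD_PATTERNS = {
    "gross_pay": re.compile(r"gross\s+pay\D*([\d,]+\.\d{2})", re.IGNORECASE),
    "net_pay": re.compile(r"net\s+pay\D*([\d,]+\.\d{2})", re.IGNORECASE),
}


def parse_paystub(text):
    """Return the amounts found in OCR text, keyed by field name."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Drop thousands separators before converting to a number.
            fields[name] = float(match.group(1).replace(",", ""))
    return fields


sample = "ACME Corp\nGross Pay: $2,500.00\nNet Pay  $1,987.45\n"
print(parse_paystub(sample))
```

Fields that are not found are simply left out of the result, so the caller can tell a missing value apart from a zero amount.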

How I built it

I used Python as the programming language and pytesseract as the OCR library.
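The basic OCR step can be sketched roughly as follows. This assumes Pillow and pytesseract are installed and that the Tesseract binary is available on the system; the function names `extract_text` and `normalize_text` and the file name in the usage note are placeholders for this example.

```python
import re


def extract_text(image_path, lang="eng"):
    """Run Tesseract on one image file and return the raw text."""
    # Imported here so normalize_text() can be used without Tesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def normalize_text(raw):
    """Collapse runs of spaces/tabs and drop empty lines from OCR output."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

A call such as `normalize_text(extract_text("paystub.png"))` would then return the cleaned-up text of a (hypothetical) `paystub.png`.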

Challenges I ran into

Handling and preprocessing the image data before the text extraction step was difficult.
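One common kind of preprocessing is sketched below: convert to grayscale, stretch the contrast, upscale, and binarize with a fixed threshold. This is an assumption about what such a step might look like, not the exact pipeline I used; the default `threshold` and `scale` values are arbitrary and would need tuning per image, and the helper names are made up for the example. It assumes Pillow is installed.

```python
def threshold_table(threshold=160):
    """Lookup table mapping each 8-bit gray level to pure black or white."""
    return [255 if value >= threshold else 0 for value in range(256)]


def preprocess(image_path, threshold=160, scale=2):
    """Return a black-and-white, upscaled copy of the image for OCR."""
    # Imported here so threshold_table() works without Pillow installed.
    from PIL import Image, ImageOps

    with Image.open(image_path) as img:
        gray = ImageOps.grayscale(img)        # drop color information
        gray = ImageOps.autocontrast(gray)    # stretch the gray-level range
        gray = gray.resize((gray.width * scale, gray.height * scale))
        return gray.point(threshold_table(threshold))  # binarize
```

The returned image can be passed directly to `pytesseract.image_to_string` in place of the original.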

Accomplishments that I'm proud of

Having built and learned something new.

What I learned

I learned to implement OCR to extract information from image files.

What's next for OCR Pay Search

Scalability - The solution can be optimized to handle a large volume of data.

AI learning - The solution could be improved into a self-learning model that minimizes the error rate of information extraction.
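To reduce an error rate, it first has to be measured. One common choice, used here only as an illustration, is the character error rate: the edit distance between the OCR output and a hand-checked reference, divided by the length of the reference. The function names below are made up for this sketch, which uses only the standard library.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text, reference):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)
```

Tracking this number over a small set of manually transcribed paystubs would show whether a change to preprocessing or to the model actually helps.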
