The project grew out of the urge to reduce the manual work of reading and transcribing text from digital media. Programming and machine learning can automate that tedious process and make it easy.
What it does
Extracts text and information from image files of paystubs.
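As an illustration of the "information" part, here is a minimal sketch of pulling labelled amounts out of text that OCR has already produced. The field names (`gross_pay`, `net_pay`) and the regular expressions are assumptions made for this example; they are not taken from the project, and real paystub layouts vary.

```python
import re

# Hypothetical fields and patterns, chosen only for illustration.
# Each pattern looks for a label, skips any non-digit characters
# (colon, spaces, currency sign), then captures an amount like 1,234.56.
FIELD_PATTERNS = {
    "gross_pay": re.compile(r"gross\s+pay\D*([\d,]+\.\d{2})", re.IGNORECASE),
    "net_pay": re.compile(r"net\s+pay\D*([\d,]+\.\d{2})", re.IGNORECASE),
}


def parse_paystub_text(text):
    """Return a dict of the fields found in OCR text, with amounts as floats."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Drop thousands separators before converting to a number.
            result[name] = float(match.group(1).replace(",", ""))
    return result
```

Fields that are not found are simply left out of the result, so a caller can tell a missing value apart from a zero amount.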
How I built it
I used Python as the programming language and pytesseract as the OCR library.
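A minimal sketch of what the OCR call can look like with pytesseract, assuming Pillow is used to load the image and the Tesseract engine is installed on the machine. The helper name `extract_text` and the default language `"eng"` are assumptions for this example, not the project's actual code.

```python
from pathlib import Path


def extract_text(image_path, lang="eng"):
    """Run Tesseract OCR on one image file and return the recognized text."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")

    # Imported here so the function can be defined even on a machine
    # where the OCR stack (Pillow, pytesseract, Tesseract) is not installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

Calling `extract_text("paystub.png")` would return the raw text of the page as a single string, where `paystub.png` stands for any image file on disk.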
Challenges I ran into
The main difficulty was handling and preprocessing the image data before the text extraction step.
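The write-up does not say which preprocessing steps were tried. One common step before OCR is binarization, turning a grayscale image into pure black and white. The sketch below shows Otsu's method for choosing the threshold, written in plain Python over a flat sequence of 8-bit grayscale values; the function names are hypothetical, and this is an illustration of the technique rather than the project's pipeline.

```python
def otsu_threshold(pixels):
    """Pick the threshold (0-255) that best separates dark and light pixels.

    Otsu's method tries every threshold t and keeps the one that maximizes
    the between-class variance of the groups "<= t" and "> t".
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")

    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_dark = 0
    sum_dark = 0
    best_threshold, best_variance = 0, -1.0

    for t in range(256):
        weight_dark += histogram[t]
        if weight_dark == 0:
            continue  # no pixels at or below t yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark group
        sum_dark += t * histogram[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


def binarize(pixels):
    """Map each pixel to 0 (dark) or 255 (light) using Otsu's threshold."""
    threshold = otsu_threshold(pixels)
    return [255 if value > threshold else 0 for value in pixels]
```

For a scanned paystub, dark ink ends up as 0 and the paper as 255, which tends to give an OCR engine cleaner character edges than a noisy grayscale scan.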
Accomplishments that I'm proud of
Having built and learned something new.
What I learned
I learned to implement OCR to extract information from image files.
What's next for OCR Pay Search
Scalability - The solution can be optimized to handle a large volume of data.
AI learning - The solution could be turned into a self-learning model that reduces the error rate of information extraction over time.
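For the scalability item, one possible direction is to process many images concurrently. The sketch below is an assumption about how that might look, not a planned design: `ocr_many` and its parameters are hypothetical names, and the OCR step is passed in as a function so that any single-image routine can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ocr_many(image_paths, ocr_func, max_workers=4):
    """Apply ocr_func to every path in parallel and return {path: text}."""
    paths = [Path(p) for p in image_paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so texts line up with paths.
        texts = list(pool.map(ocr_func, paths))
    return dict(zip(paths, texts))
```

Threads are a reasonable first choice here because pytesseract runs the Tesseract engine as a separate process, so the Python threads spend most of their time waiting on it.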