Tesseract meets CNN

Enjins

The project can be found here. The CNN model is not included in the upload because of the upload rate limit, but it can be found in this repository.

Inspiration

Creativity beats brute force. We mimic how our brains look at a picture: if there is a label, read the label; if not, try to recognize the item itself.

What it does

To start off, we try to extract text from the images using Tesseract, an open-source OCR library. After filtering out the images that contain text, we send the remaining images to a pre-trained CNN. It uses the VGG-19 model as its base, and by adding a few extra layers of our own we make the network a better fit for the specific images in the dataset.
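The text-filtering step can be sketched as follows. This is a minimal sketch, not our exact code: it assumes the `pytesseract` wrapper around the Tesseract binary and PIL images, and the helper names `has_readable_text` and `split_by_text` and the `min_chars` noise threshold are hypothetical. The OCR function is passed in as a parameter so it can be swapped out or stubbed.

```python
import re
from typing import Callable, Iterable, List, Tuple


def tesseract_ocr(image) -> str:
    """Run Tesseract on a PIL image via pytesseract (imported lazily)."""
    import pytesseract  # assumption: pytesseract and the Tesseract binary are installed
    return pytesseract.image_to_string(image)


def has_readable_text(text: str, min_chars: int = 3) -> bool:
    """Hypothetical heuristic: count OCR output as a label only if it has
    at least `min_chars` alphanumeric characters, to ignore OCR noise."""
    return len(re.findall(r"[A-Za-z0-9]", text)) >= min_chars


def split_by_text(
    images: Iterable,
    ocr: Callable[[object], str] = tesseract_ocr,
    min_chars: int = 3,
) -> Tuple[List[Tuple[object, str]], List[object]]:
    """Split images into (image, text) pairs that have readable text and
    images without text, which are left for the CNN."""
    with_text, without_text = [], []
    for image in images:
        text = ocr(image).strip()
        if has_readable_text(text, min_chars):
            with_text.append((image, text))
        else:
            without_text.append(image)
    return with_text, without_text
```

Usage would look like `labeled, for_cnn = split_by_text(pil_images)`, after which only `for_cnn` goes through the network.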

How we built it

Like every project, it started with exploration and, more importantly, creativity. We began with drafts and discussions, moved on to data analysis, and then decided to build two models and combine them in one workflow: read the label with OCR first, and fall back to the CNN when there is no text.
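A sketch of how the two models could be combined, under stated assumptions: the writeup does not say how OCR text is turned into a label, so the keyword-to-label mapping (`keywords`) is hypothetical, and `ocr` and `cnn_predict` stand in for the Tesseract call and the VGG-19-based classifier.

```python
from typing import Callable, Dict, Optional


def label_from_text(text: str, keywords: Dict[str, str]) -> Optional[str]:
    """Return the label of the first keyword found in the OCR text
    (case-insensitive), or None if no keyword matches."""
    lowered = text.lower()
    for keyword, label in keywords.items():
        if keyword.lower() in lowered:
            return label
    return None


def classify(
    image,
    ocr: Callable[[object], str],
    cnn_predict: Callable[[object], str],
    keywords: Dict[str, str],
) -> str:
    """Read the label first; fall back to the CNN when OCR yields nothing usable."""
    label = label_from_text(ocr(image), keywords)
    if label is not None:
        return label
    return cnn_predict(image)
```

Keeping OCR and the CNN behind plain callables means either model can be retrained or replaced without touching the routing logic.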

Challenges we ran into

During the analysis we noticed that the photo quality could be better. One way to improve it would be to let users submit their photos through the app: while using the camera, users see the label predicted by the CNN, and if it is correct they can confirm it. This improves the quality of the data and adds more labeled examples to the dataset.
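The confirmation loop could be recorded roughly like this. It is a hypothetical sketch, not part of the submitted project: `FeedbackStore`, its fields, and the idea of tracking a confirmation rate are illustrative assumptions. Confirmed predictions become new labeled examples, and rejections are only counted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FeedbackStore:
    """Hypothetical store for labels that users confirm in the app."""
    labeled: List[Tuple[str, str]] = field(default_factory=list)
    rejected: int = 0

    def record(self, image_id: str, predicted: str, confirmed: bool) -> None:
        """Keep a confirmed prediction as a new labeled example;
        otherwise only count the rejection."""
        if confirmed:
            self.labeled.append((image_id, predicted))
        else:
            self.rejected += 1

    def confirmation_rate(self) -> float:
        """Share of predictions users confirmed (0.0 if there is no feedback yet)."""
        total = len(self.labeled) + self.rejected
        return len(self.labeled) / total if total else 0.0
```

The confirmation rate gives a rough, user-reported view of accuracy in the field, while `labeled` can be added to the training set for the next round of fine-tuning.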

Accomplishments that we're proud of

Our accuracy is already quite good, and we think we can improve it further with a few more ideas. We're proud that, with modest effort, we built something that is already valuable and could become even more so with ideas like the feedback loop described above.

What we learned

Use cases like these are quite achievable for a strong AI engine. Combined in an infrastructure like the one described above, and taking into account the feedback users give, we think the system could reach up to 90% accuracy.
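One way to reason about that estimate (our framing, not a measured result): if a fraction $f$ of images is handled by the OCR branch with accuracy $a_{\text{OCR}}$ and the rest by the CNN with accuracy $a_{\text{CNN}}$, the overall accuracy is the weighted average of the two branches.

```latex
a_{\text{total}} = f \, a_{\text{OCR}} + (1 - f) \, a_{\text{CNN}}
```

So the combined pipeline beats the CNN alone only if the OCR branch is more accurate on the images it handles, and user feedback mainly helps by adding labeled data that can raise $a_{\text{CNN}}$.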