Hilite

0) A single image frame. 1) Stitched text frames using accelerometer and inferred directional flow. 2) Affine normalization and denoising.

Inspiration

With technology developing so rapidly, digitizing our lives and our content has never been simpler.

Cameras produce the videos and photographs which litter our social networks; tablets give us the degrees of freedom we need to bring artistic expression onto a blank digital canvas.

All of these innovative technologies arise from the combination of sensors which lets us bring information from the real world onto the Internet.

Yet no single piece of technology lets us digitize, highlight, and analyze the wealth of text that is immediately available to us in the real world.

What it does

Hilite is an affordable OCR-based text scanner that lets its user pick up text from the real world and bring it into the digital world.

Swipe Hilite across any piece of text on a newspaper, a novel, or even a box, and it converts that text into a digital representation and analyzes it to simplify a wide variety of everyday tasks.

Looking to record some key points from a newspaper or textbook? Looking to record calories or nutrient information from store-bought food products on a daily basis? Looking to record payment information/cryptocurrency addresses from people or retail stores?

Hilite has it all covered in a snap, and is tailored to tasks related to business, education, or personal healthcare.

How we built it

Hilite makes extensive use of OpenCV to normalize webcam frames of text and stitch them into a single panorama that best represents a sentence or phrase on any real-world object.

_Text OCR scanners that rely on high-frequency cameras rather than computer vision techniques cost more than USD $160, while ours works with a cheap consumer webcam and could easily sell for about USD $20!_

Accelerometer and gyroscope information, alongside inferred directional image gradient flow, are used to normalize the recorded image frames and stitch them together.
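To make the stitching idea concrete, here is a minimal pure-Python sketch, not the OpenCV pipeline we actually ran. It assumes grayscale frames of equal size stored as lists of rows, motion that is purely horizontal, and an optional `hint` per frame pair that stands in for a displacement guess derived from the accelerometer. The function names (`mean_abs_diff`, `estimate_shift`, `stitch`) are made up for this illustration.

```python
def mean_abs_diff(a, b, shift):
    """Mean absolute pixel difference on the overlap between frame `a` and
    frame `b`, assuming `b` starts `shift` columns to the right of `a`."""
    width = len(a[0])
    overlap = width - shift
    total = 0
    for row_a, row_b in zip(a, b):
        for x in range(overlap):
            total += abs(row_a[shift + x] - row_b[x])
    return total / (overlap * len(a))


def estimate_shift(a, b, min_overlap=2, hint=None, window=2):
    """Pick the horizontal shift that best aligns `b` with `a`.

    `hint` is a hypothetical sensor-derived guess; when given, only shifts
    within `window` pixels of it are tried.
    """
    width = len(a[0])
    candidates = list(range(1, width - min_overlap + 1))
    if hint is not None:
        narrowed = [s for s in candidates if abs(s - hint) <= window]
        candidates = narrowed or candidates  # fall back if the hint is way off
    return min(candidates, key=lambda s: mean_abs_diff(a, b, s))


def stitch(frames, hints=None):
    """Append to the panorama only the columns each new frame contributes."""
    panorama = [list(row) for row in frames[0]]
    for i in range(1, len(frames)):
        hint = hints[i - 1] if hints else None
        shift = estimate_shift(frames[i - 1], frames[i], hint=hint)
        width = len(frames[i][0])
        for pan_row, new_row in zip(panorama, frames[i]):
            pan_row.extend(new_row[width - shift:])
    return panorama
```

The sensor hint only narrows the search; the image content still decides the final alignment, which mirrors the idea of combining inertial data with inferred image flow.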

A convolutional-recurrent neural network (C-RNN), developed using TensorFlow, performs text spotting and OCR on the stitched text image, cheaply digitizing any text within close reach in the real world. The pipeline also incorporates text embeddings and information from Indico.io and Google Cloud NLP.

The paper describing the architecture is available here: https://arxiv.org/pdf/1705.05483.pdf
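For readers new to C-RNN recognizers: many of them emit one probability distribution over characters per horizontal time step, and a common way to turn that into a string is greedy CTC decoding. Whether our model was decoded exactly this way is not stated above, so treat the following as a sketch under that assumption. The blank index, the alphabet layout, and the function name are all illustrative choices.

```python
BLANK = 0  # assumed index of the CTC "blank" symbol


def ctc_greedy_decode(probs, alphabet):
    """Greedy CTC decoding.

    probs: list of per-time-step probability lists, where index 0 is the
    blank and index i > 0 maps to alphabet[i - 1] (an assumed layout).
    """
    # Take the most likely symbol at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    chars = []
    previous = BLANK
    for idx in best:
        # Collapse repeated symbols and drop blanks.
        if idx != BLANK and idx != previous:
            chars.append(alphabet[idx - 1])
        previous = idx
    return "".join(chars)
```

A blank between two identical symbols is what lets the decoder output a doubled letter, since plain repeats are collapsed into one character.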

The parameters for the linear layer and the C-RNN model were chosen through cross-validation with a 90:10 train-test split on a synthetic dataset of ~2000 samples. The C-RNN model itself was trained on the ICDAR2013 dataset.
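The evaluation setup can be sketched roughly as follows. This is a generic illustration of a 90:10 hold-out split followed by k-fold cross-validation on the training portion; the fold count, the seed, and the helper names are assumptions, not values from our actual training scripts.

```python
import random


def train_test_split(samples, test_ratio=0.1, seed=0):
    """Shuffle and split into (train, test); 0.1 gives a 90:10 split."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(samples, k=5):
    """Yield (train, validation) pairs; every sample is validated once."""
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, validation


# Stand-in for ~2000 synthetic samples; real samples would be (image, text) pairs.
dataset = list(range(2000))
train_set, test_set = train_test_split(dataset)
```

Each candidate hyperparameter setting would be scored on the validation folds, and the held-out 10% would be touched only once at the end.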

Text embeddings from Indico.io are then fed into a simple feed-forward linear (logistic regression) layer that classifies the OCR'd text as relating to business, education, or personal healthcare.
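As a rough picture of that classification layer, here is a tiny multinomial logistic regression written in plain Python. The three-dimensional vectors are made-up stand-ins for real Indico.io embeddings (which are far larger), and the learning rate and epoch count are arbitrary; this is a sketch of the technique, not our TensorFlow code.

```python
import math

LABELS = ["business", "education", "healthcare"]

# Toy "embeddings" invented for this example, two per class.
TOY_EMBEDDINGS = [
    [1.0, 0.1, 0.0], [0.9, 0.0, 0.2],   # business
    [0.1, 1.0, 0.1], [0.0, 0.8, 0.2],   # education
    [0.0, 0.2, 1.0], [0.2, 0.1, 0.9],   # healthcare
]
TOY_LABELS = [0, 0, 1, 1, 2, 2]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train(samples, labels, n_classes, lr=0.5, epochs=200):
    """Full-batch gradient descent on the cross-entropy loss."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    n = len(samples)
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(samples, labels):
            probs = predict_proba(weights, biases, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for d in range(dim):
                    grad_w[k][d] += err * x[d]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / n
            for d in range(dim):
                weights[k][d] -= lr * grad_w[k][d] / n
    return weights, biases


def classify(weights, biases, x):
    probs = predict_proba(weights, biases, x)
    return LABELS[max(range(len(probs)), key=probs.__getitem__)]
```

After `weights, biases = train(TOY_EMBEDDINGS, TOY_LABELS, n_classes=3)`, calling `classify(weights, biases, embedding)` returns one of the three category names for a new embedding.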

The Android app that pairs with the highlighter/pen was made using React Native and Expo.

Firebase Database, Authentication, and Storage were used for the entire infrastructure, saving and processing the final OCR results along with the preprocessing needed to produce them. As a result, notifications of OCR results reach the app in real time.

The actual scanner/highlighter itself was built using an Arduino 101, some buttons, a webcam and a lot of hot glue.

Challenges we ran into

Stitching together and normalizing several frames of text in real time was a large pain. Finding a decent webcam and mounting it in the right position on our highlighter was difficult. Preprocessing and training our two ML models, and verifying that they generalized to the tasks our project needed, took a lot of time.

Accomplishments that we're proud of

The OCR, text stitching, real-time Firebase architecture, and ML models actually work :').

What we learned

A couple of primers on real-time architecture, machine learning, and computer vision.

What's next for Hilite

Further improving the stitching and simplifying the hardware down to something commercial and aesthetic.

Submitted to

Hack the North 2017
- Winner Google
- Winner Hack the North Finalist

Created by

OCR, panoramic stitching, natural language processing, and Poisson image smoothing.

Kenta Iwasaki
Theorist at heart; engineer at scale.
Built the mobile app on React Native + Expo, implemented Firebase and the external APIs needed. Successfully glued the hardware without burning myself.

Carol Chen
<3
Vikram Sambamurthy

Updates

Kenta Iwasaki started this project — Sep 17, 2017 06:00 AM EDT
