Through some initial tests, we saw that the Google Vision API was decent at optical character recognition (OCR), and we decided we would leverage that to make data entry as simple as taking a picture.

What it does

Detects tabular structure in images of handwritten or typed text and converts it to either a CSV file or a Google Sheets spreadsheet.
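As a rough illustration of the CSV half of that output, here is a minimal sketch of turning a grid of recognized cells into CSV text. The helper names `escapeCsvCell` and `toCsv` are hypothetical and not taken from the project's code; the quoting follows the usual CSV convention of wrapping cells that contain commas, quotes, or line breaks.

```typescript
// Hypothetical helper (not from the project): quote a cell only when it needs it.
function escapeCsvCell(cell: string): string {
  // Cells containing a comma, a double quote, or a line break are wrapped in
  // double quotes, and any embedded double quote is doubled.
  if (/[",\r\n]/.test(cell)) {
    return '"' + cell.replace(/"/g, '""') + '"';
  }
  return cell;
}

// Hypothetical helper: serialize rows of cells into one CSV string.
function toCsv(rows: string[][]): string {
  return rows
    .map((row) => row.map(escapeCsvCell).join(","))
    .join("\r\n");
}

// Example grid, as it might come out of the table-detection step.
const exampleCsv = toCsv([
  ["Item", "Qty"],
  ["Apples, red", "3"],
]);
console.log(exampleCsv);
```

Handwritten tables often contain commas inside a cell, so quoting matters more here than it would for clean numeric data.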

How we built it

We built it using a React-Express-Node stack, hosted on Google App Engine. The app uses Google Cloud Firestore to save scanned spreadsheets, the Google Cloud Vision API for OCR, and Google OAuth for Google logins.
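To give a feel for the OCR step, here is a sketch of the JSON body one could send to the Vision API's REST `images:annotate` endpoint. It assumes the image arrives from the browser as a data URL (for example from `FileReader.readAsDataURL`) and that `DOCUMENT_TEXT_DETECTION` is the feature in use; neither detail is confirmed by our actual code, and the helper names are hypothetical.

```typescript
// Minimal shape of an images:annotate request body with inline image bytes.
interface AnnotateRequest {
  requests: Array<{
    image: { content: string }; // base64-encoded image bytes
    features: Array<{ type: string }>;
  }>;
}

// Hypothetical helper: a data URL looks like "data:image/png;base64,AAAA...".
// The API expects only the base64 payload after the comma.
function stripDataUrlPrefix(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  const isDataUrl = dataUrl.slice(0, 5) === "data:" && comma !== -1;
  return isDataUrl ? dataUrl.slice(comma + 1) : dataUrl;
}

// Hypothetical helper: build the request body for a single image.
function buildAnnotateRequest(dataUrl: string): AnnotateRequest {
  return {
    requests: [
      {
        image: { content: stripDataUrlPrefix(dataUrl) },
        // Assumption: the dense-text feature; TEXT_DETECTION is the other OCR option.
        features: [{ type: "DOCUMENT_TEXT_DETECTION" }],
      },
    ],
  };
}

console.log(JSON.stringify(buildAnnotateRequest("data:image/png;base64,iVBORw0KGgo="), null, 2));
```

Forgetting to strip the `data:...;base64,` prefix is an easy mistake to make when an upload passes from a React front end to a Node back end.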

Challenges we ran into

The Vision API returns recognized text in an unpredictable order, which made it a challenge to write a flexible algorithm capable of preserving the positional order of the (assumed) structured data. It also turns out that React does not play nice when trying to convert an uploaded image into a base64-encoded Buffer consumable by the Google Vision API.
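One way to approach the ordering problem, sketched here as an illustration and not as the algorithm we actually shipped, is to ignore the order of the OCR results entirely and rebuild rows from bounding-box positions. The `Word` type below is a simplified, hypothetical stand-in for the API's bounding polygons, and the half-height tolerance is an arbitrary assumption.

```typescript
// Hypothetical simplified OCR result: top-left corner plus height of each word box.
interface Word {
  text: string;
  x: number;
  y: number;
  height: number;
}

// Vertical center of a word's bounding box.
function centerY(w: Word): number {
  return w.y + w.height / 2;
}

// Group words into rows by vertical position, then order each row left to right.
function groupIntoRows(words: Word[]): string[][] {
  // Copy and sort top to bottom so rows are discovered in reading order.
  const sorted = words.slice().sort((a, b) => centerY(a) - centerY(b));
  const rows: Word[][] = [];
  for (const w of sorted) {
    const last = rows.length > 0 ? rows[rows.length - 1] : undefined;
    // Assumed tolerance: a word joins the current row if its center is within
    // half the height of that row's first word.
    if (last && Math.abs(centerY(w) - centerY(last[0])) <= last[0].height / 2) {
      last.push(w);
    } else {
      rows.push([w]);
    }
  }
  // Within each row, sort by horizontal position and keep only the text.
  return rows.map((row) => row.sort((a, b) => a.x - b.x).map((w) => w.text));
}

// Words deliberately listed out of order, as the API might return them.
const scrambled: Word[] = [
  { text: "B2", x: 100, y: 52, height: 20 },
  { text: "A1", x: 10, y: 10, height: 20 },
  { text: "B1", x: 100, y: 12, height: 20 },
  { text: "A2", x: 10, y: 50, height: 20 },
];
console.log(groupIntoRows(scrambled));
```

This sketch treats every word as its own cell and assumes the photo is roughly level; multi-word cells and skewed images would need column detection and deskewing on top of it.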

Accomplishments that we're proud of

Getting the algorithm to work properly.

What we learned

That ML + JavaScript is a perfectly good combination, and that the Google Cloud stack is pretty good for developing scalable apps at lightning speed.

What's next for SVS

Leveraging the Google Document Understanding API to expand the use case of the app, such as creating entire documents with freeform data straight from pictures.
