Through some initial tests, we saw that the Google Vision API was decent at optical character recognition (OCR), and we decided we would leverage that to make data entry as simple as taking a picture.
What it does
Detects tabular structure in images of handwritten or typed text and converts it to either a CSV file or a Google Sheets spreadsheet.
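As a rough illustration of the CSV half of that conversion, here is a minimal sketch of how a detected table could be serialized once it is held as rows of cell strings. The function names `escapeCsvField` and `toCsv` are made up for this example and are not taken from the project's code; the quoting follows the usual RFC 4180 conventions.

```typescript
// Quote a field only when it contains a comma, a double quote, or a line break
// (RFC 4180 style); embedded double quotes are doubled.
function escapeCsvField(field: string): string {
  return /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
}

// Serialize a table (array of rows, each an array of cell strings) to CSV text.
// Hypothetical helper: the real app may build its CSV differently.
function toCsv(rows: string[][]): string {
  return rows.map((row) => row.map(escapeCsvField).join(",")).join("\r\n");
}

const csv = toCsv([
  ["Item", "Note"],
  ["Apples, red", 'say "hi"'],
]);
```

Quoting matters here because OCR output from handwriting can easily contain stray commas or quotes that would otherwise shift cells into the wrong column.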
How we built it
We built it on a React-Express-Node stack hosted on Google App Engine. The app uses Google Cloud Firestore to save scanned spreadsheets, the Google Vision API for OCR, and Google OAuth for Google logins.
Challenges we ran into
The Vision API returns recognized text in an unpredictable order, which made it a challenge to write a flexible algorithm capable of preserving the positional order of the (assumed) structured data. It also turns out that React does not play nice when trying to convert an uploaded image into a base64-encoded Buffer consumable by the Google Vision API.
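To make the ordering problem concrete, here is one possible way to recover rows and columns from words that arrive in arbitrary order. This is a sketch under assumptions, not the algorithm we shipped: the `Word` shape (text plus the center and height of its bounding box), the `groupIntoRows` name, and the `tolerance` parameter are all hypothetical, and it assumes the photo is roughly upright.

```typescript
// Hypothetical shape for one OCR word: its text plus the center and height
// of its bounding box, in pixels.
interface Word {
  text: string;
  x: number; // horizontal center
  y: number; // vertical center
  height: number;
}

// Group words into rows by vertical position, then order each row left to right.
// A word joins the current row when its center is within `tolerance` word-heights
// of the row's mean vertical center; otherwise it starts a new row.
function groupIntoRows(words: Word[], tolerance = 0.5): string[][] {
  const sorted = [...words].sort((a, b) => a.y - b.y);
  const rows: Word[][] = [];
  for (const word of sorted) {
    const row = rows[rows.length - 1];
    // Comparing against the row mean keeps one slightly offset word from
    // dragging the rest of the row with it.
    const meanY = row ? row.reduce((sum, w) => sum + w.y, 0) / row.length : NaN;
    if (row && Math.abs(word.y - meanY) <= tolerance * word.height) {
      row.push(word);
    } else {
      rows.push([word]);
    }
  }
  return rows.map((row) => [...row].sort((a, b) => a.x - b.x).map((w) => w.text));
}

// Words deliberately listed out of reading order, as an OCR response might be.
const sample: Word[] = [
  { text: "Bob", x: 20, y: 52, height: 10 },
  { text: "Age", x: 80, y: 11, height: 10 },
  { text: "31", x: 82, y: 49, height: 10 },
  { text: "Name", x: 20, y: 10, height: 10 },
];
const table = groupIntoRows(sample); // [["Name", "Age"], ["Bob", "31"]]
```

A fixed tolerance like this breaks down on heavily slanted or rotated tables, which is part of why a flexible algorithm is hard to write.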
Accomplishments that we're proud of
Getting the algorithm to work properly.
What we learned
What's next for SVS
Leveraging the Google Document Understanding API to expand the app's use cases, such as creating entire documents with freeform data straight from pictures.