Starting from a real-world problem one of our team members faced with scientific literature, we began building a cloud-based platform to quickly digitize highlighted passages from pictures. There seemed to be a real need for a solution that focuses only on small parts of the text instead of scanning the entire page, as existing apps like Dropbox, Apple Notes, or Evernote do.
Inspired by the environment of the ZKM, we realized that this is a challenge that, if solved well, could define how we interact with textual content in the future: purpose-built apps out, ubiquitous capturing in.
What it does
The first prototype takes a picture of a text document in which parts are physically highlighted. The picture is sent to our Node.js backend on DigitalOcean, cropped to the relevant areas with OpenCV in Python, and then sent to Google Cloud Vision for OCR.
For a second prototype we built a native iOS app in Swift in which the user selects parts of the live camera stream; the selection is then sent directly to Google Cloud Vision for OCR.
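Whichever client captures the image, the Cloud Vision OCR call has the same shape. A minimal sketch in Python of how the `images:annotate` request body might be assembled (the image bytes here are a placeholder, and sending the request with a real API key is left out):

```python
import base64
import json

def build_vision_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a Cloud Vision images:annotate call.

    TEXT_DETECTION runs OCR on the image; the response contains one
    annotation for the full text plus one per detected word."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }

# The body would be POSTed to
# https://vision.googleapis.com/v1/images:annotate?key=API_KEY
body = build_vision_request(b"placeholder image bytes")
print(json.dumps(body)[:60])
```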
How we built it
The first prototype is built as a mobile-first web app and uses WebRTC to capture the camera stream. Pictures are sent as base64 to our hapi/Node.js backend, which stores metadata in an SQLite database and returns a UUID for the picture asset.
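The actual backend is hapi/Node.js; for brevity, the decode-store-return-UUID logic is sketched here in Python, with an invented `pictures` table and an in-memory database standing in for the real schema:

```python
import base64
import sqlite3
import tempfile
import uuid
from pathlib import Path

def store_picture(b64_data: str, upload_dir: Path, db: sqlite3.Connection) -> str:
    """Decode a base64-encoded picture, write it to disk under a fresh
    UUID, record its metadata in SQLite, and return the UUID."""
    asset_id = str(uuid.uuid4())
    raw = base64.b64decode(b64_data)
    (upload_dir / asset_id).write_bytes(raw)
    db.execute(
        "INSERT INTO pictures (uuid, size_bytes) VALUES (?, ?)",
        (asset_id, len(raw)),
    )
    db.commit()
    return asset_id

# Demo: throwaway directory and an in-memory database
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pictures (uuid TEXT PRIMARY KEY, size_bytes INTEGER)")
upload_dir = Path(tempfile.mkdtemp())
asset_id = store_picture(base64.b64encode(b"fake image bytes").decode(), upload_dir, db)
print(asset_id)  # the client uses this UUID to reference the asset later
```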
We tried different solutions for optimizing file storage, including MinIO, but none proved practical within the hackathon, so we stored plain files identified by their UUID instead.
The OpenCV cropping microservice can then use the UUID to retrieve the original image, crop the relevant parts with color-based masking, and return a new version of the image to be sent to Google Cloud Vision.
Challenges we ran into
We intended to run everything in Docker, but never got there in time. Putting it all together thus proved the most challenging part, including installing OpenCV on the server.
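For reference, a containerized cropping service could have looked something like the sketch below. The file name `crop_service.py` and the base image are assumptions; `opencv-python-headless` is the variant that skips the GUI dependencies that make OpenCV server installs painful:

```dockerfile
FROM python:3.9-slim
RUN pip install --no-cache-dir opencv-python-headless numpy
WORKDIR /app
COPY crop_service.py .
CMD ["python", "crop_service.py"]
```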
Accomplishments that we're proud of
We didn't know each other before the event, and we went home at night to get a good night's sleep. With that in mind, we are very happy with the result we came up with and might continue working on this in the future. Everyone was able to contribute equally, either with preexisting skills or out of interest in trying something new.
What we learned
- things about the aforementioned technologies
- focusing on getting something presentable done rather than spending too much time on details
- bringing together different parts of a system is hard
What's next for Digital Thought Companion
We will see.