C# + Microsoft Cognitive Services
Objective-C++ + Tesseract
Finding things is time-consuming and can be irritating. The inspiration for a mobile app that can highlight a given piece of text came from several situations, from students wanting to find a certain phrase in their books to older people trying to read ingredient labels. The Ctrl-F command on electronic devices has time and time again made finding things more efficient, so we thought, why not bring that into the real world?
What it does
Our app takes a search string from the user and highlights it in a live feed from the camera.
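As a rough illustration of the highlighting step, here is a minimal Python sketch (not the app's actual Objective-C++ or C# code). It assumes an OCR engine has already returned each recognized word together with a bounding box; the `WordBox` type, the `find_matches` helper, and the sample frame are hypothetical names and data made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    """One OCR-recognized word and its bounding box in frame pixels (hypothetical type)."""
    text: str
    x: int
    y: int
    w: int
    h: int


def find_matches(words: List[WordBox], query: str) -> List[WordBox]:
    """Return the boxes of all words that contain the query, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        # An empty search box should highlight nothing.
        return []
    return [box for box in words if needle in box.text.lower()]


# Made-up sample: pretend the OCR engine returned these words for one frame.
frame_words = [
    WordBox("Ingredients:", 10, 10, 120, 20),
    WordBox("sugar,", 140, 10, 60, 20),
    WordBox("Sugar", 10, 40, 55, 20),
    WordBox("salt", 75, 40, 40, 20),
]

# Each returned box is a rectangle the UI could draw over the camera preview.
highlights = find_matches(frame_words, "sugar")
```

This sketch only handles single-word queries; matching a multi-word phrase would also need the order of words on a line.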
How I built it
We first took stock of what we already knew: tools like OpenCV, Objective-C, .NET, and Microsoft's Cognitive Services. Since none of us had a mobile app development background, it took a while to get used to the Xcode environment and get a live stream from the rear camera working. Once we had live video from the camera, we looked into how to break the frames down into images that we could feed into OpenCV matrix methods to perform OCR. We brainstormed ways to do that and researched MATLAB's OCR algorithm. However, after more research into OCR options for iOS, we found that Tesseract did essentially what we wanted on iOS. We also tried C# with Microsoft's Cognitive Services, which made it easy to prototype the app. Microsoft's Cognitive Services limits how quickly we can submit images to parse, but at least it works!
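One way to live with a submission limit like this is to send only some of the camera frames to the OCR service and drop the rest. The Python sketch below shows that idea under stated assumptions: the `FrameThrottle` class is a hypothetical helper, and the one-second interval is a placeholder, not the service's real limit.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Allow at most one OCR submission per `min_interval_s` seconds (hypothetical helper)."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        """Return True if enough time has passed to submit the current frame."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self.min_interval_s:
            self._last_sent = now
            return True
        return False


# Simulated frame arrival times in seconds; the 1.0 s interval is an assumed value.
fake_times = iter([0.0, 0.1, 0.5, 1.0, 1.2, 2.5])
throttle = FrameThrottle(1.0, clock=lambda: next(fake_times))
decisions = [throttle.should_send() for _ in range(6)]
```

Frames that are not sent can still be shown in the preview, with the last known highlights kept on screen until the next OCR result arrives.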
Challenges I ran into
- Spotty Wi-Fi made it hard to do research and access libraries and documentation. Next time: bring a laptop with an Ethernet port.
- Xcode iOS development testing wasn't very team-friendly: it requires each developer to "sign" with their own account, making it difficult to share the same code. After many trials we found a workaround that wasn't too much of a hassle, but it was annoying in the beginning.
- Microsoft Cognitive Services would have made development much easier, but running in the cloud made the processing slow.
- Incomplete or old documentation and sample code can be misleading. One example is when we were trying to get Tesseract to work. Even though we followed the installation instructions closely three times, new bugs kept revealing themselves. Tesseract has also been around for a while, so a lot of online tutorials are outdated and only work with older releases.
- Computer vision takes a lot of time. It's hard to get a real-time scan of text in images.
Accomplishments that I'm proud of
- Got to display a more-or-less live stream from the camera
- Built a search box and was able to read the string the user typed
- Made Xcode cooperate over GitHub
What I learned
- Basic Objective-C
- Building iOS apps in Xcode
- Microsoft Cognitive Services
- OpenCV is like MATLAB
- Tesseract has OCR for iOS
- Debugging, putting code together
What's next for Ctrl-F
- Refine computer vision on text
- Smart search, like searching in Google, where it can guess what you want
- Regex
- More interactive UI: sounds, lights, better AR to point to the text
- Searching for items (like if you lost your keys) :O
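The regex and smart-search ideas could start from something like the following Python sketch, which uses only the standard library. It is a guess at one possible approach, not a planned design: `regex_matches`, `fuzzy_matches`, the 0.8 similarity threshold, and the sample words are all hypothetical. Fuzzy matching would also help with OCR misreads, since a slightly garbled word can still match the query.

```python
import re
from difflib import SequenceMatcher
from typing import List


def regex_matches(words: List[str], pattern: str) -> List[str]:
    """Return the OCR words that match a user-supplied regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.search(w)]


def fuzzy_matches(words: List[str], query: str, threshold: float = 0.8) -> List[str]:
    """Return the OCR words similar enough to the query (assumed threshold)."""
    q = query.lower()
    return [
        w for w in words
        if SequenceMatcher(None, q, w.lower()).ratio() >= threshold
    ]


# Made-up OCR output, including a misread word ("sugr").
ocr_words = ["Sugar", "sugr", "salt", "E150d", "E330"]

additive_codes = regex_matches(ocr_words, r"^E\d{3}")  # words that look like E-numbers
close_to_sugar = fuzzy_matches(ocr_words, "sugar")     # tolerant of small OCR errors
```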