We've all had that moment. Maybe you see a poster you really like, but you're too busy to stop, whip out your phone, take a picture, and crop out the poster itself. Maybe you have a stack of documents you want to photograph. Or maybe you're in a lecture and your professor decides not to publish the slides, so your only option is to photograph the important ones with your phone, wasting precious time you could've spent listening in exchange for a subpar photo that's probably shaky and off-angle. Even worse, your professor might have a "no-phone" policy that rules this out in the first place. With everything returning to normal, these situations have become much more common, and that's where InstaCap comes into play.
What it does
Our application lets you seamlessly take pictures of documents, slideshows, and posters with just a willful blink; you can even take panoramic photos for wider views :O! Images are uploaded to the web application, where you can view, download, or remove them. Each image is transcribed with OCR, so you can filter images by the text they contain.
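As a rough illustration of filtering by OCR text, here is a minimal Python sketch. The record shape (`filename` and `ocr_text` keys) and the `filter_by_text` helper are hypothetical, since the actual Firebase schema and React filtering code aren't described here; it only shows the idea of a case-insensitive match against each transcript.

```python
def filter_by_text(images, query):
    """Return the images whose OCR transcript contains every word of the query."""
    words = query.lower().split()
    return [
        image for image in images
        if all(word in image["ocr_text"].lower() for word in words)
    ]


# Hypothetical records; in the real app these would come from storage.
captures = [
    {"filename": "slide_01.png", "ocr_text": "Lecture 3: Dynamic Programming"},
    {"filename": "poster.png", "ocr_text": "Hackathon this weekend! Free food"},
    {"filename": "slide_02.png", "ocr_text": "Memoization in dynamic programming"},
]

matches = filter_by_text(captures, "dynamic programming")
print([image["filename"] for image in matches])
# prints ['slide_01.png', 'slide_02.png']
```

Matching every word separately (rather than the exact phrase) is a deliberate choice in this sketch: OCR output often has odd spacing or line breaks, so a strict phrase match would miss more images.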
How we built it
For eye tracking, we used AdHawk MindLink glasses to capture the user's field of view (FOV) and detect when they blinked. We used PyQt to render the desktop app, Firebase to store the images, and React for the UI that displays them.
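One way a "willful" blink could be told apart from a natural one is by how long the eyes stay closed. The sketch below assumes that approach; it is not necessarily how InstaCap does it. It doesn't call the AdHawk SDK at all: it assumes blink events have already been turned into `(closed_at, opened_at)` timestamp pairs in seconds, and the 0.5 second threshold is an illustrative guess.

```python
# Assumed threshold: natural blinks are usually over in a fraction of a second,
# so a longer eye closure is treated as deliberate. Tune for real hardware.
WILLFUL_BLINK_MIN_SECONDS = 0.5


def is_willful_blink(closed_at, opened_at, min_seconds=WILLFUL_BLINK_MIN_SECONDS):
    """Return True when the eyes stayed closed long enough to count as deliberate."""
    return (opened_at - closed_at) >= min_seconds


def capture_times(blinks):
    """Given (closed_at, opened_at) pairs, return the times at which to take a photo."""
    return [opened for closed, opened in blinks if is_willful_blink(closed, opened)]


# Two quick natural blinks and one long, deliberate one.
blinks = [(1.00, 1.15), (4.20, 4.95), (9.00, 9.20)]
print(capture_times(blinks))
# prints [4.95]
```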
Challenges we ran into
We often had problems calibrating the AdHawk MindLink glasses, but we worked through them together. We also spent a lot of time dealing with minor inaccuracies, since many of our ideas demanded more detailed and specific measurements than we were getting. The act of blinking also moves the eyeball, which makes any task that relies on both where the eyes are focused and the timing of a blink much harder to implement. Finally, combining all of our contributions (AdHawk photo capturing, Firebase storage, React UI) at the end was a rough obstacle that we had to persevere through.
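Because the eyeball moves during a blink, one possible workaround is to ignore gaze readings taken during the blink and instead look up where the user was focused shortly before it began. The sketch below illustrates that idea under stated assumptions: `GazeHistory`, the normalized `(x, y)` coordinates, and the 0.1 second margin are all hypothetical, and the samples are assumed to arrive in timestamp order.

```python
from bisect import bisect_right
from collections import deque


class GazeHistory:
    """Keeps recent (timestamp, x, y) gaze samples so a capture can look back in time."""

    def __init__(self, max_samples=500):
        self._samples = deque(maxlen=max_samples)

    def add(self, timestamp, x, y):
        # Samples are assumed to arrive in increasing timestamp order.
        self._samples.append((timestamp, x, y))

    def gaze_before(self, blink_start, margin=0.1):
        """Return the latest (x, y) recorded at least `margin` seconds before the blink."""
        cutoff = blink_start - margin
        timestamps = [t for t, _, _ in self._samples]
        index = bisect_right(timestamps, cutoff)
        if index == 0:
            return None  # no sample is old enough
        _, x, y = self._samples[index - 1]
        return (x, y)


history = GazeHistory()
history.add(0.0, 0.50, 0.50)
history.add(0.1, 0.51, 0.50)
history.add(0.2, 0.52, 0.49)
history.add(0.3, 0.70, 0.20)  # eye already drifting as the lid closes

print(history.gaze_before(blink_start=0.35))
# prints (0.52, 0.49)
```

The margin trades accuracy for freshness: a larger value is more likely to skip the drift, but risks using a gaze point from before the user looked at the target.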
Accomplishments that we're proud of
We're proud that we built a functional, successful demo of what we set out to achieve with InstaCap. Most of all, figuring out how to use the AdHawk MindLink glasses, an unfamiliar and fascinating piece of technology, and linking all of our work together in the end felt invigorating. Despite the roller coaster we went through, our perseverance and curiosity to learn were what got us here.
What we learned
- Daniel: OpenCV and AdHawk; also, all-nighters suck
- Bala: React (was primarily an AI and CV developer, but learned quite a bit helping with the React code); also sleep deprivation makes you worse at coding
- Eric: AdHawk is cool 😎
- Yimin: AdHawk
What's next for InstaCap
- Tracking where the eyes are focused and returning only the document that the user is looking at
- Improving OCR performance (the transcript can often look wonky)
- Moving to wireless hardware
- Making our AdHawk integration somehow run server-side
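The first item above, returning only the document the user is looking at, might look something like the following sketch. It assumes document detection has already produced bounding boxes in the same pixel coordinates as the gaze point; `pick_document` and the tie-breaking rule (smallest containing box wins) are illustrative choices, not an existing part of InstaCap.

```python
def pick_document(boxes, gaze):
    """Return the index of the box containing the gaze point, or None.

    boxes: list of (left, top, right, bottom) in image pixels.
    gaze:  (x, y) in the same pixel coordinates.
    If several boxes contain the point, the smallest one wins, on the
    assumption that the user is looking at the most specific document.
    """
    gx, gy = gaze
    containing = [
        ((right - left) * (bottom - top), index)
        for index, (left, top, right, bottom) in enumerate(boxes)
        if left <= gx <= right and top <= gy <= bottom
    ]
    if not containing:
        return None
    return min(containing)[1]


# A whiteboard with a poster and a sheet of paper pinned to it.
boxes = [(0, 0, 1000, 800), (100, 100, 400, 500), (600, 200, 900, 700)]

print(pick_document(boxes, (250, 300)))   # prints 1
print(pick_document(boxes, (950, 750)))   # prints 0
print(pick_document(boxes, (1200, 50)))   # prints None
```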