Images are hard to organize, yet screenshotting text is often the easiest way to keep a record of it. We needed a tool to search images the way grep searches plain text.

Likely use cases include finding screenshots of particular lecture slides, notes, or conversations.

What it does

On the command line

imgrep uses tesseract-ocr to lift text out of images and find files that match user queries. Matching file paths are printed to stdout, so imgrep can be used in shell scripts.
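A minimal sketch of that flow in Go, assuming the `tesseract` binary is on the PATH and is invoked as `tesseract IMAGE stdout` (the `stdout` output base makes it print the recognized text). The helper names (`isImage`, `ocr`, `matches`), the extension list, and the case-insensitive substring match are illustrative assumptions, not imgrep's actual code.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// imageExts is an assumed set of file extensions treated as images.
var imageExts = map[string]bool{
	".png": true, ".jpg": true, ".jpeg": true,
	".tif": true, ".tiff": true, ".bmp": true,
}

// isImage reports whether path has one of the assumed image extensions.
func isImage(path string) bool {
	return imageExts[strings.ToLower(filepath.Ext(path))]
}

// ocr runs the tesseract CLI on one image; the output base "stdout"
// makes tesseract write the recognized text to standard output.
func ocr(path string) (string, error) {
	out, err := exec.Command("tesseract", path, "stdout").Output()
	return string(out), err
}

// matches does a case-insensitive substring match of query against text.
func matches(text, query string) bool {
	return strings.Contains(strings.ToLower(text), strings.ToLower(query))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: sketch QUERY")
		return
	}
	query := os.Args[1]
	// Walk the current directory and print every image whose OCR text matches.
	err := filepath.WalkDir(".", func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || !isImage(path) {
			return err
		}
		text, err := ocr(path)
		if err != nil {
			// Diagnostics go to stderr so stdout carries only paths for pipes.
			fmt.Fprintln(os.Stderr, path, err)
			return nil
		}
		if matches(text, query) {
			fmt.Println(path)
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running OCR on every image for every query is what makes this naive version slow, which is the motivation for caching results.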

To install, first install the tesseract-ocr development packages, then run:

go get
go install

Perform searches in your current working directory by running

imgrep s -n QUERY

In the browser

Interact with imgrep through a search-based web UI: enter keywords and watch as images from your hard disk show up in the results!

To install and start:

go get
go install
imgrep-web start &

Then visit this super awesome URI in your favourite browser: localhost:1337
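The write-up doesn't describe imgrep-web's internals. As a hypothetical sketch, a search endpoint on port 1337 could be built with Go's standard `net/http`; the `/search` route, the `q` parameter, the JSON response, and the in-memory `index` (image path to OCR text) are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"sort"
	"strings"
)

// index maps image paths to previously extracted OCR text (hypothetical data source).
type index map[string]string

// search returns the sorted paths whose text contains q, ignoring case.
func (ix index) search(q string) []string {
	q = strings.ToLower(q)
	hits := []string{}
	for path, text := range ix {
		if strings.Contains(strings.ToLower(text), q) {
			hits = append(hits, path)
		}
	}
	sort.Strings(hits) // map iteration order is random; sort for stable output
	return hits
}

// handler serves GET /search?q=... as a JSON array of matching paths.
func (ix index) handler(w http.ResponseWriter, r *http.Request) {
	q := r.URL.Query().Get("q")
	if q == "" {
		http.Error(w, "missing q parameter", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(ix.search(q)); err != nil {
		log.Println(err)
	}
}

func main() {
	// Mirror the "imgrep-web start" invocation; do nothing without it.
	if len(os.Args) < 2 || os.Args[1] != "start" {
		fmt.Fprintln(os.Stderr, "usage: sketch start")
		return
	}
	// Placeholder entry; a real server would load stored OCR results.
	ix := index{"slides/week3.png": "Dijkstra's shortest path"}
	http.HandleFunc("/search", ix.handler)
	log.Fatal(http.ListenAndServe("localhost:1337", nil))
}
```

With this sketch running, requesting `http://localhost:1337/search?q=dijkstra` returns `["slides/week3.png"]`.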

How we built it

We built imgrep by leveraging the power of an Infinity Stone: the Tesseract!

More seriously, imgrep is a command-line app written in Go. It follows the Unix philosophy of program design and composes with other tools through Unix pipes.

Challenges we ran into

Coming up with a viable hack is hard

After discovering that the Oculus Rift doesn't really support Unixes (sad-reacts only), we had to pivot. We pivoted four times before we came up with imgrep. After that the code just flowed.

Technical stuff

OCR is kind of slow, so we made imgrep fast by caching Tesseract's OCR results in a small SQLite database.
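The cache's schema isn't given here, so the following is a minimal sketch of the idea under stated assumptions: entries are keyed by path, modification time, and size, so an edited file gets re-OCR'd. The `Store` interface and all names are hypothetical, and an in-memory map stands in for the SQLite table so the example has no third-party dependencies. A SQLite-backed store would implement the same two methods with a `SELECT` and an `INSERT OR REPLACE`.

```go
package main

import (
	"fmt"
	"os"
)

// Key identifies one version of an image file; a changed mtime or size
// means the cached text is stale. (Assumed keying scheme.)
type Key struct {
	Path    string
	ModTime int64 // Unix nanoseconds
	Size    int64
}

// Store is a hypothetical cache backend. A SQLite version might use
//   CREATE TABLE ocr (path TEXT PRIMARY KEY, mtime INTEGER, size INTEGER, text TEXT);
type Store interface {
	Get(k Key) (string, bool)
	Put(k Key, text string)
}

type entry struct {
	key  Key
	text string
}

// memStore stands in for the SQLite table: one row per path, replaced on Put.
type memStore map[string]entry

func (m memStore) Get(k Key) (string, bool) {
	e, ok := m[k.Path]
	if !ok || e.key != k { // missing, or the file changed since it was cached
		return "", false
	}
	return e.text, true
}

func (m memStore) Put(k Key, text string) { m[k.Path] = entry{k, text} }

// KeyFor builds a cache key from the file's current metadata.
func KeyFor(path string) (Key, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return Key{}, err
	}
	return Key{Path: path, ModTime: fi.ModTime().UnixNano(), Size: fi.Size()}, nil
}

// CachedOCR wraps a slow OCR function with a Store lookup.
type CachedOCR struct {
	Store  Store
	OCR    func(path string) (string, error)
	Misses int // number of times OCR actually ran
}

// Text returns cached text for k, running OCR and storing the result on a miss.
func (c *CachedOCR) Text(k Key) (string, error) {
	if t, ok := c.Store.Get(k); ok {
		return t, nil
	}
	c.Misses++
	t, err := c.OCR(k.Path)
	if err != nil {
		return "", err // failures are not cached
	}
	c.Store.Put(k, t)
	return t, nil
}

func main() {
	c := &CachedOCR{Store: memStore{}, OCR: func(p string) (string, error) { return "text of " + p, nil }}
	k := Key{Path: "notes.png", ModTime: 1, Size: 100}
	c.Text(k)
	c.Text(k)
	fmt.Println("OCR runs:", c.Misses) // prints "OCR runs: 1"
}
```

Keying on metadata rather than hashing file contents keeps lookups cheap, at the cost of re-running OCR when a file is touched without changing.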

Accomplishments that we're proud of

  • We built a useful utility which feels at home in the shell.
  • We bundled a web server and a familiar search-based web interface, so average users need not even know the shell is there.

What we learned

  • Tesseract is awesome

What's next for imgrep

  • Flexible configuration
  • Auto-indexing of files (currently imgrep init must be called to index new directory trees)
  • Better web-ui
  • Efficiency ™

Update

Emir Hasanbegovic told us we couldn't demo because we might have been pre-tagging images and using doctored datasets. That wasn't going to fly, so now imgrep can work two different ways.

Pre-index for fast searches, or run slower real-time searches. You pick!
