Inspiration

Recently, we've seen an explosion in the amount of visual data. The ease with which people can capture their special moments has been a major contributor to this growth. On a personal scale, this becomes a problem when you want to find an image with a specific visual attribute in your collection. Existing approaches help to some extent by using tag-based retrieval methods. But that's messy.

Wouldn't it be nice if you could just describe what you were looking for? Better still, what if an intelligent assistant walked you through the process by asking clarifying questions? Using SiftAI, you can do just that - fast, efficient retrieval of visual content by conversing with an intelligent agent trained via deep reinforcement learning. We are motivated by two fundamental principles:

  1. A large enough collection of images is likely to contain scenes similar to the one you describe.
  2. Deep RL allows us to utilize the power of semantic information encoded in natural language.

What it does

SiftAI is a personal assistant that helps you search for and retrieve images from a database using just natural language descriptions. This is how it works:

  1. You describe what you're looking for in a sentence.

  2. Quincy, the bot, shows you a set of candidate images from your collection and follows up with a clarifying question in the context of the description you provided.

  3. You stop whenever you feel you've found the image(s) you were looking for.
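
The steps above can be sketched as a small dialog loop. This is only an illustration under loud assumptions: the `Session` class, the caption dictionary, the word-overlap ranking, and the fixed follow-up question are all hypothetical stand-ins, not SiftAI's actual model or code.

```python
from dataclasses import dataclass, field

# Placeholder question; the real bot generates its question from the dialog context.
FOLLOW_UP = "Can you tell me more about the scene?"


@dataclass
class Session:
    """Hypothetical dialog state: the collection plus everything the user has said."""
    captions: dict                      # image name -> caption (toy image collection)
    history: list = field(default_factory=list)

    def rank(self, top_k=2):
        # Score each image by how many words its caption shares with the dialog so far.
        words = set(" ".join(self.history).lower().split())
        ordered = sorted(
            self.captions,
            key=lambda name: (
                -len(words & set(self.captions[name].lower().split())),
                name,                   # tie-break alphabetically for stable output
            ),
        )
        return ordered[:top_k]

    def turn(self, utterance):
        """Record a description or answer, then return candidates and a question."""
        self.history.append(utterance)
        return self.rank(), FOLLOW_UP


captions = {
    "beach.jpg": "a dog running on a sandy beach",
    "park.jpg": "a dog catching a frisbee in a park",
    "city.jpg": "a busy street at night",
}
session = Session(captions)
first, question = session.turn("a dog outside")   # step 1: initial description
second, _ = session.turn("on a beach")            # step 2: answer to the question
```

Each call to `turn` narrows the ranking using the whole conversation so far, and the user simply stops calling it once the right image shows up (step 3).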

How we built it

Deep RL: We fine-tuned a question generator with deep RL so that it takes the current dialog context into account. It was trained with policy gradients, using rewards based on the actions (utterances) that the agent (Quincy) takes. We implemented this in Torch7, a popular deep learning framework.
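
To make the policy-gradient idea concrete, here is a minimal REINFORCE sketch in plain Python. It is not our Torch7 code: the three-way choice between candidate questions, the toy reward, and the names `softmax` and `reinforce_step` are assumptions made purely for illustration.

```python
import math
import random


def softmax(logits):
    """Turn logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, lr=0.1):
    """One REINFORCE update for a softmax policy over discrete actions.

    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i,
    so each logit moves along that gradient, scaled by the reward.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


random.seed(0)
logits = [0.0, 0.0, 0.0]      # one logit per hypothetical candidate question
for _ in range(200):
    probs = softmax(logits)
    action = random.choices(range(3), weights=probs)[0]   # sample an utterance
    reward = 1.0 if action == 2 else 0.0   # toy reward: only question 2 helps
    logits = reinforce_step(logits, action, reward)

final_probs = softmax(logits)
```

Actions that earn a reward become more likely and the others less so. A real question generator would apply the same kind of update to the parameters of a neural network rather than to three bare logits.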

Infrastructure: We used a combination of tools:

a) RabbitMQ: as a message broker.

b) Redis: for storing the tokens and keys for fast communication.

c) Websockets: We used Django Channels for real-time communication between the frontend and backend of the project, so that the user does not need to refresh the page.

d) Django: We use Django as our backend MVC framework; it is the backbone of the project.

Challenges we ran into

  1. We first tried to set up the whole project on one of the CoC (College of Computing) servers, because running inference on deep learning models is computationally heavy.

  2. These College of Computing machines don't have a public-facing IP, so we had to set up SSH tunneling to access the web app from our local machines during development.

  3. Doing inference in real time is a well-known problem in the industry; we solved it by queuing the inference jobs, using RabbitMQ as the message broker.
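
The queuing idea in the last item can be sketched as follows. To keep the example self-contained, it swaps RabbitMQ for Python's standard-library `queue.Queue` and a single worker thread; `fake_model` and the job format are hypothetical placeholders, not our actual pipeline.

```python
import queue
import threading


def fake_model(description):
    # Placeholder for the expensive deep-model forward pass.
    return f"candidates for: {description}"


def worker(jobs, results):
    """Process jobs one at a time so the model never sees concurrent requests."""
    while True:
        job = jobs.get()
        if job is None:              # sentinel: shut the worker down
            jobs.task_done()
            break
        job_id, description = job
        results[job_id] = fake_model(description)
        jobs.task_done()


jobs = queue.Queue()
results = {}
thread = threading.Thread(target=worker, args=(jobs, results), daemon=True)
thread.start()

# The web layer only enqueues a job and returns immediately.
for job_id, text in enumerate(["a dog on a beach", "a street at night"]):
    jobs.put((job_id, text))

jobs.put(None)   # ask the worker to stop once the queue is drained
jobs.join()      # block until every queued item has been processed
```

The point of the pattern is that the request handler never waits on the model directly: it hands work to the queue, and a separate consumer runs inference at its own pace. With a real broker, the producer and the consumer can also live on different machines.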

Accomplishments that we're proud of

We built a fairly stable pipeline that supports fast, real-time inference for a fairly sophisticated deep model on AWS. All of this, from inference to testing to deployment, was implemented in a very short span of time.

What we learned

We learned that the hard part is implementing a fast inference pipeline. Our initial design, although accurate, was very inefficient.

What's next for SiftAI

SiftAI's structure is fairly generic and can be applied to problems in other domains, notably those where a major challenge is managing a large integrated database of images. This could be useful in inventory management and, to some extent, in the healthcare sector - specifically, when looking for context-driven anomalies in scans. It saves computation time if you know 'exactly' what you're looking for.
