People upload pictures to social media all the time. We thought it would be cool if a tool or service could examine these images and automatically generate captions for them. We decided to add similar GIFs too!

What it does

You upload an image or share an image URL, and the Azure-based application uses Microsoft's Project Oxford APIs to mine the image and extract its feature set. This feature set is used to generate captions for the image and to find GIFs that match it. This makes captioning images easy and fun!

How I built it

The user either uploads an image or provides an image URL. Project Oxford's Face API is used to extract image characteristics like category, age, gender, etc. of the people in the image. The Emotion API is used to study the sentiments of the people in the image. The extracted feature set is filtered by confidence and then used to query the Bing Image Search API. Captions are pulled from the returned images by running OCR on them with Project Oxford's Computer Vision API. The captions are cleaned up with the Spell Check API and displayed to the user, with an option to shuffle or switch the suggested caption. We also provide GIFs, using the same feature set for querying. The technology stack is a Python Flask application deployed on Azure.
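The filtering and querying step can be sketched roughly as below. This is a minimal illustration and not the project's actual code: the `Feature` class, the function names, the 0.6 threshold, and the "quotes" search suffix are all assumptions, and the calls to the Project Oxford and Bing services are left out entirely.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Feature:
    """One attribute extracted from the image (hypothetical structure)."""
    name: str          # e.g. "happiness", "female", "outdoor"
    confidence: float  # score in [0, 1] reported by the upstream API


def filter_features(features: Iterable[Feature], threshold: float = 0.6) -> List[Feature]:
    """Keep features at or above the threshold, most confident first.

    The 0.6 default is an assumed value, not the one the project used.
    """
    kept = [f for f in features if f.confidence >= threshold]
    return sorted(kept, key=lambda f: f.confidence, reverse=True)


def build_query(features: Iterable[Feature], suffix: str = "quotes") -> str:
    """Join unique feature names into one search query string.

    The suffix steers the image search toward pictures that contain text;
    "quotes" is an assumed choice, and "gif" could be used for the GIF search.
    """
    names: List[str] = []
    for feature in features:
        if feature.name not in names:
            names.append(feature.name)
    return " ".join(names + [suffix])


def caption_cycle(captions: Iterable[str]) -> Iterator[str]:
    """Loop over de-duplicated, non-empty captions so the user can keep switching."""
    cleaned = (c.strip() for c in captions)
    unique = list(dict.fromkeys(c for c in cleaned if c))
    return cycle(unique)


if __name__ == "__main__":
    extracted = [
        Feature("happiness", 0.92),
        Feature("outdoor", 0.41),
        Feature("female", 0.77),
    ]
    print(build_query(filter_features(extracted)))  # happiness female quotes
```

Each call to `next()` on the iterator returned by `caption_cycle` gives the next suggestion, which is one simple way to back a "switch caption" button.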

Challenges I ran into

Setting up Flask on Azure, and getting relevant captions from the OCR run on Bing image results.

Accomplishments that I'm proud of

Hacking for 36 hours with no coffee!

What I learned

Managing multiple services and deploying to Azure.

What's next for fotosynthesis

See how we can further refine the accuracy of our captions, and add options to suggest movie or comic quotes tailored to the user.
