Does this sound like you? You have a fire Instagram photo all ready to go, but you're struck with the age-old dilemma: what do I caption this? After DM-ing multiple friends for tips, you're finally forced to settle for an underwhelming emoji :sob:.
What it does
With CaptionCaptain, there's no more fishing for the perfect Insta caption. Snap or upload a photo through the CaptionCaptain iOS app, and it leverages machine learning to detect the objects and sentiments in your pic, returning a relevant phrase. A banger caption is now just a click away.
How we built it
CaptionCaptain leverages the Google Cloud Vision API's powerful image recognition model. The iOS app passes each photo to the API, which extracts keywords, object labels and sentiments. We then query our own CaptionCaptain API endpoints, which use a custom search algorithm to retrieve the most relevant captions from our dataset of over 10,000 quotes, lyrics and popular phrases. By reverse engineering the keyword detection of the Google Cloud Vision API, CaptionCaptain uses intelligent synonym mapping to significantly increase the accuracy of the search engine and return banger captions.
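The synonym mapping and caption search described above can be sketched roughly like this. Note that the synonym table, caption record shape and scoring rule here are simplified illustrations, not our exact implementation:

```javascript
// Sketch of the synonym-mapped caption search (illustrative only:
// the synonym table, caption records and scoring are simplified).

// Map Vision API labels to extra search terms.
const SYNONYMS = {
  beach: ["ocean", "sand", "waves"],
  dog: ["puppy", "pup"],
  sunset: ["dusk", "golden hour"],
};

// Expand a list of detected labels with their synonyms.
function expandLabels(labels) {
  const expanded = new Set();
  for (const label of labels.map((l) => l.toLowerCase())) {
    expanded.add(label);
    for (const syn of SYNONYMS[label] || []) expanded.add(syn);
  }
  return expanded;
}

// Score each caption by how many of its keywords match the
// expanded label set, and return the best-scoring caption text.
function bestCaption(labels, captions) {
  const terms = expandLabels(labels);
  let best = null;
  let bestScore = 0;
  for (const caption of captions) {
    const score = caption.keywords.filter((k) => terms.has(k)).length;
    if (score > bestScore) {
      bestScore = score;
      best = caption.text;
    }
  }
  return best;
}
```

The synonym expansion is what lets a photo labeled only "beach" still match captions that were keyworded with "sand" or "waves" during scraping.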
CaptionCaptain draws from a dataset of 10,000+ unique captions covering every selfie, travel photo and celebration. Driven by automation and crowdsourcing, the dataset was built using the combined data scraping and processing powers of robotic process automation (RPA) and the Dropbase API. RPA allowed us to scrape the web for lyrics, quotes and popular captions while mapping them to relevant keywords. By passing the resulting data to the Dropbase API, we were able to organize the raw scraped data and map keywords to intelligently generated synonyms and relevant captions. Dropbase let us easily create a pipeline that transforms the raw data into a centralized, queryable PostgreSQL database.
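The cleanup step between scraping and the PostgreSQL load might look something like this. The field names and normalization rules are assumptions for illustration, not the exact pipeline:

```javascript
// Sketch of the cleanup step between web scraping and the
// PostgreSQL load (field names and rules are assumptions).

// Normalize raw scraped rows: trim caption text, lowercase and
// dedupe keywords, and drop empty or duplicate captions.
function normalizeCaptions(rawRows) {
  const seen = new Set();
  const clean = [];
  for (const row of rawRows) {
    const text = (row.text || "").trim();
    if (!text || seen.has(text)) continue;
    seen.add(text);
    clean.push({
      text,
      source: row.source || "unknown",
      keywords: [
        ...new Set((row.keywords || []).map((k) => k.trim().toLowerCase())),
      ],
    });
  }
  return clean;
}

// Each clean row then becomes one parameterized INSERT, e.g. with
// node-postgres:
//   pool.query(
//     "INSERT INTO captions (text, source, keywords) VALUES ($1, $2, $3)",
//     [row.text, row.source, row.keywords]
//   );
```

Deduplicating before the insert keeps the captions table free of the repeated lyrics and quotes that show up across multiple scraped pages.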
To query the database intelligently, we used Node.js and Express to expose API endpoints for our iOS app. The server was Dockerized and deployed to Google's Cloud Run service, giving us a zero-downtime server experience. This allowed our API endpoints to be accessed consistently by all instances of CaptionCaptain.
Challenges we ran into
- Dockerizing the server
- Making the iOS app talk to our API
- Optimizing the search engine to return relevant captions
- Intelligently generating synonyms to map relationships between the Google Vision API results and the caption database
- Setting up the automation pipeline to include the Dropbase API
- Scraping, organizing and classifying the caption data
Accomplishments that we're proud of
- Creating the entire app in 12 hours from beginning to end of development (after scrapping our initial idea)
- A clean iOS interface with image uploading and more
- Figuring out all the API calls
- Optimizing the caption search to a working degree
What we learned
- How to use the Google Cloud Vision API for object recognition
- How to use the Dropbase API for data transformation and an easy offline-to-database transition
- How to dockerize an Express.js API and a React.js project, and deploy both in Google's Cloud Run service
- How to design a web-scraping to PostgreSQL data pipeline
What's next for CaptionCaptain
- Direct sharing from the app
- Optimize search engine to increase speed and efficiency
- Improvements to the UI and UX