They say "a picture is worth a thousand words." Often, though, people struggle to come up with a good caption for their photos. We wanted to build a web app that captions images automatically, so that people can quickly grasp what an image shows or caption a large number of images at once.

What it does

Our web app sends uploaded images to the Google Cloud Vision API and uses what it detects to automatically generate a sentence that captions each image.
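One step such a pipeline needs is deciding which detected labels are worth putting into a caption. The sketch below is an illustration, not the app's actual code. It assumes the labels have already been pulled out of the Vision API response as plain `(description, score)` pairs (with the `google-cloud-vision` Python client, label annotations expose `description` and `score` fields). The function name `top_labels`, the 0.7 threshold, and the limit of three labels are hypothetical choices.

```python
from typing import List, Tuple

# Assumed input shape: (description, score) pairs extracted from the
# Vision API's label annotations. Names and defaults here are hypothetical.
Label = Tuple[str, float]

def top_labels(labels: List[Label], min_score: float = 0.7, limit: int = 3) -> List[str]:
    """Return up to `limit` distinct, lowercased label descriptions with
    score >= `min_score`, highest score first."""
    chosen: List[str] = []
    for description, score in sorted(labels, key=lambda pair: pair[1], reverse=True):
        # Labels are sorted by score, so the first one below the threshold
        # means no later label can qualify either.
        if len(chosen) >= limit or score < min_score:
            break
        word = description.lower()
        if word not in chosen:
            chosen.append(word)
    return chosen
```

For example, `top_labels([("Grass", 0.88), ("Dog", 0.97), ("Mammal", 0.65), ("DOG", 0.9)])` returns `["dog", "grass"]`: the second "dog" is dropped as a duplicate and "Mammal" falls below the threshold. Sorting first lets the loop stop at the first label under the threshold instead of scanning the whole list.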

How we built it

We used HTML5, CSS3, jQuery, and JavaScript to build a dynamic front end, and Node.js with Express.js for the server that handles requests. Python, the pandas library, and the Google Cloud Vision API handle the image analysis and caption generation.

Challenges we ran into

We had trouble making the page update dynamically without redirecting to a new page, and with connecting the components of the app: the server had to process front-end requests, call the Python module, and send the result back to the front end to be displayed.
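The round trip described above (front-end request, Express server, Python module, and back) can be kept simple by treating the Python module as a subprocess that reads one JSON request on stdin and writes one JSON reply on stdout. This is a minimal sketch under that assumption, not the team's actual protocol: the `image_path` field, the reply shape, and the names `handle_request` and `serve` are hypothetical, and the caption is a placeholder where the Vision API call and sentence generator would go. On the Node.js side, an Express route could start the script with `child_process.spawn`, write the request to the child's stdin, and return the parsed stdout as the HTTP response, while the page calls that route with jQuery's `$.ajax` so it updates without redirecting.

```python
import json
from typing import Any, Dict, TextIO

def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-serialisable reply for one request from the Node.js server."""
    image_path = request.get("image_path")  # hypothetical field name
    if not image_path:
        return {"ok": False, "error": "missing image_path"}
    # Placeholder: a real module would send the image to the Vision API here
    # and pass the detected labels to the sentence generator.
    return {"ok": True, "image_path": image_path, "caption": f"Caption for {image_path}"}

def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON request from stdin and write one JSON reply line to stdout."""
    reply: Dict[str, Any]
    try:
        request = json.load(stdin)
    except json.JSONDecodeError as exc:
        reply = {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    else:
        if isinstance(request, dict):
            reply = handle_request(request)
        else:
            reply = {"ok": False, "error": "expected a JSON object"}
    stdout.write(json.dumps(reply) + "\n")
    stdout.flush()

# When the Node.js server spawns this file as a script, the entry point would be:
#     if __name__ == "__main__":
#         import sys
#         serve(sys.stdin, sys.stdout)
```

Because `serve` takes the streams as parameters, it can be tested with `io.StringIO` instead of real pipes. Malformed input still produces a JSON object with `"ok": false`, so the server only ever has to parse one reply format.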

Learning Google Cloud's products and services was another challenge, since many of them are complex. It paid off, though: their APIs proved useful in our application and will remain a valuable resource for future projects.

Accomplishments that we're proud of

We are proud of building a dynamic front end and of finishing a minimum viable product by the end of the hackathon.

What we learned

We learned a lot about dynamic web design, UI/UX, and building a complete web app with both a front end and a back end.

Building the sentence generator also required learning some linguistics: we had to understand Chomsky's transformational grammar so that the module could generate coherent sentences. It was interesting to see another field cross over into software development, even in a task as seemingly simple as captioning a photo.
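In Chomsky's early transformational grammar, phrase-structure rules such as S → NP VP and VP → V NP build an underlying structure, and transformations rearrange it; the passive transformation, which turns NP1 V NP2 into NP2 be V-en by NP1, is the classic example. The sketch below illustrates that idea only and is not the team's module: the `Clause` type, the three-verb `VERB_FORMS` lexicon, and the function names are hypothetical, and in the app the subject and object would presumably come from detected labels.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class Clause:
    """Underlying structure of a simple transitive clause: S -> NP VP, VP -> V NP."""
    subject: str  # head noun of the subject NP
    verb: str     # base form of V
    obj: str      # head noun of the object NP

# Hypothetical mini-lexicon: base form -> (third-person singular present, past participle).
VERB_FORMS: Dict[str, Tuple[str, str]] = {
    "chase": ("chases", "chased"),
    "hold": ("holds", "held"),
    "ride": ("rides", "ridden"),
}

def noun_phrase(noun: str) -> str:
    """NP -> Det N, picking 'a' or 'an' from the noun's first letter (rough heuristic)."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"

def active(clause: Clause) -> str:
    """Spell out the underlying structure directly: NP1 V NP2."""
    present, _ = VERB_FORMS[clause.verb]
    return f"{noun_phrase(clause.subject)} {present} {noun_phrase(clause.obj)}"

def passive(clause: Clause) -> str:
    """Passive transformation: NP1 V NP2 -> NP2 is V-en by NP1."""
    _, participle = VERB_FORMS[clause.verb]
    return f"{noun_phrase(clause.obj)} is {participle} by {noun_phrase(clause.subject)}"

def sentence(text: str) -> str:
    """Capitalise the first letter and end with a full stop."""
    return text[0].upper() + text[1:] + "."
```

With `c = Clause("person", "ride", "horse")`, `sentence(active(c))` gives "A person rides a horse." and `sentence(passive(c))` gives "A horse is ridden by a person." Both come from the same underlying clause, which is the point of separating structure from transformation. The `a`/`an` choice is a crude spelling heuristic and gets words like "hour" or "unicorn" wrong.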

Finally, our team learned a lot about Google Cloud's products and services, especially the Cloud Vision API.

What's next for A Thousand Words

Next, we are looking to let users caption videos by uploading video files to the web app. We also need to consider how the application should handle more complex photos, such as ones with hundreds of faces that overlap or are out of focus. It would also be interesting to bring the app to other platforms such as iOS and Android and to let users share their captioned photos on their social media accounts. Finally, we would like to increase the variety of sentence structures the generator produces.
