A Thousand Words

Inspiration

They say "a picture is worth a thousand words," yet people often struggle to write a good caption for their photos. We wanted to build a web app that captions images automatically, so that users can quickly derive meaning from an image or caption a large batch of images at once.

What it does

Our web app passes uploaded images to the Google Cloud Vision API and automatically generates sentences that caption each image.

How we built it

We used HTML5, CSS3, jQuery, and JavaScript to build a dynamic front end, and Node.js with Express.js to build the server that handles requests. We then used Python, the pandas library, and the Google Cloud Vision API to handle the computation and caption generation for each image.
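As a rough illustration of the caption-generation step, the sketch below turns a list of image labels into a one-sentence caption. It assumes the labels arrive as (description, score) pairs, similar in shape to the label annotations the Vision API returns. The function names, the 0.7 confidence threshold, and the sentence template are hypothetical and are not taken from our actual module.

```python
from typing import List, Tuple

Label = Tuple[str, float]  # (description, confidence score between 0 and 1)


def top_labels(labels: List[Label], min_score: float = 0.7, limit: int = 3) -> List[str]:
    """Return up to `limit` label descriptions with score >= min_score, best first."""
    confident = [(desc, score) for desc, score in labels if score >= min_score]
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return [desc.lower() for desc, _ in confident[:limit]]


def join_words(words: List[str]) -> str:
    """Join words into an English list: 'a', 'a and b', or 'a, b, and c'."""
    if not words:
        return ""
    if len(words) == 1:
        return words[0]
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    return ", ".join(words[:-1]) + f", and {words[-1]}"


def caption_from_labels(labels: List[Label]) -> str:
    """Build a simple caption from the most confident labels (hypothetical template)."""
    words = top_labels(labels)
    if not words:
        # Fallback when no label is confident enough.
        return "A picture worth a thousand words."
    return f"A picture of {join_words(words)}."


# Hypothetical labels, standing in for a real API response.
sample = [("Dog", 0.97), ("Grass", 0.88), ("Frisbee", 0.81), ("Cloud", 0.42)]
print(caption_from_labels(sample))
```

Filtering by score before building the sentence keeps low-confidence labels, such as "Cloud" in the sample, out of the caption.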

Challenges we ran into

We had trouble making the webpage update dynamically without redirecting to a new page. Connecting each component of the app was also difficult: the server has to process front-end requests, call the Python module, and then send the result back to the front end to be displayed.
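One way the Python side of that hand-off could look is sketched below: a script that receives an image path as a command-line argument and prints a JSON result to standard output, which a server process could read and forward to the front end. This is only an assumption about the wiring. The names handle_request and make_caption and the JSON field names are hypothetical, and the caption function is a stub rather than a real API call.

```python
import json
import sys
from typing import List


def make_caption(image_path: str) -> str:
    """Stub for the real captioning step (hypothetical placeholder text)."""
    return f"A caption for {image_path}."


def handle_request(args: List[str]) -> str:
    """Turn command-line arguments into a JSON string the server can parse."""
    if len(args) != 1:
        # Report errors as JSON too, so the caller always parses one format.
        return json.dumps({"error": "expected exactly one image path"})
    image_path = args[0]
    return json.dumps({"image": image_path, "caption": make_caption(image_path)})


if __name__ == "__main__":
    # The server would run this script with the uploaded file's path
    # and read the single JSON line printed here.
    print(handle_request(sys.argv[1:]))
```

Returning a single JSON line keeps the contract between the server and the Python module small, which makes each side easier to test separately.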

Learning Google Cloud's products and services was another challenge, as many of them are quite complex. Although this was difficult at first, learning the APIs was rewarding: they proved useful in our application and will remain a valuable resource for future projects.

Accomplishments that we're proud of

We are proud of making a dynamic front-end for users as well as completing a minimum viable product by the end of the hackathon.

What we learned

We learned a lot about dynamic web design and UI/UX, as well as how to build a complete web app with both a front end and a back end.

Additionally, building the sentence generator required us to learn a bit of linguistics. We had to understand Chomsky's system of transformational grammar in order for the module to generate coherent sentences. It was interesting to see how other fields cross over into software development, even in something as simple as captioning a photo.
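To give a flavour of grammar-driven sentence generation, here is a minimal sketch that expands phrase-structure rules (S -> NP VP, and so on) into a sentence. It is a simplified context-free illustration, not a full transformational grammar and not our actual module. The rules and the vocabulary are hypothetical; in the app, the words would come from what was detected in the photo.

```python
import random
from typing import Dict, List, Optional

# Hypothetical phrase-structure rules: each symbol maps to its possible expansions.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Adj", "N"], ["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["happy"], ["quiet"]],
    "N": [["dog"], ["park"]],
    "V": [["watches"], ["visits"]],
}


def expand(symbol: str, rng: random.Random) -> List[str]:
    """Recursively expand a symbol into words; unknown symbols are literal words."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words: List[str] = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed: Optional[int] = None) -> str:
    """Generate one capitalized sentence ending in a period, starting from S."""
    rng = random.Random(seed)
    sentence = " ".join(expand("S", rng))
    return sentence[0].upper() + sentence[1:] + "."


print(generate_sentence(seed=1))
```

Passing a seed makes the output repeatable, which helps when comparing sentence structures while tuning the rules.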

Finally, our team had to learn a lot about Google Cloud's products and services, especially the Vision API.

What's next for A Thousand Words

Next, we are looking to let users submit video files to our web app and caption videos. It will also be important to consider how the application handles more complex photos, such as those with hundreds of faces that are often overlapping or out of focus. It would also be interesting to bring the app to other platforms, such as iOS or Android, and to let users share their photos on social media. Finally, we would like to expand the variety of sentence structures that the app generates.

Built With

Submitted to

RU Hacks 2021: Digital

Created by

Worked full-stack, helping design the front-end webpage that users see as well as back-end server that processes requests.

Jerry Hu
Worked on both front end and back end to ensure data sent from the client reaches the server and the results return to the client. Also helped with designing and styling the front end.

Kai Cheng Xu
Worked on the back end, focusing on Python development. Ensured that the Python modules call the Google Cloud ML Vision API properly and generate the right caption given the mood of the photo.

Dexter Ryan Floreza
Third Year Computer Engineering Student at Toronto Metropolitan University

Updates

Jerry Hu started this project — May 02, 2021 06:50 PM EDT
