Inspiration:
We wanted to make images more accessible and interactive. Automatic alt-text on social media and apps like Google Lens inspired us to build our own lightweight Image Caption Generator where anyone can upload an image and instantly get an AI-generated caption.
What it does:
The app lets users drag-and-drop or select an image, sends it to the backend, runs it through a pre-trained AI model, and returns a short caption describing the content. Captions can then be copied or shared.
How we built it:
Frontend: HTML/CSS/JavaScript with drag-and-drop upload and instant preview.
Backend: Python Flask API to receive images and return captions.
AI Model: TensorFlow/Keras InceptionV3 (or, optionally, the Google Vision API) for image classification; we build a short descriptive caption from the top prediction.
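The "caption from the top prediction" step could look roughly like the sketch below. It assumes the predictions arrive as (class_id, label, score) tuples, which is the shape Keras' decode_predictions returns for ImageNet models. The function name, the caption wording, and the min_score threshold are hypothetical and were chosen only for illustration; they are not taken from the project.

```python
from typing import List, Tuple

# One prediction in the (class_id, label, score) shape that Keras'
# decode_predictions returns for ImageNet models such as InceptionV3.
Prediction = Tuple[str, str, float]

VOWELS = set("aeiou")


def caption_from_predictions(preds: List[Prediction], min_score: float = 0.2) -> str:
    """Build a one-sentence caption from the highest-scoring label.

    min_score is an assumed cut-off: below it the caption is hedged.
    """
    if not preds:
        return "An image."
    # Pick the prediction with the highest score.
    _, label, score = max(preds, key=lambda p: p[2])
    # ImageNet labels use underscores, e.g. "Egyptian_cat".
    name = label.replace("_", " ")
    article = "an" if name[:1].lower() in VOWELS else "a"
    if score < min_score:
        return f"Possibly {article} {name}."
    return f"A photo of {article} {name}."
```

Since the model is a classifier rather than a captioning network, a template like this yields a label-based sentence; the hedged wording for low scores is one possible way to avoid overstating an uncertain guess.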
Challenges we ran into:
Installing large libraries like TensorFlow on a slow connection.
Ensuring the virtual environment used by Flask, pip, and VS Code matched.
Optimizing image preprocessing for real-time caption generation.
Accomplishments that we’re proud of:
Building a full working pipeline from drag-and-drop UI → Flask backend → AI model.
Achieving instant captions without training our own network.
Making the app light enough to run locally or on a small server.
What we learned:
Serving ML models through a web API.
Handling file uploads securely in Flask.
Implementing a clean, responsive drag-and-drop interface in vanilla JavaScript.
Dealing with environment and dependency issues in Python projects.
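As an illustration of the secure-upload point above, here is a minimal sketch of the kind of checks a backend might run before passing a file to the model. It uses only the standard library; the allowed extensions, the 5 MiB cap, and the helper names are assumptions for this example, not the project's actual settings.

```python
import os

# Assumed policy for this sketch: common web image formats, 5 MiB cap.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024

# Leading "magic" bytes that identify each accepted format.
MAGIC_PREFIXES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}


def sniff_image_type(data: bytes):
    """Return the detected format name, or None if the bytes match none."""
    for kind, prefixes in MAGIC_PREFIXES.items():
        if data.startswith(prefixes):
            return kind
    return None


def validate_upload(filename: str, data: bytes):
    """Return (ok, reason) for an uploaded file's name and raw bytes."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, "unsupported file extension"
    if len(data) > MAX_UPLOAD_BYTES:
        return False, "file too large"
    # Do not trust the extension alone: check the content as well.
    if sniff_image_type(data) is None:
        return False, "content is not a recognised image"
    return True, "ok"
```

In a Flask route the filename and bytes would typically come from request.files. Werkzeug's secure_filename is the usual way to sanitise the name if the file is saved to disk, and Flask's MAX_CONTENT_LENGTH setting can reject oversized requests before they reach a check like this.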
What’s next for Image Caption Generator:
Support for multi-sentence captions using an NLP decoder.
Multi-language caption generation.
Adding OCR so that text inside images is also extracted and described.
Deploying the app publicly with a simple share button for captions.
Built With
- Languages: Python, HTML, CSS, JavaScript
- Frameworks: Flask (backend), TensorFlow/Keras (AI model)
- Frontend libraries: vanilla JS for drag-and-drop upload & preview
- APIs / services: (optional) Google Cloud Vision API for alternative caption generation
- Platforms: runs locally or can be deployed on any cloud (Heroku, Amazon Web Services, GCP)
- Tools: virtualenv for isolated Python environment, VS Code