Image Description API

Inspiration

This project started with a personal reflection. I occasionally experience minor visual difficulties, which led me to think about how technology could support people with more significant visual impairments.

The result is a serverless, multilingual API that turns images into both text and audio descriptions. Built to improve accessibility, enhance content workflows, and enable new use cases across the web.

What it does

The Image Description API accepts a base64-encoded image and a target language. It returns:

A detailed description of the image in plain text
An optional MP3 audio version of the description, using natural-sounding voices

Use cases include:

Accessibility tools (e.g. alt-text generation, screen readers)
E-commerce automation (product descriptions)
Social media (caption generation in multiple languages)
Educational tools (describing visuals for learners)

How we built it

We used a fully serverless AWS architecture:

Amazon Bedrock (Nova Lite model): For generating image descriptions using a multimodal foundation model
AWS Lambda: Orchestrates each request, handles validation, model invocation, and text-to-speech
Amazon Polly: Generates the audio version of the description in a natural voice
Amazon API Gateway: Exposes a secure, scalable REST API
AWS SAM: Used for infrastructure as code and streamlined deployment

Everything runs serverlessly with fast cold-start times and low latency.

Challenges we ran into

Handling image inputs across API Gateway, Lambda, and Bedrock with minimal overhead
Prompting the Bedrock model to generate accurate, concise, and translatable descriptions
Balancing performance with multi-step operations (image → text → translation → speech)
Making the API responses simple yet flexible (JSON or audio, multiple formats)

Accomplishments that we're proud of

Built a fully functional, production-ready solution
Delivered multilingual accessibility out of the box
Created a tool that's flexible enough for real-world use cases across industries
Open-sourced the entire codebase for others to use, learn from, and extend

What we learned

How to integrate AWS Bedrock’s multimodal AI models (like Nova Lite) into real-time workflows
Effective prompt engineering for image understanding in multiple languages
Best practices for managing binary data (images/audio) in serverless APIs
How powerful and accessible AI becomes when paired with clean cloud architecture

What's next for Image Description API

Build a lightweight web app demo to make the API more accessible for non-developers
Allow batch image processing

Built With

amazon-web-services
apigateway
bedrock
lambda
nova
python

Updates

Matias Kreder started this project — Jun 24, 2025 11:16 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.