Inspiration
I spend a lot of time thinking of the perfect caption after pasting an image. Captions help people learn more about an image, and in some cases help visually impaired people "see" it. A good caption or description is really important.
Instead of drafting and redrafting the perfect caption by hand, use Auto Caption to generate captions with AI.
What it does
Reads the image and generates a short list of captions relevant to it.
How we built it
We built it with the AppsSDK and the OpenAI API (gpt-4o model).
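The core call is straightforward: send the image to gpt-4o and ask for captions. Here is a minimal sketch of what that request can look like; `buildCaptionRequest` and the prompt wording are illustrative, not the app's actual code.

```typescript
// Sketch only: shape of a gpt-4o request asking for captions for an image.
interface ChatRequest {
  model: string;
  messages: Array<{
    role: string;
    content: Array<
      | { type: "text"; text: string }
      | { type: "image_url"; image_url: { url: string } }
    >;
  }>;
}

// `buildCaptionRequest` is a hypothetical helper name.
function buildCaptionRequest(imageUrl: string): ChatRequest {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Suggest three short captions for this image." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Sending it (needs a real API key, so omitted here):
// await fetch("https://api.openai.com/v1/chat/completions", {
//   method: "POST",
//   headers: { "Authorization": `Bearer ${apiKey}`, "Content-Type": "application/json" },
//   body: JSON.stringify(buildCaptionRequest(url)),
// });
```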
Challenges we ran into
API security matters, and integrating frictionless authentication was a big challenge.
Some SVG images could not be read by the selection API, limiting the app to raster images. This was a bummer, as SVG is a popular format in the design world.
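Because of that limitation, the app has to reject SVGs up front. A trivial guard like the one below (hypothetical names; the exact set of supported formats is an assumption) keeps unsupported images out of the caption flow:

```typescript
// Raster formats we assume the selection API can read; SVG is excluded.
const RASTER_TYPES = new Set(["image/png", "image/jpeg", "image/webp", "image/gif"]);

// Hypothetical helper: should this image enter the caption flow at all?
function isCaptionable(mimeType: string): boolean {
  return RASTER_TYPES.has(mimeType.toLowerCase());
}
```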
OpenAI would not accept my structured JSON schema for the output, so I had to get clever with prompt engineering to ensure the model emits JSON in a specific schema the front-end can consume.
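The prompt-engineering fallback boils down to asking the model for a bare JSON array of strings and then parsing defensively, since models often wrap JSON in a markdown code fence. A sketch of such a parser (not the app's actual implementation):

```typescript
// Parse a model reply that should be a JSON array of caption strings,
// tolerating an optional surrounding markdown code fence.
function parseCaptions(raw: string): string[] {
  const stripped = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "") // drop an opening fence like ```json
    .replace(/```$/, "")              // drop a trailing fence
    .trim();
  const parsed: unknown = JSON.parse(stripped);
  if (!Array.isArray(parsed) || !parsed.every((c) => typeof c === "string")) {
    throw new Error("Model did not return a JSON array of strings");
  }
  return parsed;
}
```

Validating the shape before handing it to the front-end means a malformed reply fails loudly instead of rendering garbage.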
Accomplishments that we're proud of
Integrating OpenAI to read an image and output captions. Using their API for the first time was a learning experience, covering different models, input types, prompts, and parsing the output messages.
What we learned
AI integration is awesome; however, generation can sometimes take a long time (10+ seconds). Users may not be willing to wait that long, so the experience could be optimised by pre-generating captions before the CTA button is clicked.
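One way to sketch that pre-generation idea: kick off the request as soon as an image is selected, cache the in-flight promise, and have the button click simply await it. All names here are hypothetical; the actual app does not do this yet.

```typescript
// Cache of in-flight (or finished) caption requests, keyed by image id.
const captionCache = new Map<string, Promise<string[]>>();

// Start generating eagerly when an image is selected; dedupe repeat calls.
function onImageSelected(
  imageId: string,
  generate: (id: string) => Promise<string[]>,
): void {
  if (!captionCache.has(imageId)) {
    captionCache.set(imageId, generate(imageId));
  }
}

// The CTA click awaits the pre-warmed promise; if the user clicked before
// selection triggered pre-generation, fall back to generating now.
async function onCaptionButtonClick(
  imageId: string,
  generate: (id: string) => Promise<string[]>,
): Promise<string[]> {
  onImageSelected(imageId, generate);
  return captionCache.get(imageId)!;
}
```

If generation finishes before the click, the captions appear instantly; otherwise the user waits only for the remaining time.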
We need interactive progress bars while the user waits for the response to be generated!
What's next for Auto Caption
This is a useful feature for documents. Once a caption-positioning API is available, we can place the caption directly underneath the image, instead of in the centre of the document or relying on the user to drag and drop it into the correct location.
Built With
- firebase
- openai
- typescript
