Inspiration

The inspiration for this project is that I tried to move away from the big social networks and was exposed to the need to provide alt text or captions for social media. It was a time of mass migration to Feedverse platforms like Mastodon, where people were vocal about ensuring your content was accessible and had textual descriptions of your included images. Free services like micr0-dev's Altbot showed how AI could generate those captions, but you still had to edit your posts to add Altbot's comment with suggested image descriptions. I'm enjoying posting my iPhone photos to PixelFed but find it hard to manage the captions on the small screen, so I want a tool to actually edit my posts to add the captions with human review.

My tool is Vedfolnir.org, a tool to allow me (and others) to generate AI alt text/captions from posts with images missing the accessibility text, review the AI-generated text, and update the post with the human-reviewed captions. The project is named for Vedfolnir, also spelled Veðrfölnir, which is a hawk in Norse mythology that sits between the eyes of an unnamed eagle atop the world tree, Yggdrasi.

The Vedfolnir.org tool was born during the Code with Kiro Hackathon. Kiro is an AI Spec Development tool wrapped in a custom VS Code clone. This hackathon entry is a significantly modified new version of that project; it currently lives at https://q.zero.vedfolnir.org. The project is an open-source web application for the Fedverse that automatically generates descriptive alt text for images using artificial intelligence. The goal is to help people like me make our photos more accessible for visually impaired users across the Fediverse.

What it does

The web app (https://q.zero.vedfolnir.org) allows a user to authenticate with an ActivityPub or PixelFed server, scan your posts for images without alt text/captions, pass those images to Bedrock's Nova Lite model for caption generation, review the captions and edit as desired, then update the original post's metadata for the image with the human-reviewed alt text.

Disclosure is important to me, so all AI-generated captions are automatically appended with the string " (AI-generated)" to ensure viewers and screen reader users are immediately aware of the origin of the alt text. The captions are fully editable, so the human reviewer can remove that text if the caption is edited to a point that the disclosure is no longer appropriate.

How we built it

For the AWS AI Agent Global Hackathon, I shifted Vedfolnir’s system's core AI engine from a Flask/Ollama backend using the LLaVA model to a fully serverless, cloud-native application utilizing Python-based AWS Lambda and Amazon Bedrock's Nova Lite model for image analysis. This substantial change included migrating the frontend from HTML/Bootstrap to a modern React/Vite single-page application and replacing the complex MySQL/Redis database setup with simple, scalable AWS infrastructure like S3, DynamoDB, and CloudWatch.

I leveraged Amazon Q Developer CLI, after writing AI-assisted code for new versions in Z.ai in Claude CLI, Amazon Q Developer CLI, and Amazon Kiro. The final version was a clean start with liberal use of the following text appended to my prompts: follow MVP (Minimum Viable Product) principles and keep it simple. I finalized the project on the Amazon Q-generated code once I realized that functionality could be pushed into the user's browser rather than being processed via lambdas.

Vedfolnir (https://q.zero.vedfolnir.org) was reconstructed as a modern, privacy-first, cloud-native application, utilizing a React and Vite frontend for an intuitive single-page application experience, which is globally deployed on AWS Amplify with CloudFront CDN. The core image processing is handled by a Python-based AWS Lambda function that integrates with Amazon Bedrock's Nova Lite model for AI-Powered Analysis to generate descriptive alt text. This serverless backend is secured with API key authentication and rate limiting, while the broader infrastructure is managed by Route 53, S3, DynamoDB, and CloudWatch for DNS, static hosting, rate limiting, and real-time monitoring, respectively, ensuring a scalable, cost-effective, and transparent open-source solution for accessibility across the Fediverse.

In this version of Vedfolnir, I designed it specifically to meet the AWS-defined AI agent qualification by fulfilling all three required conditions for autonomous and intelligent task execution:

  1. It uses reasoning LLMs for decision-making: The core of our agent is the integration with Amazon Bedrock's Nova Lite model, which goes beyond simple image tagging. This model performs the visual reasoning and Intelligent Analysis necessary to generate a descriptive alt text string that captures the context, objects, and scene—acting as the agent's key decision-making component for generating the output.

  2. It demonstrates autonomous capabilities with or without human inputs for task execution: Vedfolnir can operate autonomously by automatically performing Post Scanning, AI Generation, and Instant Publishing back to the Fediverse platform. However, it is also a human-in-the-loop agent, featuring a crucial Human Review workflow that allows users to edit and explicitly approve the AI-generated alt text before the final update, satisfying the "with human inputs" requirement.

  3. It integrates APIs, databases, external tools, or other agents: The agent successfully coordinates multiple systems to complete its task. It integrates with ActivityPub/PixelFed APIs to securely scan for and update posts (external tools). It also leverages the AWS Lambda function as a coordinator and uses DynamoDB to maintain rate limiting for application security, acting as a secure and reliable data component of the larger agent ecosystem.

Challenges we ran into

During the first few weeks of the project, I worked on more complex versions following some MVP (Minimum Viable Product) practices. That version had lambdas performing all of the app functionality, such as storing credentials in AWS, or loading the ActivityPub or PixelFed posts. That system started to get complex, with too many components, leading to difficulties managing the deployment scripts and maintaining CORS functionality.

In the last week of the hackathon, I decided to update my requirements to better follow MVP (Minimum Viable Product) practices. During that process, I realized most of the functionality could be moved to the browser. That helped my development plan as it was simple (and saved costs as well). The final app depends on one lambda with three python scripts, one to handle converting the images to a standard size and format, one to call bedrock and provide some bedrock logging, and a third rate limiter function for app security.

Accomplishments that we're proud of

The app, by nature, is GDPR compliant since the browser handles the access token calls to the Feedverse servers and contains all private data.

This new version of Vedfolnir is estimated to be incredibly affordable to host. Personal use is around $2.22/month thanks to the AWS free tiers, and high volume usage costs about two tenths of a cent per image.

Using Amazon Q Developer CLI for AI coding allowed me to create a secure, fast web app that exceeds my coding skills but leverages my technical knowledge.

What we learned

Overall, I was very happy to learn how affordable a small web app can be operated on AWS serverless infrastructure. It was fairly easy to create good infrastructure with the help of AI coding tools and the various AWS MCP servers. I've used legacy AWS servers like S3, EC2, and SQS, but this project really exposed me to how beneficial lambdas are and AWS Amplify is for serverless websites.

When building AI prompts, liberally add instructions to follow MVP (Minimum Viable Product) principles and keep things simple.

What's next for Serverless Vedfolnir

Once judging is complete for the previous hackathon, I'll move this entry's domain from https://q.zero.vedfolnir.org to the base vedfolnir.org domain and seek users to generate feedback on the product.

I'd like to offer the product online for others to use at no cost and think I can do that given how affordable it is to operate this serverless setup. The Bedrock Nova Lite model is a huge help in that goal since its costs are very low.

Built With

Share this project:

Updates