Inspiration
I'm a developer and I love hacking on new ideas. I am not a video editor, at all. In my first hackathon on this site, I discovered that making the product video for my submission can be harder than the project itself: recording a video, making the voiceover with Amazon Polly and syncing it manually using some random online tool, only to end up with a pretty mediocre video. So I thought, why not automate it so I can get a mediocre video with less work :)
What it does
Kirbuk is an agent that automatically generates a SaaS product video, including narration. The user points the agent at their site and optionally adds some instructions. The agent then uses various AI tools, such as browser use and the Claude LLM, to explore and understand the site, write a general script for the product video, and turn that into a voice script for the narration and a video script. Finally, it joins the two into the final video.
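The final joining step could look roughly like the sketch below. It assumes ffmpeg does the muxing, which this write-up does not state; the function name `build_mux_command` and the file names are hypothetical.

```python
from pathlib import Path


def build_mux_command(video: Path, narration: Path, output: Path) -> list[str]:
    """Return an ffmpeg argument list that lays the narration over the recording.

    Assumption: ffmpeg is installed on the machine that runs the command.
    """
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it already exists
        "-i", str(video),      # input 0: the screen recording
        "-i", str(narration),  # input 1: the narration audio
        "-map", "0:v:0",       # use the video stream of the recording
        "-map", "1:a:0",       # use the audio stream of the narration
        "-c:v", "libx264",     # re-encode the video as H.264
        "-c:a", "aac",         # re-encode the audio as AAC
        "-shortest",           # stop when the shorter of the two streams ends
        str(output),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a shell string avoids quoting problems with odd file names.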
How we built it
When I started building, I had a lot of research to do to figure out whether this was even possible:
- Can I get Bedrock Agentcore to generate video? (In the end I had to use Playwright, as I could only find screenshots on the runtime.)
- Can I get a natural-sounding script with only an LLM and Polly, with no human intervention?
- Can I sync everything together into a video on the agent runtime?
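On the narration question, one thing that helps pacing is inserting explicit pauses with SSML, which Polly accepts. Below is a minimal sketch, assuming the voice script arrives as (text, pause) pairs. That data shape and the `build_ssml` helper are hypothetical and not taken from the Kirbuk code.

```python
from xml.sax.saxutils import escape


def build_ssml(segments: list[tuple[str, int]]) -> str:
    """Build an SSML document from (text, pause_ms) pairs.

    pause_ms is the silence to insert after the text; 0 means no pause.
    """
    parts = ["<speak>"]
    for text, pause_ms in segments:
        # Escape &, < and > so product names cannot break the markup.
        parts.append(escape(text))
        if pause_ms > 0:
            parts.append(f'<break time="{pause_ms}ms"/>')
    parts.append("</speak>")
    return "".join(parts)
```

The resulting string can be sent to Polly through boto3's `synthesize_speech` with `TextType="ssml"`. The pause lengths would have to come from the video script timing.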
After the research phase, I used the Agentcore starter kit and its tooling to deploy an agent, then added the features step by step, stopping to make sure each one worked on its own and to fix any IAM permission errors that popped up :) I spent the rest of the time going to Product Hunt, picking random products, testing on them and improving the system to create better and better videos. This is where time and my AWS credits ran out.
Simple architecture diagram: https://github.com/Sveder/kirbuk/blob/main/simple_architecture.md
Full architecture walkthrough: https://github.com/Sveder/kirbuk/blob/main/aws-architecture.md
Challenges we ran into
So many challenges:
- Working with newly released tools and features, some still in early access, was a challenge. For example, the AWS Bedrock Agent browser tool's live recording/replay did not work for me and only showed a white screen (this seems fixed now?). What gets saved in S3 is some kind of screenshot-based format rather than a video like the ones Playwright can create (even though I'm pretty sure it uses Playwright underneath).
- Getting an LLM to generate Playwright scripts is hard and doesn't always work. The videos produced by Playwright are pretty high quality though.
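One cheap guard against bad generated scripts is a static check before running them. The sketch below assumes the generated Playwright script is Python, which this write-up does not specify. The allow-list and the `check_generated_script` helper are hypothetical.

```python
import ast

# Hypothetical allow-list: top-level modules a generated script may import.
ALLOWED_IMPORTS = {"playwright", "time"}


def check_generated_script(source: str) -> list[str]:
    """Return the problems found in the script; an empty list means it passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            # Compare only the top-level package, e.g. "playwright.sync_api".
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"disallowed import: {module}")
    return problems
```

A non-empty result could be fed back to the LLM as a retry prompt. This only catches syntax errors and unexpected imports; a script can still pass and then fail at runtime, for example on a selector that does not exist on the page.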
Accomplishments that we're proud of
Due to holidays I only had a week (-1 outage day +2 extension days :)) to work on this hackathon, so I'm proud I got to something that somewhat works and creates videos and narration with zero prior knowledge of the web site. The tools I used are very new to me and generally newly released, so they required at least some hacking and fighting the platform (see the challenges and platform bugs mentioned above). I'm very proud that I managed to navigate all of that and create something this technically challenging. Special thanks to Michał from AWS for his help during office hours and after!
What we learned
I learned a ton about the AWS Bedrock, Bedrock Agentcore and browser use platforms: how to configure, deploy, monitor and troubleshoot them. I also learned a lot about video and voice creation.
What's next for Kirbuk - Automatic Product Video Agent
While the current results are not amazing, they should improve significantly as tools like NovaAct/Agentcore browser mature. I can also extend the system prompts and try different models to see which works best for generating each part of the system (video script, Playwright script, voice script). Other features I'm excited to add:
- Allow users to ask for changes and additions after the video is done.
- Add video effects, transitions and other professional video "moves" to make the videos stand out more.
- Improve the script to include humour, roast, sales pitch, call to action, etc.
- Allow automatic upload to YouTube/Vimeo, with the description and other fields filled out according to best practices (for example a call to action in the description, video cards with links to the site, etc).
Built With
- agentcore
- bedrock
- claude
- polly