Inspiration

I was inspired by some of the non-linear TV initiatives Netflix has been creating lately, as well as by some of the recent AI-powered, never-ending text dungeon applications.

What it does

Story Weaver lets creators produce short 60-second video episodes by leveraging a team of agents. You provide the audience, genre, and POV, and the agent team handles the story progression, images, and audio. Story Weaver is built around crowd-sourcing the story arc: creators upload the content to their favorite platform (TikTok onboarding is currently in progress) and let their viewers select from AI-generated next options. The summary of the story so far and the chosen option are then fed back to the agent team to create the next chapter in a never-ending quest.

How we built it

We used Google's ADK to handle the agent orchestration, with several sub-agents for writing the story, summarizing the story so far, generating next options, and generating image prompts. We made sure the agents stored artifacts (and checked for existing artifacts under the story ID) in Firebase along the way, so a failed in-flight generation could be restarted from where it failed. Once the image prompts, text chunks, next options, and summary are generated by our agent team, we use Gemini's TTS (falling back to gTTS if throttled) to generate the audio and Gemini image generation to create the images. The final video is composited together using MoviePy, and all artifacts and metadata are stored in Firebase.
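
For reference, here is a minimal sketch of how an ADK agent tree along these lines can be wired up. The agent names, model IDs, and instructions are illustrative, not the exact ones Story Weaver uses:

```python
from google.adk.agents import LlmAgent

# Illustrative sub-agents; the real instructions are far more detailed.
story_writer = LlmAgent(
    name="story_writer",
    model="gemini-2.0-flash",
    instruction="Write the next chapter of the story (about 60 seconds of narration).",
    output_key="chapter_text",
)
summarizer = LlmAgent(
    name="summarizer",
    model="gemini-2.0-flash",
    instruction="Summarize the story so far, including the newest chapter.",
    output_key="story_summary",
)
options_writer = LlmAgent(
    name="options_writer",
    model="gemini-2.0-flash",
    instruction="Propose three possible next directions for the audience to vote on.",
    output_key="next_options",
)
image_prompter = LlmAgent(
    name="image_prompter",
    model="gemini-2.0-flash",
    instruction="Write image-generation prompts for the key scenes in the chapter.",
    output_key="image_prompts",
)

# Coordinator that delegates to the sub-agents (the pattern discussed in the challenges below).
coordinator = LlmAgent(
    name="story_coordinator",
    model="gemini-2.0-flash",
    instruction=(
        "Coordinate the team: write the chapter, summarize the story, generate next "
        "options, then produce image prompts. Delegate each step to the matching sub-agent."
    ),
    sub_agents=[story_writer, summarizer, options_writer, image_prompter],
)
```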

To avoid being overwhelmed by user requests, Story Weaver has a queue system for incoming generation requests. A background thread reads from the queue (persisted in Firestore) and actually does the generation job, as sketched below. A user may only have one queued task at a time, and the Story Weaver backend only works on one generation at a time; this allowed me to remain on the free tier while still offering reasonable availability.
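
A simplified sketch of that pattern, assuming firebase_admin and a `generation_queue` collection; the collection, field, and function names are illustrative (the enqueue path, which rejects a second pending job from the same user, is not shown):

```python
import threading
import time

import firebase_admin
from firebase_admin import firestore

firebase_admin.initialize_app()  # uses application-default credentials
db = firestore.client()


def run_generation(payload: dict) -> None:
    """Placeholder for the real agent pipeline (story, images, audio, video)."""
    ...


def worker() -> None:
    """Background loop: claim the oldest pending job and run it, one at a time."""
    while True:
        pending = (
            db.collection("generation_queue")
            .where("status", "==", "pending")
            .order_by("created_at")
            .limit(1)
            .get()
        )
        if not pending:
            time.sleep(5)  # nothing to do; poll again shortly
            continue
        job = pending[0]
        job.reference.update({"status": "running"})
        try:
            run_generation(job.to_dict())
            job.reference.update({"status": "done"})
        except Exception as exc:  # persist the failure so the job can be retried
            job.reference.update({"status": "failed", "error": str(exc)})


# Started alongside the HTTP server, so one process handles both requests and generation.
threading.Thread(target=worker, daemon=True).start()
```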

For the front end I used Next.js to create a simple web-based app. I also integrated with TikTok using their Login Kit, with callbacks to my website to perform authentication, and enabled direct posting from the website using their Content Posting API.
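
As a rough illustration of the server-side half of that login flow, the callback exchanges the authorization code for an access token. The endpoint and parameter names below are my reading of TikTok's v2 Login Kit documentation, so treat them as assumptions rather than verified production code:

```python
import os

import requests


def exchange_tiktok_code(code: str, redirect_uri: str) -> dict:
    """Exchange the OAuth authorization code received at the callback URL for tokens."""
    resp = requests.post(
        "https://open.tiktokapis.com/v2/oauth/token/",  # assumed v2 token endpoint
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        data={
            "client_key": os.environ["TIKTOK_CLIENT_KEY"],
            "client_secret": os.environ["TIKTOK_CLIENT_SECRET"],
            "code": code,
            "grant_type": "authorization_code",
            "redirect_uri": redirect_uri,  # must be a live, non-localhost URL registered with TikTok
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # expected to contain access_token, refresh_token, open_id, etc.
```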

Challenges we ran into

My first challenge was exhausting my resources during generation: I would often get model-overloaded messages for the text generation and 429s for the TTS. I overcame this by 1) implementing my queue system to stop concurrent requests and stay under the free-tier RPM threshold, 2) adding a fallback from the premium TTS to gTTS when I get 429 errors, and 3) aggressively storing state and artifacts during the generation process so the agent team can pick up where it left off after service errors or agent process issues.
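
The TTS fallback looks roughly like this; the Gemini model name, voice, and audio-format handling are assumptions based on the google-genai docs rather than the exact production code:

```python
import wave

from google import genai
from google.genai import errors, types
from gtts import gTTS

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment


def synthesize(text: str, stem: str) -> str:
    """Premium Gemini TTS first; fall back to gTTS when the free-tier quota (HTTP 429) is hit."""
    try:
        resp = client.models.generate_content(
            model="gemini-2.5-flash-preview-tts",  # assumed TTS-capable model name
            contents=text,
            config=types.GenerateContentConfig(
                response_modalities=["AUDIO"],
                speech_config=types.SpeechConfig(
                    voice_config=types.VoiceConfig(
                        prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                    )
                ),
            ),
        )
        pcm = resp.candidates[0].content.parts[0].inline_data.data  # raw 24 kHz 16-bit mono PCM
        path = f"{stem}.wav"
        with wave.open(path, "wb") as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(24000)
            wav_file.writeframes(pcm)
        return path
    except errors.APIError as exc:
        if exc.code != 429:  # only swallow rate-limit errors
            raise
        path = f"{stem}.mp3"
        gTTS(text).save(path)  # free fallback; the video step accepts MP3 as well
        return path
```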

My second challenge was getting the main coordinator agent to follow instructions. I managed to get it to work most of the time, but occasionally it skips some steps. In the future I will probably migrate to code for explicit, well-defined processes instead of hoping the coordinator gets it right every time.
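
In practice that would mean expressing the fixed sequence as an ADK SequentialAgent rather than relying on the coordinator's prompt; a rough sketch, with the agent definitions abbreviated and illustrative:

```python
from google.adk.agents import LlmAgent, SequentialAgent

write = LlmAgent(name="write_chapter", model="gemini-2.0-flash",
                 instruction="Write the next chapter.", output_key="chapter_text")
summarize = LlmAgent(name="summarize_story", model="gemini-2.0-flash",
                     instruction="Summarize the story so far.", output_key="story_summary")
propose = LlmAgent(name="propose_options", model="gemini-2.0-flash",
                   instruction="Propose the next options for voting.", output_key="next_options")

# The step order is guaranteed by code, not by how well the coordinator follows its prompt.
chapter_pipeline = SequentialAgent(
    name="chapter_pipeline",
    sub_agents=[write, summarize, propose],
)
```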

My last challenge was integrating with TikTok. They are extremely test-unfriendly and require live, non-localhost URLs even in sandbox mode; it took me some time to get the login flow working, since I had to re-deploy to prod every time I tested the interaction.

Accomplishments that we're proud of

I am very proud of the queue system I set up; in the future this could be an easily premium-isable feature (allowing users to queue up more than one job, or creating a premium queue for paying users). It was an interesting setup to get the background queue processing working in the same application that handles the HTTP requests.

I am also very proud of how robust the generation pipeline is to resource exhaustion and midpoint failures. My use of backup options when AI resources are depleted, together with aggressive state storage, means I can keep the backend as cheap and efficient as possible, serving many users at scale without a massive cost increase.
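
The state-storage pattern boils down to checkpointing each step's artifact under the story ID and skipping steps whose artifacts already exist. A minimal sketch, with the document layout and names being illustrative:

```python
from typing import Any, Callable

import firebase_admin
from firebase_admin import firestore

firebase_admin.initialize_app()
db = firestore.client()


def step_with_checkpoint(story_id: str, step: str, produce: Callable[[], Any]) -> Any:
    """Run `produce` only if this step's artifact is not already stored for the story."""
    doc = (
        db.collection("stories")
        .document(story_id)
        .collection("artifacts")
        .document(step)
    )
    snapshot = doc.get()
    if snapshot.exists:
        return snapshot.to_dict()["value"]  # resume: reuse the artifact from the failed run
    value = produce()
    doc.set({"value": value})  # checkpoint before moving to the next step
    return value


# Example: the summary is only regenerated if it was never stored.
# summary = step_with_checkpoint(story_id, "summary", lambda: summarize(chapter_text))
```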

What we learned

I learned that strict, well-defined processes are probably best left to code rather than to a well-prompted AI agent; AI agents should be deployed for the more creative, open-ended tasks which code cannot address.

I also learned how difficult meta-prompting is without telemetry; I was very often flying blind, making changes and hoping my prompt performed better. In such a non-deterministic system, I think metrics and A/B testing are a must-have. In a next iteration I would use an AI prompt eval system to change and evaluate my prompts more scientifically.

What's next for Story Weaver

  1. Full integration with TikTok (I am currently fighting through the approval process for this).
  2. Expansion to more platforms (e.g. X, Instagram Reels, YouTube Shorts, ...).
  3. Better vote tallying and actions: currently I rely on the creator to see which comment gets the most votes and manually kick off the next story. A central voting link on Story Weaver (or Solana Blinks) could automate the next generation (e.g. take the most-voted option after a set day, or once a threshold is reached), removing the creator's manual process completely.
  4. Enforced character names and styles: currently the image prompting and image creation do not stick to a perfectly consistent style. I will add an initial agent phase where the team creates character names, personas, and appearances before actually writing the story, to enforce more consistency.
  5. AI prompt telemetry and removal of the overarching coordinator agent: I want to pass my AI interface through a proper A/B testing layer like Langfuse or TensorZero to better manage my prompts. I would also like to specify the generation flow in code instead of via meta-prompts to the coordinator agent.
