Inspiration

Storyboarding upcoming events is a proven strategy for helping children on the autism spectrum cope with new situations. While everyday events like a trip to the store, the movies, the park or a haircut can seem ordinary, for some children they are unfamiliar and potentially scary, and they benefit from visualization and discussion. Children get a sense of what to expect. Parents and children can anticipate challenges and plan coping strategies, and storyboards can also serve as progress meters for things like daily chores or events. However, while storyboards are useful, they are time-consuming to create. Handcrafting the visuals, the materials and even the storyboard itself can consume parents who are already stretched thin.

What it does

StepPrep lets you have a simple conversation about an upcoming event, identify the key milestones and challenges, and iterate interactively to create a storyboard within minutes.

How we built it

Built using Google resources, it runs on Google Cloud Run, calls a series of Gemini models, and makes use of the live voice API as well as the 'nano banana' image generation features.

It was built using agentic coding techniques facilitated by VS Code and the Cline/Kilo Code extensions, using the Gemini Pro/Flash models running in Vertex AI.

Storyboards created by StepPrep use Firestore and Google Cloud Storage to create a permanent home that can be bookmarked and viewed later for the next haircut, store trip, etc.

Challenges we ran into

The dreaded HTTP 429 is real: resource exhaustion. Access purely through Vertex AI is often limited by quotas, which are hard to increase without a Google Sales connection. Access via an API key through aistudio.google.com seems to unlock higher quota limits, though the APIs differ in subtle ways.
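One common way to soften 429 responses is to retry with exponential backoff and jitter. The sketch below illustrates that general technique and is not StepPrep's actual code: `RateLimitError`, `call_with_backoff` and all parameter names are hypothetical, and `RateLimitError` stands in for whatever exception your client library raises on a 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request() and retry on RateLimitError with exponential backoff.

    `sleep` is injectable so the wait can be replaced in tests.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # The cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] to spread out retries.
            sleep(random.uniform(0, cap))
```

Backoff only smooths over short bursts. It does not raise the underlying quota, so reducing the number of requests still matters.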

Original plans to generate an image for each storyboard panel had to be compressed into generating all panels at once to avoid resource limits. Original plans to include optional Veo video animation of the storyboard proved infeasible due to content limitations and quota limits. Gemini usually performs well with function/tool calling, but adding hints and real-time syntax checking for its work can help it retry and recover when it makes mistakes.
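The idea of checking a tool call and feeding the errors back as hints can be sketched roughly as follows. Everything here is an assumption made for illustration: the `PANEL_SCHEMA` fields, the function names, and `ask_model`, a hypothetical callable that takes a prompt and returns the tool arguments as a raw JSON string. It is not the Gemini API or StepPrep's implementation.

```python
import json

# Hypothetical tool schema: argument name -> required Python type.
PANEL_SCHEMA = {"panel_index": int, "caption": str, "image_prompt": str}


def validate_tool_call(raw_args, schema=PANEL_SCHEMA):
    """Return (args, errors); errors is a list of human-readable hints."""
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        return None, [f"Arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return None, ["Arguments must be a JSON object."]
    errors = []
    for name, expected in schema.items():
        if name not in args:
            errors.append(f"Missing required argument '{name}'.")
        elif type(args[name]) is not expected:
            errors.append(f"Argument '{name}' must be of type {expected.__name__}.")
    for name in args:
        if name not in schema:
            errors.append(f"Unknown argument '{name}'.")
    return args, errors


def call_tool_with_retry(ask_model, prompt, max_attempts=3):
    """Ask the model for tool arguments, re-prompting with hints on failure."""
    hints = []
    for _ in range(max_attempts):
        full_prompt = prompt
        if hints:
            # Feed the validation errors back so the model can correct itself.
            full_prompt += "\nFix these problems:\n" + "\n".join(hints)
        args, hints = validate_tool_call(ask_model(full_prompt))
        if not hints:
            return args
    raise ValueError("Tool call still invalid after retries: " + "; ".join(hints))
```

Keeping the hints specific (which argument, which type) gives the model something concrete to fix on the next attempt.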

Accomplishments that we're proud of

Establishing real-time audio interaction with Gemini was surprisingly easy thanks to the examples and the native Web Audio features built into modern browsers. It's very satisfying to have a conversation and see the UI change as the conversation unfolds without having to type!

What we learned

Multimedia interactivity with AI is a game changer for workflows involving rapid iteration. Giving an AI model the ability to generate content that's visible and iterate via voice is especially productive. Google Cloud offers a great platform for hosting, with serverless, real-time container hosting in Cloud Run, databases in Firestore and cloud storage in GCS buckets. Terraform can still be challenging for new features like GitHub-triggered Cloud Build runs or Cloud Run domain mappings, but it is still better than deploying by hand.

What's next for StepPrep

We are excited to show it to teachers to see if it can help them prepare students who may benefit from a preview of upcoming activities that you or I may take for granted!
