Inspiration
As creative writers, we’ve always felt that stories, whether they’re fictional pieces, poems, or everyday journal entries, can carry an incredible amount of meaning and emotion. But sometimes, just reading them doesn’t capture the full picture we visualize.
We started wondering: What if a story could not only be read, but actually be seen, heard, and experienced just like a short film? This is why we created Sonetica, a tool that brings writing to life through short, expressive videos. Sonetica helps turn your words into something visual, personal, and alive.
Our goal is to make storytelling more immersive, even for everyday moments.
What it does
Sonetica turns your words into short, expressive videos. Whether it’s a story you’ve written, a poem, or a personal journal entry, you just type it in. Optionally, you can add a song that captures the mood: maybe a track that inspires you, or just something that feels right.
From there, Sonetica reads between the lines, picking up on the emotion, mood, and general theme, and uses that to generate an image with Stable Diffusion. That image becomes the heart of a short video that the pipeline carefully assembles to bring your story to life.
How we built it
Here’s how Sonetica’s pipeline works: the user enters their text (short story, journal entry, poem, etc.) and optionally uploads a song. The audio is analyzed with Librosa to extract features that capture its mood and rhythm.
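To give a feel for this step, here is a minimal sketch of the kind of Librosa analysis we run on the uploaded song. The specific features and the mood heuristic shown are simplified assumptions, not our full implementation.

```python
import librosa
import numpy as np

def analyze_song(path: str) -> dict:
    # Load the track (mono, native sample rate)
    y, sr = librosa.load(path, mono=True)

    # Rhythm: estimated tempo in beats per minute
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Energy and brightness as rough mood proxies
    energy = float(np.mean(librosa.feature.rms(y=y)))
    brightness = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))

    # Very simplified mood label derived from tempo and energy (illustrative thresholds)
    if tempo > 120 and energy > 0.1:
        mood = "energetic"
    elif tempo < 80 and energy < 0.05:
        mood = "calm"
    else:
        mood = "neutral"

    return {"tempo_bpm": float(tempo), "energy": energy, "brightness": brightness, "mood": mood}
```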
Both the text and audio data are processed and combined to create a detailed prompt using Gemini. This prompt is then fed into Stable Diffusion to generate an image that reflects the emotion, meaning, and themes of the input.
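As a hedged sketch of this stage, the snippet below fuses the story and the audio features into one image prompt and renders it with Stable Diffusion via diffusers. The model names, checkpoint, and prompt wording are illustrative assumptions rather than our exact configuration.

```python
import google.generativeai as genai
import torch
from diffusers import StableDiffusionPipeline

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key

def build_image_prompt(story_text: str, audio_features: dict) -> str:
    # Ask Gemini to fuse the story and the song's mood into one visual prompt
    model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
    instruction = (
        "Write a single vivid image-generation prompt that captures the emotion, "
        f"mood, and themes of this text: {story_text}\n"
        f"Blend in this audio mood: {audio_features}"
    )
    return model.generate_content(instruction).text

def generate_image(prompt: str):
    # Checkpoint choice is illustrative; any Stable Diffusion weights would work here
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt).images[0]  # a PIL image
```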
Next, the generated image is passed to Veo, which creates an immersive short video. To enhance the audio experience, we use Lyria to produce a unique soundtrack that complements the visuals and the user’s chosen song.
The frontend is built with React for a smooth user experience, while the backend is powered by Python to handle the processing and coordinate the pipeline.
This setup allows us to seamlessly transform written stories and music into vivid, expressive videos.
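The writeup above only says the backend is Python; as one possible shape, here is a minimal sketch of the coordinating endpoint, assuming FastAPI (our choice of framework is not stated here) and with the Veo/Lyria stage stubbed behind a hypothetical helper.

```python
import tempfile
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()

@app.post("/generate")
async def generate(text: str = Form(...), song: UploadFile | None = File(None)):
    audio_features = {}
    if song is not None:
        # Persist the upload so Librosa can read it from disk
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
            tmp.write(await song.read())
        audio_features = analyze_song(tmp.name)           # sketch shown earlier
    prompt = build_image_prompt(text, audio_features)     # sketch shown earlier
    image = generate_image(prompt)                        # sketch shown earlier
    video_url = render_video(image, audio_features)       # Veo + Lyria stage, hypothetical helper
    return {"video_url": video_url}
```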
Challenges we ran into
Implementing Stable Diffusion was one of the toughest parts since none of us had worked with it before. Figuring out the optimal level of complexity for generating images took a lot of experimentation — we wanted to maximize performance without losing the essence and emotion of the user’s input.
Integrating audio analysis with text processing was tricky because the audio features we extracted using Librosa sometimes conflicted with the tone suggested by the text. For example, a calm poem paired with an upbeat song created mixed signals that made prompt generation inconsistent. We had to develop heuristics and weighting strategies to balance these inputs so the final prompt truly reflected both the music’s mood and the story’s meaning.
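The weighting we settled on evolved through trial and error; a simplified sketch of the idea, with made-up weights and mood labels, looks like this:

```python
def blend_moods(text_mood: str, audio_mood: str, text_weight: float = 0.6) -> str:
    """Resolve conflicting mood signals by favoring the story's tone.

    The weight and labels here are illustrative; the real heuristic considers
    more signals (tempo, energy, sentiment scores, theme keywords).
    """
    if text_mood == audio_mood:
        return text_mood
    # When the two disagree, lean toward the dominant source but keep the tension,
    # so the Gemini prompt can describe e.g. "a calm scene with restless energy".
    primary, secondary = (text_mood, audio_mood) if text_weight >= 0.5 else (audio_mood, text_mood)
    return f"{primary} with undertones of {secondary}"
```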
On the integration side, a big challenge was learning how each API worked and the specific data formats it required. Gemini, Lyria, and Veo all expect different inputs and outputs, and making the data flow smoothly between them took careful planning and experimentation before everything worked seamlessly.
We also had to build fallback mechanisms in case any part of the pipeline failed or caused delays, so the system could still produce a video without crashing or freezing. This made the whole process more reliable and user-friendly despite the complexity behind the scenes.
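As an illustration of the approach (not our exact code), each stage can be wrapped with a retry and a safe default so one failing API call doesn’t take down the whole run:

```python
import logging
import time

def with_fallback(step, fallback, retries: int = 2, delay: float = 2.0):
    """Run a pipeline stage, retrying on failure and returning a fallback result.

    `step` and `fallback` are zero-argument callables; the retry count and
    delay are placeholder values.
    """
    for attempt in range(retries + 1):
        try:
            return step()
        except Exception as exc:
            logging.warning("stage failed (attempt %d): %s", attempt + 1, exc)
            time.sleep(delay)
    return fallback()

# Example: if the generated soundtrack fails, fall back to the user's uploaded song.
# soundtrack = with_fallback(lambda: generate_soundtrack(prompt),
#                            lambda: user_uploaded_song)
```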
Accomplishments that we're proud of
We’re proud of successfully building a full pipeline that transforms written stories, poems, or journal entries into short, expressive videos combining custom-generated visuals and user-selected music. Despite not having prior experience with Stable Diffusion, we learned how to fine-tune prompts and generate images that truly capture the mood and meaning of the input text.
We also managed to integrate multiple complex tools, including Librosa for audio analysis, Gemini and Lyria for prompt creation and sound design, and Veo for video generation, into a smooth, reliable system. We built fallback mechanisms to keep the experience seamless even when parts of the pipeline encountered issues.
All of this came together within the tight timeframe of the hackathon, and we’re excited that Sonetica can bring words to life in a new, immersive way.
What we learned
Building Sonetica gave us valuable experience working with multimodal AI by combining text and audio inputs to create meaningful, expressive videos. We learned how to extract and fuse features from written stories and songs, two very different data types, to generate visuals that capture the emotion and mood behind both.
Beyond deep learning and AI, we also gained hands-on experience integrating a chatbot interface and developing a responsive frontend using React. This taught us how to design smooth user interactions that connect seamlessly with complex backend AI pipelines.
We faced challenges coordinating multiple AI models and APIs while ensuring real-time responsiveness and reliability, which pushed us to develop effective error handling and fallback strategies.
What's next for Sonetica
Moving forward, we want to expand Sonetica’s capabilities to support longer videos and richer storytelling formats, allowing users to create mini-movies from their writings. We’re also excited to explore adding more customization options, like different visual styles or mood filters, so users can better tailor the videos to their unique voices, for example through integrated feedback loops.
In addition, we would like to improve generation speed so that creating videos feels instant and seamless. We’d also like to build a mobile-friendly version to make it easier for users to create and share on the go.
Finally, we hope to open Sonetica to a wider community, from casual journalers to poets and storytellers, and potentially explore partnerships or integrations with creative platforms to bring storytelling to life in new ways.