Inspiration
I worked in the media industry, where short-form content generation was a key problem we were trying to solve.
What it does
You can generate multiple short-form videos from a single longer video asset or from multiple assets.
How we built it
The project leverages the multimodal capabilities of Gemini to generate metadata. Agents come up with video ideas, put together shots, and then refine the video output using feedback from another agent and from the user.
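To make the refine step concrete, here is a minimal sketch of a critique-and-refine loop in plain Python. It is not Splicy's actual code and does not use the ADK or Gemini APIs: `Shot`, `Draft`, `critic`, `refine`, and `run_loop` are hypothetical names, and the "critic" is a simple duration check standing in for the reviewing agent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    source: str   # asset the shot is cut from (hypothetical identifier)
    start: float  # start time in seconds
    end: float    # end time in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class Draft:
    idea: str
    shots: List[Shot]
    notes: List[str] = field(default_factory=list)  # feedback applied so far

    @property
    def duration(self) -> float:
        return sum(s.duration for s in self.shots)


def critic(draft: Draft, max_seconds: float) -> List[str]:
    """Stand-in for the reviewing agent: returns a list of complaints."""
    feedback = []
    if draft.duration > max_seconds:
        feedback.append("too_long")
    return feedback


def refine(draft: Draft, feedback: List[str], max_seconds: float) -> Draft:
    """Stand-in for the editing agent: drops trailing shots until the draft fits."""
    shots = list(draft.shots)
    if "too_long" in feedback:
        while shots and sum(s.duration for s in shots) > max_seconds:
            shots.pop()
    return Draft(draft.idea, shots, draft.notes + feedback)


def run_loop(draft: Draft, max_seconds: float = 30.0, max_rounds: int = 3) -> Draft:
    """Alternate critique and refinement until the critic has no complaints."""
    for _ in range(max_rounds):
        feedback = critic(draft, max_seconds)
        if not feedback:
            break
        draft = refine(draft, feedback, max_seconds)
    return draft


draft = Draft(
    "best goals recap",
    [
        Shot("match.mp4", 0.0, 12.0),
        Shot("match.mp4", 40.0, 52.0),
        Shot("interview.mp4", 5.0, 17.0),
    ],
)
final = run_loop(draft)
print(len(final.shots), final.duration)  # 2 24.0
```

In a real setup the critic and the refiner would be model-backed agents, and user feedback could be appended to the same feedback list before each refinement round.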
Challenges we ran into
Getting the agents to stick to instructions over the course of a conversation is tough. In the final results, sudden switches in context sometimes make the video look jarring or abrupt.
Accomplishments that we're proud of
This works not just on popular video titles but on lesser-known videos as well! The generated videos stay true to the video idea and are just a few edits away from being YouTube- and Reels-ready!
What we learned
Agents are tough! Maybe standardizing the tools, giving the agents more helpful context with clear instructions, and even fine-tuning could help? (PS: Curious guy with no access to a GPU 🥲)
What's next for Splicy
- Add support for transitions and reduce the final edits required in the generated video.
- Support exporting to the timeline formats of popular video editors, and finish the custom UI on top of the ADK backend to replace "adk web".
Built With
- adk
- colab
- django
- gcp
- gemini
- geminiembeddings
- pgvector
- postgresql
- python
- vertexai
- vlm
- vmm