https://www.loom.com/share/9b1a5da18a72435f932fa08eced56017

Inspiration

We started by trying to save the world.

We wanted to make every video on YouTube accessible to anyone who is visually impaired. The plan was to auto-generate audio descriptions of the visuals in any video, so you could understand what was going on without being able to see it.

But we ran into issues with the fidelity and precision of the output we could extract from videos. Without 99% precision, we were doing more harm than good.

So we thought: what can we do with this powerful tech that doesn't need to be precise?

What if we could turn everyone into a main character in their own manga just by uploading a video of something in their lives?

What it does

Vid2manga takes a video, indexes it visually, and extracts a general summary of it. Then we have some fun.

We "imagine" an entire manga plotline and use it to generate wicked art, voiced audio (with 11labs TTS), and a sick overall storyline you can watch and listen to, all based on whatever video you send it.

You could share Star Wars and become a Sith lord, or share a video of your dog wagging its tail and become a doggo companion having adventures in your backyard. The world is your oyster.

How we built it

We built a quick web app with Next.js, using 12labs to index the video and generate robust summaries based on visual understanding of scenes and plot, GPT to dream up wicked storylines based on what we see, the Leap API / Stable Diffusion to generate inspired artwork, and 11labs to generate audio narrations for each of the characters and scenes.
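To make the chaining concrete, here is a minimal Python sketch of how the four stages could feed into each other. It is an illustration under stated assumptions, not the app's actual code: the stage functions (`summarize`, `imagine`, `draw`, `narrate`), the `Panel` / `Manga` types, and `vid2manga` itself are hypothetical names, and no real 12labs, GPT, Leap, or 11labs SDK calls are shown or assumed. Each service is modeled as a plain callable passed in from outside.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signatures. Each one stands in for a call to an external
# service; none of these are real 12labs / GPT / Leap / 11labs SDK functions.
Summarize = Callable[[str], str]        # video URL -> visual summary (12labs role)
Imagine = Callable[[str], List[str]]    # summary -> one caption per scene (GPT role)
Draw = Callable[[str], str]             # caption -> image URL (Leap / Stable Diffusion role)
Narrate = Callable[[str], str]          # caption -> audio URL (11labs TTS role)


@dataclass
class Panel:
    """One manga panel: the scene caption plus its generated art and narration."""
    caption: str
    image_url: str
    audio_url: str


@dataclass
class Manga:
    """The finished story: the source summary and its ordered panels."""
    summary: str
    panels: List[Panel] = field(default_factory=list)


def vid2manga(video_url: str, summarize: Summarize, imagine: Imagine,
              draw: Draw, narrate: Narrate) -> Manga:
    # 1. Understand the video: a loose summary is enough, precision is not required.
    summary = summarize(video_url)
    # 2. Dream up a plotline, split into one caption per scene.
    captions = imagine(summary)
    # 3. For every scene, generate artwork and a narration track.
    panels = [Panel(caption=c, image_url=draw(c), audio_url=narrate(c)) for c in captions]
    return Manga(summary=summary, panels=panels)


if __name__ == "__main__":
    # Fake stages so the sketch runs offline, with no API keys.
    manga = vid2manga(
        "https://example.com/dog.mp4",
        summarize=lambda url: "A dog wags its tail in a backyard.",
        imagine=lambda s: ["The doggo wakes up.", "The doggo guards the yard."],
        draw=lambda c: f"image-for:{c}",
        narrate=lambda c: f"audio-for:{c}",
    )
    for panel in manga.panels:
        print(panel.caption, panel.image_url, panel.audio_url)
```

Passing the stages in as callables keeps the orchestration testable without network access. Since each panel's art and narration are independent of the other panels, a real implementation could also run those calls concurrently (for example with `concurrent.futures`) instead of one after another.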

Challenges we ran into

Originally we wanted to generate highly precise, scene-by-scene descriptions of the video with 12labs. However, the more detail we asked for (which we needed), the more it began to hallucinate.

For audio descriptions, those hallucinations did more harm than good.

Accomplishments that we're proud of

We're super proud of where we got to after pivoting heavily after dinner. We have a highly creative, imaginative, and incredibly fun product that we hacked together in a hot second.

We turned highly powerful tools into toys, and we're stoked with how it came out.

What we learned

We learned how incredible multimodality can be, and we came to understand the power of chaining all these different tools together.

We also learned that you don't always have to be so serious. Make a toy: it may just teach you something or change your perspective on how the world works :)

What's next for vid-2-manga

GTFOL baby

  • turning each character into an agent

Built With

  • 11labs
  • 12labs
  • gpt
  • nextjs
  • python
  • stablediffusion