Inspiration

I was inspired by Gemini 3's ability to take multimodal input, especially video. I tested it with a few videos and found the Flash model good enough.

What it does

It takes a video, analyzes it, and reports the issues it finds, each tied to a specific timestamp.
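As a rough illustration of what "issues with specific timestamps" could look like on the Go side, here is a minimal sketch. It assumes the model is prompted to reply with a JSON array; the `Issue` type, its field names, the `MM:SS` timestamp format, and the sample data are all hypothetical and not taken from the actual project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Issue is a hypothetical shape for one finding. The field names and the
// "MM:SS" timestamp format are assumptions made for this sketch.
type Issue struct {
	Timestamp   string `json:"timestamp"`
	Description string `json:"description"`
}

// ParseIssues decodes a JSON array of issues, such as a model might return
// when asked for structured output.
func ParseIssues(raw string) ([]Issue, error) {
	var issues []Issue
	if err := json.Unmarshal([]byte(raw), &issues); err != nil {
		return nil, fmt.Errorf("decode issues: %w", err)
	}
	return issues, nil
}

// Offset converts the "MM:SS" timestamp into a duration from the start of
// the video, so a frontend could seek to it.
func (i Issue) Offset() (time.Duration, error) {
	var m, s int
	if _, err := fmt.Sscanf(i.Timestamp, "%d:%d", &m, &s); err != nil {
		return 0, fmt.Errorf("parse timestamp %q: %w", i.Timestamp, err)
	}
	return time.Duration(m)*time.Minute + time.Duration(s)*time.Second, nil
}

func main() {
	// Made-up sample input, only to exercise the functions above.
	raw := `[{"timestamp":"01:15","description":"example issue"}]`
	issues, err := ParseIssues(raw)
	if err != nil {
		panic(err)
	}
	for _, is := range issues {
		off, err := is.Offset()
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s (%v): %s\n", is.Timestamp, off, is.Description)
	}
}
```

Keeping the timestamp as a string and converting it on demand means a malformed value from the model surfaces as an error for that one issue instead of failing the whole response.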

How we built it

I am using gemini-3-flash-preview, with Go on the backend and a simple React frontend.

Challenges we ran into

There is some latency between uploading or recording a video and getting a response back, but that could be improved by moving to asynchronous processing.
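One possible way to go async in Go is to hand the slow analysis to a goroutine and return a job ID immediately, so the frontend can poll for the result instead of waiting on the upload request. The sketch below uses only the standard library. `JobStore`, `Job`, `Submit`, and `Get` are hypothetical names, and the `analyze` function stands in for the real model call, which is not shown.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Job holds the state of one analysis request.
type Job struct {
	Status string // "pending", "done", or "failed"
	Result string // analysis text, or the error message on failure
}

// JobStore is an in-memory, mutex-guarded job registry (hypothetical; a real
// deployment would likely need persistence and cleanup of old jobs).
type JobStore struct {
	mu   sync.Mutex
	next int
	jobs map[string]*Job
}

func NewJobStore() *JobStore {
	return &JobStore{jobs: make(map[string]*Job)}
}

// Submit registers a pending job, runs analyze in a goroutine, and returns
// the job ID without waiting for the analysis to finish.
func (s *JobStore) Submit(analyze func() (string, error)) string {
	s.mu.Lock()
	s.next++
	id := fmt.Sprintf("job-%d", s.next)
	s.jobs[id] = &Job{Status: "pending"}
	s.mu.Unlock()

	go func() {
		result, err := analyze()
		s.mu.Lock()
		defer s.mu.Unlock()
		if err != nil {
			s.jobs[id].Status = "failed"
			s.jobs[id].Result = err.Error()
			return
		}
		s.jobs[id].Status = "done"
		s.jobs[id].Result = result
	}()
	return id
}

// Get returns a copy of the job so callers never touch shared state.
func (s *JobStore) Get(id string) (Job, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	j, ok := s.jobs[id]
	if !ok {
		return Job{}, false
	}
	return *j, true
}

func main() {
	store := NewJobStore()
	id := store.Submit(func() (string, error) {
		time.Sleep(50 * time.Millisecond) // stand-in for the slow model call
		return "analysis finished", nil
	})
	fmt.Println("submitted", id)

	// Poll until the job leaves the pending state.
	for {
		job, _ := store.Get(id)
		if job.Status != "pending" {
			fmt.Println(job.Status, job.Result)
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
}
```

This does not make the analysis itself faster. It only lets the upload request return quickly, so the interface can show progress while the model works.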

Accomplishments that we're proud of

Integrating video into a multimodal LLM workflow is something I hadn't done before, so I am proud that this project makes it simpler.

What we learned

I learned about uploading and managing videos. I feel I could learn about storing videos too, but for now this is it.

What's next for Gemini Coach

A better interface and prompts tailored to different scenarios.
