Inspiration
I recently connected with a doctor who creates medical animations for patient education, and learned how important these animations are in informing and preparing patients for surgeries and other medical procedures.
What it does
Medimations turns plain English prompts into biologically accurate medical animations. Users describe what they want to see, upload or generate a reference image, and our system produces a short, medically verified animation with an explanatory voiceover. The core focus is medical correctness. This makes Medimations useful for patients, biotech, healthcare startups, research communication, and scientific storytelling where visual accuracy matters.
How we built it
I built Medimations as an agentic pipeline around Veo 3.1. A Gemini-based agent rewrites user prompts into optimized animation instructions. Users can upload or generate a biomedical reference image, which is validated using BiomedCLIP, a biomedical vision-language model trained on 15M medical image-text pairs. Veo animates this reference, and I sample frames from the output and re-check them with a fine-tuned Gemini 3 Flash model for medical accuracy. If something is biologically wrong, Gemini automatically re-prompts Veo, asking it to edit the original video to fix the inaccuracies. Once the animation passes validation, TwelveLabs analyzes the video to generate a clear voiceover script, which is converted into speech using Deepgram.
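The generate-validate-edit loop above can be sketched as a small control-flow function. This is a minimal illustration, not the actual implementation: the model calls (`generate`, `edit`, `sample`) are hypothetical stand-ins for the Veo and Gemini APIs, and `validate_frames` stands in for the fine-tuned accuracy checker; only the capped correction loop mirrors the pipeline.

```python
MAX_ROUNDS = 3  # cap the correction loop so it can't run open-ended


def validate_frames(frames):
    """Stand-in for the fine-tuned Gemini accuracy check.

    Returns (ok, issues); here any frame tagged 'bad' counts as a
    biological inaccuracy that needs a targeted edit.
    """
    issues = [f for f in frames if "bad" in f]
    return len(issues) == 0, issues


def run_pipeline(prompt, generate, edit, sample):
    """Generate once, then request targeted edits to the same video
    until sampled frames pass validation or the round cap is hit."""
    video = generate(prompt)
    for round_no in range(MAX_ROUNDS):
        ok, issues = validate_frames(sample(video))
        if ok:
            return video, round_no  # passed after round_no edit rounds
        # Edit the original video rather than regenerating from scratch.
        video = edit(video, issues)
    return video, MAX_ROUNDS  # best effort after hitting the cap
```

With toy stand-in functions, a video whose first draft contains one flagged frame passes after a single edit round; in production the same loop would wrap real API calls and frame decoding.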
Challenges we ran into
Medical correctness is brutally unforgiving. Generative models love hallucinating anatomy that looks right but is completely wrong. Aligning Veo's output with biomedical validation required careful frame sampling and prompt control. I found that asking Veo to generate from pure text, even with the agentic workflow, was hard to get right; animating from an AI-generated reference image instead let Veo maintain consistency far more easily. I also found that completely regenerating the Veo animation over and over was highly inefficient, so I used some of Google's Vertex AI tools, still in beta, that let the Gemini model tell Veo what to edit in the original video instead of regenerating it from scratch, making it much easier to get a good output.
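The "careful frame sampling" mentioned above can be illustrated with a simple helper that picks evenly spaced frames, always including the first and last, so the validator sees the whole animation without checking every frame. This is a hedged sketch under the assumption that frames are already decoded into a list; the real pipeline's sampling strategy is not shown in this write-up.

```python
def sample_frames(frames, k):
    """Pick k evenly spaced frames (always including the first and
    last) to send to the medical-accuracy validator.

    If the clip has fewer than k frames, just check all of them.
    """
    if k >= len(frames):
        return list(frames)
    if k == 1:
        return [frames[0]]
    # Spread k indices across [0, len(frames) - 1] inclusive.
    step = (len(frames) - 1) / (k - 1)
    return [frames[round(i * step)] for i in range(k)]
```

Checking a handful of spread-out frames keeps validation cost per correction round low, which matters when each round otherwise triggers another expensive video edit.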
Accomplishments that we're proud of
I'm proud that I built a full end-to-end system that verifies medical accuracy in animations. The agentic correction loop improves biological accuracy instead of purely trusting a generative model. The project also showed that grounding video generation in medically validated images significantly improves output quality. On top of that, I'm proud of integrating automated narration, so the final result is immediately usable.
What we learned
Generative AI alone is not enough for science or medicine. Validation layers are mandatory when we need accuracy. I also learned that image-first pipelines are far more controllable than raw text-to-video when correctness matters. Agentic workflows work best when tightly scoped and capped, not left open-ended.
What's next for Medimations
I want to continue fine-tuning the models and eventually bring the product to physicians around the country to support patient education.