# 1. About the Project
## Inspiration
Scientific protocols are often trapped in dense, text-only PDFs, leading to high cognitive load and procedural errors in the lab. I was inspired to build LabGen Studio to bridge the gap between dry technical documentation and high-fidelity visual learning. My goal was to create a "Cinematic Lab Manual," where a researcher can go from a written protocol to a 4K, narrated, and animated instructional video in minutes.
## What it does
LabGen Studio is a multimodal production suite that:
- **Synthesizes Storyboards:** Uses gemini-3-pro-preview to parse raw text into cinematic scenes.
- **Generates 4K Visuals:** Produces photorealistic laboratory imagery using gemini-3-pro-image-preview.
- **Professional Narration:** Converts scripts into high-quality audio with gemini-2.5-flash-preview-tts.
- **Cinematic Animation:** Animates static scenes into 1080p video using veo-3.1-fast-generate-preview.
- **Grounded Research:** Employs Google Search grounding to ensure protocols are derived from authoritative sources like .edu and .gov domains.
- **Real-time Assistance:** Features "Jeona," a Live API-powered assistant for voice-based lab support.
It is organized into specialized "Studios," each designed to handle a specific phase of scientific documentation and production.
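As a minimal sketch of the .edu/.gov preference described under Grounded Research, the filter below keeps only source URLs whose hostname ends in an allowed suffix. It assumes the source URLs are already available as plain strings; `isAuthoritative`, `filterAuthoritative`, and `ALLOWED_SUFFIXES` are hypothetical names for illustration, not part of LabGen Studio or the Google GenAI SDK.

```typescript
// Hypothetical allow-list; the write-up only mentions .edu and .gov as examples.
const ALLOWED_SUFFIXES: string[] = [".edu", ".gov"];

// Returns true when the URL parses and its hostname ends in an allowed suffix.
function isAuthoritative(rawUrl: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  return ALLOWED_SUFFIXES.some((suffix) => hostname.endsWith(suffix));
}

// Keeps only the URLs that pass the suffix check, preserving order.
function filterAuthoritative(urls: string[]): string[] {
  return urls.filter(isAuthoritative);
}
```

A plain suffix check like this would miss country-specific academic domains (for example, hosts ending in `.edu.au`), so a real allow-list would likely need to be broader.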
## How I built it
I architected the application using React 19 and TypeScript for a robust, type-safe frontend. The core engine is powered by the Google GenAI SDK, utilizing a tiered model approach:
- **Reasoning Layer:** Gemini 3 Pro handles complex JSON schema generation for storyboarding.
- **Vision & Video Layer:** A sequence-dependent pipeline where static images from Gemini are fed into Veo for motion synthesis.
- **Live Layer:** Real-time audio streaming using the Gemini 2.5 Flash Native Audio model via WebSockets for low-latency interaction.
- **Persistence:** A hybrid approach using Supabase for authentication and local storage buffers for project state.
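The sequence-dependent part of this pipeline can be sketched roughly as follows. This is an illustration under assumptions, not the actual LabGen Studio code: the `Generators` interface and its four functions are hypothetical stand-ins for the model calls, and assets are represented as plain strings. The point it shows is the ordering constraint: the video step waits on the image step, while narration has no such dependency and can run alongside.

```typescript
interface Scene {
  id: number;
  imagePrompt: string;
  narration: string;
}

interface SceneAssets {
  sceneId: number;
  image: string;
  video: string;
  audio: string;
}

// Hypothetical wrappers around the model calls; real implementations would call the SDK.
interface Generators {
  storyboard(protocol: string): Promise<Scene[]>;
  image(prompt: string): Promise<string>;
  video(image: string): Promise<string>;
  narrate(script: string): Promise<string>;
}

async function produceScene(scene: Scene, gen: Generators): Promise<SceneAssets> {
  // Visual chain: the video generator needs the finished image as input.
  const visuals = (async () => {
    const image = await gen.image(scene.imagePrompt);
    const video = await gen.video(image);
    return { image, video };
  })();
  // Narration only needs the script, so it runs concurrently with the visuals.
  const [{ image, video }, audio] = await Promise.all([
    visuals,
    gen.narrate(scene.narration),
  ]);
  return { sceneId: scene.id, image, video, audio };
}

async function produceVideo(protocol: string, gen: Generators): Promise<SceneAssets[]> {
  const scenes = await gen.storyboard(protocol);
  const results: SceneAssets[] = [];
  // Scenes are processed one at a time here; batching is a separate design choice.
  for (const scene of scenes) {
    results.push(await produceScene(scene, gen));
  }
  return results;
}
```

Passing the generators in as an interface keeps the ordering logic testable with stubs, independent of any API key.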
## Challenges I ran into
- **Multimodal Synchronization:** Coordinating the timing between the generated video duration and the TTS audio narration required precise state management.
- **State Persistence:** Keeping large Base64 assets (images/audio) from exceeding the browser's storage limits while maintaining an "autosave" feel.
- **Scientific Accuracy:** Fine-tuning prompts to ensure Gemini didn't "hallucinate" glassware or chemical reactions, achieved through rigorous system instructions and grounding.
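To illustrate the synchronization problem, here is one possible alignment strategy, sketched under assumptions rather than taken from the project: each scene lasts as long as the longer of its video clip and its narration, and the shorter track is padded (by holding the last video frame or adding trailing silence). The names `alignScene`, `buildTimeline`, and the timing fields are hypothetical.

```typescript
interface SceneTiming {
  sceneDuration: number;   // seconds the scene occupies on the timeline
  holdLastFrame: number;   // seconds to freeze the final video frame
  trailingSilence: number; // seconds of silence after the narration ends
}

interface TimelineEntry extends SceneTiming {
  startAt: number; // offset of the scene from the start of the final video, in seconds
}

// Pads whichever track is shorter so video and narration end together.
function alignScene(videoSeconds: number, audioSeconds: number): SceneTiming {
  if (videoSeconds < 0 || audioSeconds < 0) {
    throw new RangeError("durations must be non-negative");
  }
  const sceneDuration = Math.max(videoSeconds, audioSeconds);
  return {
    sceneDuration,
    holdLastFrame: sceneDuration - videoSeconds,
    trailingSilence: sceneDuration - audioSeconds,
  };
}

// Lays aligned scenes end to end and records where each one starts.
function buildTimeline(clips: Array<{ video: number; audio: number }>): TimelineEntry[] {
  let cursor = 0;
  return clips.map((clip) => {
    const timing = alignScene(clip.video, clip.audio);
    const entry: TimelineEntry = { ...timing, startAt: cursor };
    cursor += timing.sceneDuration;
    return entry;
  });
}
```

For example, an 8-second clip with 11 seconds of narration would hold its last frame for 3 seconds, and the next scene would start at the 11-second mark.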
## Accomplishments that I am proud of
- **Jeona Assistant:** Successfully implementing the Gemini Live API to create a low-latency voice assistant that understands lab safety and stoichiometry.
- **Veo Integration:** Being among the first to implement a complete text-to-image-to-video workflow using the veo-3.1 models.
- **Economical UX:** Implementing the "Bring Your Own Key" (BYOK) model to empower researchers to manage their own costs while using high-tier models.
## What I learned
For the first time, I worked with several new concepts and their implementation, e.g., Thinking Config for complex reasoning tasks. I learned that for scientific applications, providing a sufficient thinking budget (e.g., 32,768 tokens) is critical for Gemini to correctly calculate molarities and work through procedural steps before generating the final output.
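As a rough sketch of what setting a thinking budget can look like, the helper below builds a plain request object without calling any API. The object shape is an assumption modeled on the `config.thinkingConfig.thinkingBudget` field of the Google GenAI SDK; the interface name, function name, and prompt wording are hypothetical, and whether a given model accepts this field or this budget should be checked against the SDK documentation.

```typescript
// Assumed request shape; field names mirror the SDK's thinkingConfig option.
interface ThinkingRequest {
  model: string;
  contents: string;
  config: { thinkingConfig: { thinkingBudget: number } };
}

// Builds a request that reserves a token budget for reasoning before the final answer.
function buildProtocolRequest(
  protocolText: string,
  thinkingBudget: number = 32768, // the example value mentioned above
): ThinkingRequest {
  if (!Number.isInteger(thinkingBudget) || thinkingBudget < 0) {
    throw new RangeError("thinkingBudget must be a non-negative integer");
  }
  return {
    model: "gemini-3-pro-preview",
    // Hypothetical prompt wording, for illustration only.
    contents: `Check all molarity calculations, then list the procedural steps:\n${protocolText}`,
    config: { thinkingConfig: { thinkingBudget } },
  };
}
```

Keeping the budget as a parameter makes it easy to lower it for simple protocols, where less reasoning is needed.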
## What's next for LabGen Studio
- **V6: Cloud Scale:** Migrating to a full PostgreSQL backend on Supabase for global project synchronization.
- **V7: Educator Ecosystem:** Improving the educator-facing Studio with automated integration with Google Classroom and Google Forms, converting lab protocols into interactive multiple-choice (MCQ) quizzes.
- **V8: Senior Researcher Studio:** Developing autonomous AI agents on Vertex AI to assist in experimental design and predictive analysis.
- **The Robotics Frontier:** My ultimate vision is to export these visual instructions as training data for Vision-Language-Action (VLA) models, enabling lab robots to perform these protocols physically.