Inspiration
We set out to bridge the gap between static manga panels and animated storytelling. While anime production is expensive and time-consuming, many creators and fans dream of seeing their work come to life. With recent advances in multimodal AI, we saw the opportunity to build a tool that could bring manga to motion instantly.
What it does
Animefy.io transforms manga images into short anime-style video clips. The platform automatically interprets the visual content of a manga panel, generates a text description of the scene, and then uses a text-to-video model to animate that description. The result is a single pipeline that brings drawings to life with minimal user input.
How we built it
Backend:
• Built with Flask and served through NGROK for testing
• Uses GLM-4V (Vision Language Model) to interpret manga panels into text descriptions
• Uses a text-to-video diffusion model (cerspense/zeroscope_v2_576w) to generate animated video frames
• Converts video frames into a downloadable MP4 file
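As an illustration of the text-to-video step listed above, the sketch below follows the usage pattern published for `cerspense/zeroscope_v2_576w` with the `diffusers` library. It is not the project's actual code: the resolution, frame count, step count, and frame rate are assumed values, `VideoSettings` and `generate_video` are hypothetical names, and a recent `diffusers` release with `accelerate` installed is assumed. The heavy imports sit inside the function so the settings can be inspected on a machine without a GPU.

```python
from dataclasses import dataclass

MODEL_ID = "cerspense/zeroscope_v2_576w"  # model named in the write-up


@dataclass(frozen=True)
class VideoSettings:
    # Assumed values; the project's actual settings are not stated.
    width: int = 576
    height: int = 320
    num_frames: int = 24
    num_inference_steps: int = 40
    fps: int = 8

    def clip_seconds(self) -> float:
        """Length of the resulting clip at the chosen frame rate."""
        return self.num_frames / self.fps


def generate_video(prompt: str, settings: VideoSettings = VideoSettings()) -> str:
    """Generate a clip for `prompt` and return the path of the MP4 file."""
    # Imported lazily: these packages are large and need a GPU runtime.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    # Moves submodules to the GPU only while they are in use (needs accelerate).
    pipe.enable_model_cpu_offload()

    result = pipe(
        prompt,
        width=settings.width,
        height=settings.height,
        num_frames=settings.num_frames,
        num_inference_steps=settings.num_inference_steps,
    )
    # Recent diffusers releases return a batch of videos, so take the first;
    # on older releases `result.frames` is already the frame list.
    frames = result.frames[0]
    return export_to_video(frames, fps=settings.fps)
```

Lowering `num_frames` or `num_inference_steps` in `VideoSettings` is the simplest way to trade quality for generation time on a small GPU.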
Frontend: • Built with React and styled using Tailwind CSS • Allows users to upload manga panels, view results, and download videos • Clean and responsive UI for both desktop and mobile
Pipeline:
1. Manga panel image is uploaded
2. GLM-4V generates a scene prompt
3. The prompt is passed into the diffusion model
4. Generated frames are exported as a video and returned to the user
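The four pipeline steps above can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions, not the team's implementation: `animate_panel`, `AnimationResult`, `describe_panel`, `generate_frames`, and `export_video` are hypothetical names, and each stage is passed in as a plain callable so the GLM-4V and ZeroScope calls can be replaced with stand-ins during testing.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class AnimationResult:
    prompt: str        # scene description produced from the panel
    frame_count: int   # number of frames the video model returned
    video_path: str    # where the exported video was written


def animate_panel(
    image_bytes: bytes,
    describe_panel: Callable[[bytes], str],
    generate_frames: Callable[[str], Sequence[Any]],
    export_video: Callable[[Sequence[Any]], str],
) -> AnimationResult:
    """Run the upload -> prompt -> frames -> video steps in order."""
    # Step 1: the uploaded panel arrives as raw bytes.
    if not image_bytes:
        raise ValueError("uploaded panel is empty")

    # Step 2: the vision-language model turns the panel into a scene prompt.
    prompt = describe_panel(image_bytes).strip()
    if not prompt:
        raise ValueError("vision model returned an empty description")

    # Step 3: the text-to-video model turns the prompt into frames.
    frames = generate_frames(prompt)
    if len(frames) == 0:
        raise ValueError("video model returned no frames")

    # Step 4: the frames are written out as a video file for download.
    video_path = export_video(frames)
    return AnimationResult(prompt=prompt, frame_count=len(frames), video_path=video_path)


# Usage with stand-in stages (no models required):
result = animate_panel(
    b"fake-image",
    describe_panel=lambda img: "a swordsman leaps across rooftops",
    generate_frames=lambda prompt: ["frame"] * 24,
    export_video=lambda frames: "clip.mp4",
)
print(result.video_path, result.frame_count)
```

Keeping the stages as injected callables means a Flask route only has to read the upload, call `animate_panel`, and send back the file at `video_path`.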
Challenges we ran into
• Integrating two large models with distinct modalities (image-to-text and text-to-video)
• Managing VRAM and generation time in a Colab-based backend
• Ensuring high-quality video output with limited resolution and frame count
• Handling serialization and dependency issues with diffusers and PyTorch models
• Creating a smooth developer experience across Flask and React
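One common way to ease the VRAM pressure described above is to keep only one large model in memory at a time: load the vision model, produce the prompt, release it, and only then load the video model. The sketch below shows that pattern under stated assumptions; it is not a description of how the team solved the problem, and `run_with_model` and `free_gpu_memory` are hypothetical helper names.

```python
import gc
from typing import Callable, TypeVar

M = TypeVar("M")
R = TypeVar("R")


def free_gpu_memory() -> None:
    """Collect unreferenced objects and release cached CUDA memory, if any."""
    gc.collect()
    try:
        import torch  # optional: only present on the GPU backend
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_with_model(
    load: Callable[[], M],
    use: Callable[[M], R],
    release: Callable[[], None] = free_gpu_memory,
) -> R:
    """Load a model, run one task with it, then free its memory."""
    model = load()
    try:
        return use(model)
    finally:
        # Drop the only reference held here so the collector can reclaim it.
        del model
        release()
```

A backend could call `run_with_model` once for the image-to-text stage and once for the text-to-video stage, so the two models never occupy the GPU together. The trade-off is that each request pays the model loading time again.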
Accomplishments that we're proud of
• Built a fully functional end-to-end system that converts manga panels into animated video
• Integrated GLM-4V and ZeroScope into a single on-demand generation pipeline
• Created a usable and attractive frontend with full upload-to-video delivery
• Successfully generated compelling visual content from static input in under a minute
What we learned
• Deepened understanding of multimodal pipelines combining vision, language, and diffusion
• Learned to refine prompts for better video generation results
• Gained practical experience in handling large models in resource-constrained environments
• Improved full-stack deployment skills with Flask, React, and NGROK
What's next for Animefy.io
• Add audio generation, including voice and background music
• Support multi-panel sequences for richer storytelling
• Enable user-editable prompts and character customization
• Deploy on GPU-backed cloud infrastructure for public access
• Add login, saved scenes, and community sharing features