Frame from a quantum computing video
Frame from a graph theory explanation video

AI Video Generator: From Concept to Creation

Inspiration

Have you ever wondered in class, "When am I really going to use these concepts in the real world?" To help answer that question, our team set out to build a tool that takes a topic as a prompt and generates a video showing real-life applications of that concept. We were also inspired by rapid advances in AI, particularly in natural language processing and video generation. We saw an opportunity to bridge these technologies by creating a tool that transforms simple text prompts into engaging video content. A driving force behind the project was the idea of democratizing video creation: making it accessible to anyone with an idea, regardless of their technical or creative skills.

Program Flow

The user enters a topic they want to learn about on the Next.js site
The query is sent to the Flask backend
An LLM turns the topic into a narrative
The LLM converts the narrative into a screenplay (with scenes and scene descriptions)
A cloud-hosted Open-Sora instance generates a 4-second clip for each scene
Text-to-speech generates narration audio from the narrative
The scene clips and narration are combined and returned to the frontend
The video is displayed to the user
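The program flow above can be sketched as a single orchestration function. This is a minimal illustration under assumptions, not the project's code: `write_narrative`, `write_screenplay`, `generate_clip`, and `synthesize_speech` are hypothetical stubs standing in for the GPT-4, Open-Sora, and OpenAI TTS calls, the three-scene screenplay is made up, and the final step of merging clips with audio is omitted.

```python
from dataclasses import dataclass, field

CLIP_SECONDS = 4  # each Open-Sora clip is 4 seconds long, per the flow above


@dataclass
class Scene:
    description: str  # visual prompt passed to the video model


@dataclass
class VideoResult:
    narrative: str
    clips: list[str] = field(default_factory=list)  # identifiers of generated clips
    audio: str = ""  # identifier of the narration audio

    @property
    def duration_seconds(self) -> int:
        # Visual track length: one fixed-length clip per scene
        return len(self.clips) * CLIP_SECONDS


def write_narrative(topic: str) -> str:
    # Hypothetical stand-in for the LLM call that turns a topic into a story.
    return f"A short story showing how {topic} is used in the real world."


def write_screenplay(narrative: str) -> list[Scene]:
    # Hypothetical stand-in for the LLM call that splits the narrative into scenes.
    return [Scene(f"Scene {i + 1}: {narrative}") for i in range(3)]


def generate_clip(scene: Scene, index: int) -> str:
    # Hypothetical stand-in for the Open-Sora request; returns a clip identifier.
    return f"clip_{index}.mp4"


def synthesize_speech(narrative: str) -> str:
    # Hypothetical stand-in for the text-to-speech call.
    return "narration.mp3"


def make_video(topic: str) -> VideoResult:
    """Run each stage in order, mirroring the program flow."""
    narrative = write_narrative(topic)
    scenes = write_screenplay(narrative)
    clips = [generate_clip(scene, i) for i, scene in enumerate(scenes)]
    audio = synthesize_speech(narrative)
    # Muxing the clips and the audio into one file is left out of this sketch.
    return VideoResult(narrative=narrative, clips=clips, audio=audio)


if __name__ == "__main__":
    result = make_video("graph theory")
    print(result.clips, result.audio, result.duration_seconds)
```

Because every clip has a fixed length, the visual track is simply 4 seconds times the number of scenes: 12 seconds for the three stub scenes here.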

What We Learned

Throughout the development of this project, we gained valuable insights and skills:

AI Integration: We deepened our understanding of OpenAI's GPT models and how to effectively prompt them for creative tasks.
Full-Stack Development: We honed our skills in building a cohesive application that spans from frontend to backend, integrating various technologies.
API Design: We learned the intricacies of designing and implementing RESTful APIs that connect our frontend with our AI-powered backend.
React and Next.js: We expanded our knowledge of modern frontend frameworks, particularly in creating dynamic and responsive user interfaces.
Python Backend: We improved our Python skills, especially in handling asynchronous operations and subprocess management.
Open-Sora Deployment: We compiled and hosted Open-Sora ourselves.
OpenAI TTS: This was our first time using OpenAI's text-to-speech API.
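The writeup mentions asynchronous operations and subprocess management around a self-hosted Open-Sora without showing the code. One common pattern, sketched here under assumptions, is to launch each generation job with `asyncio.create_subprocess_exec`, cap concurrency with a semaphore, and enforce a timeout. The command is a placeholder that only echoes its argument through the Python interpreter, because the real Open-Sora invocation is not given; `run_generation`, `run_all`, and `max_parallel` are hypothetical names.

```python
import asyncio
import sys


async def run_generation(scene_id: int, prompt: str, sem: asyncio.Semaphore,
                         timeout: float = 60.0) -> str:
    """Run one stand-in generation command and return its stdout."""
    # Placeholder command: a real deployment would call its own Open-Sora
    # inference script here. This one just prints "<scene_id>:<prompt>".
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])",
           f"{scene_id}:{prompt}"]
    async with sem:  # limit how many jobs run at once (e.g. to share one GPU)
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
        except asyncio.TimeoutError:
            proc.kill()  # do not leave a runaway process behind
            await proc.wait()
            raise RuntimeError(f"scene {scene_id} timed out")
    if proc.returncode != 0:
        raise RuntimeError(f"scene {scene_id} failed: {stderr.decode().strip()}")
    return stdout.decode().strip()


async def run_all(prompts: list[str], max_parallel: int = 2) -> list[str]:
    """Start one job per scene prompt, at most max_parallel at a time."""
    sem = asyncio.Semaphore(max_parallel)
    tasks = [run_generation(i, p, sem) for i, p in enumerate(prompts)]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(run_all(["a graph", "a qubit", "a circuit"])))
```

`asyncio.gather` returns results in the order the awaitables were passed in, so the outputs line up with the scene order even when jobs finish out of order.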

How We Built It

Our project was built using a combination of cutting-edge technologies:

Frontend: We used React with Next.js for a fast, SEO-friendly frontend. Material-UI provided a sleek, responsive design.
Backend: Python powered our backend, with Flask serving as our web framework.
AI Integration: We leveraged OpenAI's GPT-4 model to generate creative scripts and screenplays from user prompts.
API: We created a custom API using Next.js API routes to handle communication between the frontend and backend.
Video Generation: We used Open-Sora to generate the video clips that visually explain the user's topic.
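The writeup does not specify the request and response format exchanged between the Next.js API routes and the Flask backend. As a hypothetical illustration, here is a framework-agnostic handler that validates a JSON body and returns an HTTP status plus a JSON reply. The field names (`topic`, `status`, `video_url`, `error`), the length limit, and the URL scheme are all assumptions. In Flask, the raw body could be obtained with `request.get_data()` and passed to `handle`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

MAX_TOPIC_LENGTH = 200  # hypothetical limit on prompt size


@dataclass
class GenerateRequest:
    topic: str


@dataclass
class GenerateResponse:
    status: str
    video_url: Optional[str] = None
    error: Optional[str] = None


def parse_request(body: bytes) -> GenerateRequest:
    """Validate the JSON body sent by the frontend; raise ValueError if malformed."""
    try:
        data = json.loads(body)
    except ValueError as exc:  # covers JSONDecodeError and bad byte encodings
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    topic = data.get("topic")
    if not isinstance(topic, str) or not topic.strip():
        raise ValueError("'topic' must be a non-empty string")
    if len(topic) > MAX_TOPIC_LENGTH:
        raise ValueError(f"'topic' must be at most {MAX_TOPIC_LENGTH} characters")
    return GenerateRequest(topic=topic.strip())


def handle(body: bytes) -> tuple[int, str]:
    """Return (HTTP status code, JSON body) for a generate request."""
    try:
        req = parse_request(body)
    except ValueError as exc:
        reply = GenerateResponse(status="error", error=str(exc))
        return 400, json.dumps(asdict(reply))
    # The real backend would run the generation pipeline here.
    url = f"/videos/{req.topic.replace(' ', '-')}.mp4"  # hypothetical path scheme
    return 200, json.dumps(asdict(GenerateResponse(status="ok", video_url=url)))
```

Returning a structured error with a 400 status lets the frontend show a useful message instead of failing on an unexpected response shape.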

Challenges We Faced

Despite our enthusiasm, we encountered several challenges:

CORS Issues: We struggled with Cross-Origin Resource Sharing (CORS) when trying to connect our frontend to the backend API. This required us to implement workarounds and properly configure our server.
AI Model Limitations: Working within the constraints of the AI model's output format and ensuring consistent, usable results was a significant challenge.
Asynchronous Operations: Managing asynchronous operations, especially when dealing with AI-generated content and video processing, proved to be complex.
Performance Optimization: Ensuring the application remained responsive while handling resource-intensive tasks like video generation was a constant concern.
Error Handling: Implementing robust error handling across both frontend and backend to provide a smooth user experience was more challenging than anticipated.
Integration Complexity: Bringing together various technologies (React, Next.js, Python, OpenAI API) into a cohesive application required careful planning and execution.
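Two of these challenges, inconsistent model output and error handling, are often tackled together by validating the LLM's reply and retrying with the error fed back into the prompt. The sketch below assumes a hypothetical JSON schema, `{"scenes": [{"description": ...}]}`, and a hypothetical `ask_llm` callable wrapping the model; the project's actual prompt and output format are not given in the writeup.

```python
import json
import re
from typing import Callable

# Matches an opening ``` or ```json fence at the start, or a closing fence at the end
FENCE = re.compile(r"^```(?:json)?\s*|\s*```$")


def parse_screenplay(raw: str) -> list[dict]:
    """Parse LLM output into a list of scene dicts, each with a 'description' string."""
    text = FENCE.sub("", raw.strip())  # models often wrap JSON in a code fence
    data = json.loads(text)  # raises ValueError (JSONDecodeError) on malformed JSON
    scenes = data.get("scenes") if isinstance(data, dict) else None
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("expected a non-empty 'scenes' list")
    for i, scene in enumerate(scenes):
        if not isinstance(scene, dict) or not isinstance(scene.get("description"), str):
            raise ValueError(f"scene {i} is missing a 'description' string")
    return scenes


def screenplay_with_retries(ask_llm: Callable[[str], str], prompt: str,
                            attempts: int = 3) -> list[dict]:
    """Call the model until its output parses, feeding the last error back each time."""
    last_error = ""
    for _ in range(attempts):
        full_prompt = prompt if not last_error else (
            f"{prompt}\n\nYour previous answer was invalid ({last_error}). "
            "Reply with JSON only.")
        try:
            return parse_screenplay(ask_llm(full_prompt))
        except ValueError as exc:
            last_error = str(exc)
    raise RuntimeError(f"no valid screenplay after {attempts} attempts: {last_error}")
```

Bounding the number of attempts keeps a misbehaving model from stalling the request indefinitely, and the final `RuntimeError` gives the API layer one clear failure to report to the user.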

Despite these challenges, our team persevered and learned valuable lessons along the way. The result is an application that turns a single text prompt into a narrated, AI-generated video explaining the topic.