Inspiration

We were inspired by the struggles that many aspiring YouTubers face: the time-consuming research, the creative blocks, and the overwhelming complexity of video creation. We saw talented individuals with great ideas getting bogged down in the process, preventing them from sharing their content with the world.

What it does

Our multi-model agent helps YouTubers come up with video ideas and plan them. It analyzes the user's existing videos, identifies their niche, and studies trending videos in that niche. Based on this analysis, it automatically generates a video script, complete with scene descriptions and dialogue suggestions. It also creates thumbnail inspirations designed to grab attention and increase click-through rates. Essentially, it's a virtual assistant for YouTube content creation that helps users produce high-quality videos faster.

How we built it

We used React for a responsive user interface and FastAPI (Python) for our backend API.

Google DeepMind (Gemini API): Gemini handles both prompt engineering (writing effective prompts for the other models) and the heavy lifting of script generation. We feed it the user's niche data and trend information to produce the script.

Together AI: Generates a range of thumbnail options based on the video topic, designed to maximize click-through rates.

Agno: Identifies the user's content niche, which lets us find trending videos in that niche and tailor scripts and thumbnails to the appropriate audience.

Data from YouTube: We used the YouTube API to gather data on trending videos and user videos, which informs our AI models.
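As a rough illustration of the data-gathering step, the sketch below builds a request URL for the YouTube Data API v3 `videos` endpoint with `chart=mostPopular`. The writeup does not say which endpoint or parameters we called, so the endpoint choice, the helper name `build_trending_url`, and the default values are assumptions for illustration only.

```python
from typing import Optional
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 "videos.list" method.
YOUTUBE_VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_trending_url(
    api_key: str,
    region_code: str = "US",
    category_id: Optional[str] = None,
    max_results: int = 10,
) -> str:
    """Build a videos.list URL for the most popular videos in a region.

    Hypothetical helper: the parameter choices are assumptions, not the
    project's actual request.
    """
    # The API accepts between 1 and 50 results per page for this method.
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")

    params = {
        "part": "snippet,statistics",
        "chart": "mostPopular",
        "regionCode": region_code,
        "maxResults": max_results,
        "key": api_key,
    }
    # Optionally narrow the chart to one video category.
    if category_id is not None:
        params["videoCategoryId"] = category_id

    return f"{YOUTUBE_VIDEOS_ENDPOINT}?{urlencode(params)}"
```

The returned URL can be passed to any HTTP client; building it separately keeps the request easy to inspect and test without network access.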

The workflow is: User Input (React) -> Niche Analysis (Agno) -> Script and Thumbnail Generation (Gemini & Together AI) -> Output (React).
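The workflow above can be sketched as a small orchestration function. This is a minimal sketch, not our actual backend code: the four stage functions are passed in as plain callables that stand in for the Agno, YouTube, Gemini, and Together AI calls, and the names `ContentPackage` and `run_pipeline` are hypothetical. It also assumes the thumbnail step takes the generated script as input.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ContentPackage:
    """What the frontend receives at the end of the pipeline."""
    niche: str
    script: str
    thumbnails: List[str]  # assumed to be base64-encoded images


def run_pipeline(
    channel_id: str,
    detect_niche: Callable[[str], str],              # stand-in for the Agno step
    find_trending: Callable[[str], List[str]],       # stand-in for the YouTube lookup
    write_script: Callable[[str, List[str]], str],   # stand-in for the Gemini step
    make_thumbnails: Callable[[str], List[str]],     # stand-in for the Together AI step
) -> ContentPackage:
    """Run the stages in order, feeding each stage's output to the next."""
    niche = detect_niche(channel_id)
    trending = find_trending(niche)
    script = write_script(niche, trending)
    thumbnails = make_thumbnails(script)
    return ContentPackage(niche=niche, script=script, thumbnails=thumbnails)


if __name__ == "__main__":
    # Stub stages so the flow can be exercised without any API keys.
    package = run_pipeline(
        "example-channel-id",
        detect_niche=lambda channel: "home cooking",
        find_trending=lambda niche: [f"Top {niche} video"],
        write_script=lambda niche, trending: f"Script about {niche}",
        make_thumbnails=lambda script: ["<base64 image>"],
    )
    print(package)
```

Passing the stages in as callables keeps each AI service behind a narrow interface, so one provider can be swapped or stubbed without touching the others.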

Challenges we ran into

  1. Integrating Multiple AI Models – Combining Agno for niche and keyword extraction, Gemini for script generation and thumbnail prompts, and Together AI for thumbnail image generation required careful API coordination. Keeping data flowing smoothly between these models was a challenge.
  2. Rendering Markdown in React – Displaying AI-generated scripts properly required using marked.js to convert the markdown to HTML and injecting the result with dangerouslySetInnerHTML, which meant keeping security in mind when rendering model output.
  3. Tailwind CSS Issues – Initially, Tailwind wouldn't install and configure properly due to dependency conflicts, leading to the use of custom CSS for styling.
  4. Base64 Image Handling – The AI-generated thumbnails were returned as base64 strings, which required proper decoding and rendering in React while maintaining performance.
  5. Frontend & Backend Communication – Making sure FastAPI correctly processed requests from the React frontend, especially handling JSON payloads properly and ensuring CORS settings were configured correctly.
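For the base64 thumbnail handling in item 4, one option is to validate the string on the backend and return it as a data URI that an `<img>` element can use directly as its `src`. The sketch below assumes PNG output and uses a hypothetical helper name, `to_data_uri`; the writeup does not record the image format or how our own code did this.

```python
import base64
import binascii


def to_data_uri(b64_image: str, mime_type: str = "image/png") -> str:
    """Validate a base64 image string and wrap it as a data URI.

    Hypothetical helper; the default MIME type is an assumption.
    """
    # Strip whitespace and line breaks that some APIs insert into long strings.
    cleaned = "".join(b64_image.split())
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(cleaned, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("thumbnail is not valid base64") from exc
    return f"data:{mime_type};base64,{cleaned}"
```

Rejecting malformed strings before they reach the frontend avoids broken images in the UI and keeps the React component limited to rendering.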

Accomplishments that we're proud of

✅ Seamless AI Integration – Successfully integrated multiple AI services (Agno, Gemini, Together AI) to generate a full video content package in one click.
✅ Automated Content Creation – Users can generate a script and thumbnail in seconds using only a YouTube channel ID.
✅ Minimalist & Intuitive UI – The frontend was designed to be clean, user-friendly, and visually appealing, ensuring a smooth user experience.
✅ FastAPI Backend – Built a high-performance backend that efficiently processes YouTube data and AI-generated content.
✅ Scalable Architecture – Structured the project in a way that allows for future expansion, including additional AI-driven features.

What we learned

  1. Multi-Model AI Integration – Working with different AI services and ensuring they work together to create a seamless experience.
  2. Handling API Responses in React – Learned how to efficiently process and display JSON data from FastAPI.
  3. Styling Without Tailwind – Despite Tailwind installation issues, we adapted by writing custom CSS for a modern design.
  4. Optimizing Image Rendering – Learned how to efficiently process and display base64-encoded images in React without performance issues.
  5. Debugging Cross-Origin Requests – Gained experience troubleshooting CORS issues between the FastAPI backend and React frontend.

What's next for MediaMind

🚀 Expand Platform Support – Extend support beyond YouTube to analyze content from Instagram, TikTok, and Twitter.
🎨 More Thumbnail Customization – Allow users to tweak AI-generated thumbnails with color schemes, fonts, and overlays.
📝 Enhanced Script Writing – Implement multi-tone and voice adaptation, letting users customize the AI’s writing style (formal, humorous, engaging, etc.).
📊 Trend Analysis – Add insights on trending video topics and hashtags to help creators plan better content.
📡 Real-Time AI Coaching – Integrate AI-based feedback on content strategy, suggesting video topics based on user engagement trends.
💡 Deployment – Set up a production-ready version with cloud hosting for both the backend and frontend.
