Youtube Trendy Analyst V1

The Inspiration

As video content production continues to scale, creators face a critical bottleneck: the traditional tools used to optimize for the YouTube algorithm rely solely on shallow text tags, flat titles, and human guesswork. They cannot actually see or hear what is happening inside the footage. This disconnect leads to unexpected viewer drop-offs, poor Click-Through Rates (CTR), and wasted production hours.

We built Youtube_Trendy_Analyst_V1 to solve this problem by leveraging native multimodal AI. Our tool scans the raw visual and audio tracks of a video file to automatically map out performance data, score platform virality, and pinpoint exact timestamps where viewers are at risk of dropping off due to cognitive overload or pacing issues.

How We Built It

The platform is built around a lightweight, local-first architecture designed to process heavy video payloads smoothly without performance lag:

The Backend: A robust, asynchronous FastAPI server that handles multi-gigabit file uploads using optimized temporary filesystem buffers.
The Intelligence Core: Powered by the official google-genai SDK, we pass the raw video file directly into the gemini-2.5-flash model. By using strict Pydantic object schemas and keeping model configuration temperatures low ($temperature = 0.1$), we enforce deterministic, structured JSON delivery.
Trend Synthesis: The backend concurrent workers tap into the Serper.dev API to pull real-time organic search trends, giving the AI a live platform benchmark to evaluate the video against.
The Frontend: A responsive, clean HTML5/CSS3 dashboard that coordinates the media file data streaming via native JavaScript Fetch and FormData layers.

Challenges Faced

Our biggest hurdle was preventing data execution race conditions and eliminating model hallucinations. Initially, the model would guess the video topic incorrectly or hit a timeout block if the prompt executed before the multi-gigabit video packets had stabilized on the server.

We solved this by re-engineering the pipeline with an asynchronous check-state loop (while processing) using 5-second interval back-offs. Additionally, we implemented rigid token isolation boundaries ([START OF USER CONTENT CONTEXT]) inside our prompt structure to strictly shield the multimodal core from prompt injection attempts and conflicting search trend data noise.

What We Learned

This project deepened our understanding of building non-blocking asynchronous file pipelines and managing structural telemetry outputs using multimodal LLMs. We learned that isolating user input text blocks from core model instructions is vital for maintaining data schema integrity, ensuring a smooth, crash-free interface deployment for production environments.

Built With

Updates

Treatable MAK started this project — May 24, 2026 12:25 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.