Inspiration

The inspiration for CineScan came from a personal pain point: the endless scroll of Instagram Reels and YouTube Shorts. I frequently found myself watching high-quality movie recommendations but losing them because the process of manually switching to IMDb or Letterboxd to add a title was too high-friction. I wanted to create a bridge between the moment of discovery and the act of organizing my cinematic "to-watch" list, using AI to "watch" the content with me.

What it does

CineScan is a Chrome extension that acts as a multimodal companion for social media video players. When a user clicks the "Scan" button on a YouTube Short or Instagram Reel, the extension captures the video context and sends it to Gemini 3 Flash. The AI analyzes the video frames, audio, and text overlays to identify movie or TV show titles. These titles are then presented in a clean, interactive checklist where users can instantly confirm them for their watchlist.

How I built it

The project followed a "vibe coding" workflow, using Antigravity as the primary development environment to scaffold the Manifest V3 architecture.

Frontend: Built with modular JavaScript and CSS, injecting a custom UI overlay onto the existing DOM structures of the social media pages.

Backend Logic: We integrated the Gemini 3 Flash API, specifically leveraging its multimodal capabilities to process video data.

Prompt Engineering: We used Google AI Studio to refine a system prompt that enforces a structured JSON response, so that the extension could parse the output reliably.

Mathematically, we viewed the confidence of an extraction as a probability $P(T \mid V, A)$, where $T$ is the title, $V$ represents the visual features (posters/text), and $A$ represents the audio features (narrator mentions). As a mental model, we think of Gemini 3 Flash as approximating this inference, which Bayes' rule expands as:

$$P(T \mid V, A) = \frac{P(V, A \mid T)\,P(T)}{P(V, A)}$$
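To illustrate the structured JSON step, here is a minimal sketch of how a response could be validated before it reaches the checklist UI. The response shape (`titles`, `title`, `type`, `confidence`), the function name `parseTitles`, and the 0.5 threshold are assumptions made for this example, not the exact schema used in CineScan.

```javascript
// Hypothetical response shape (an assumption for this sketch):
// {"titles":[{"title":"Interstellar","type":"movie","confidence":0.92}]}
function parseTitles(rawText, minConfidence = 0.5) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch {
    // Malformed output: return an empty list instead of crashing the UI.
    return [];
  }
  if (!data || !Array.isArray(data.titles)) return [];

  return data.titles
    // Keep only entries with a non-empty string title.
    .filter((t) => t && typeof t.title === "string" && t.title.trim() !== "")
    // Drop entries without a numeric confidence or below the threshold.
    .filter((t) => typeof t.confidence === "number" && t.confidence >= minConfidence)
    // Normalize each entry; anything that is not "tv" is treated as a movie.
    .map((t) => ({
      title: t.title.trim(),
      type: t.type === "tv" ? "tv" : "movie",
      confidence: t.confidence,
    }));
}
```

Returning an empty list on any parse failure keeps the checklist code simple: it only ever receives a well-formed array.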

Challenges I ran into

The journey wasn't without its "boss fights."

Billing Hurdles: As a student in India, navigating the Google Cloud billing setup (specifically the RBI-compliant mandates and the GSTIN requirements) was a significant administrative challenge.

API Rate Limits: During heavy testing, we frequently hit 429 Resource Exhausted errors. We had to implement a retry-with-backoff mechanism in our background script to handle these gracefully.

Video Constraints: Fitting our entire technical story and a live demo into a strict 2-minute video limit forced us to be ruthless with our editing and narrative pacing.
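The retry-with-backoff idea can be sketched as follows. This is an illustrative version, not our exact background-script code: the names `backoffDelayMs` and `withRetry`, the 500 ms base, the 8 s cap, and the retry count are all assumptions chosen for the example.

```javascript
// Delay before the next attempt: exponential growth, capped, with full jitter.
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(attempt, baseMs = 500, capMs = 8000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `request` is any function returning a Promise of a fetch-style response
// (an object with a numeric `status`). Only HTTP 429 triggers a retry.
async function withRetry(request, { maxRetries = 4, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await request();
    // Return on success, on any non-429 status, or once retries run out.
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await wait(backoffDelayMs(attempt));
  }
}
```

A call might look like `withRetry(() => fetch(url, options))`. The jitter spreads retries out so that several scans started at once do not all hit the API again at the same moment.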

Accomplishments that I'm proud of

We are incredibly proud of achieving an end-to-end multimodal loop that feels "magical". Seeing the extension accurately identify Interstellar from a 3-second aesthetic clip, where the title was never mentioned in the audio or caption, proved the power of Gemini 3's reasoning. We are also proud of the professional-grade UI we managed to build within the hackathon's tight timeframe.

What I learned

This project was a masterclass in the practicalities of multimodal AI.

We learned that Gemini 3 Flash is exceptionally efficient for high-throughput video tasks compared to larger models.

We gained hands-on experience with Manifest V3 security best practices, such as delegating sensitive API calls to background service workers.

We also learned the importance of "fail-soft" UI design: how to keep the user engaged even when the AI is taking a few seconds to process a complex video.
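As a rough sketch of the delegation pattern mentioned above, the content script only sends a message, and the background service worker (where the API key would live) does the actual call. The message type `SCAN_VIDEO`, the `videoUrl` field, and the `scanVideo` stub are hypothetical names for this example; only `chrome.runtime.onMessage` and `chrome.runtime.sendMessage` are real extension APIs.

```javascript
// background.js (service worker)

// Hypothetical stand-in for the real Gemini request; it returns a placeholder here.
async function scanVideo(videoUrl) {
  return [];
}

// Handles messages from content scripts. `scan` is injectable for testing.
function handleMessage(message, sender, sendResponse, scan = scanVideo) {
  if (!message || message.type !== "SCAN_VIDEO") return false;
  scan(message.videoUrl)
    .then((titles) => sendResponse({ ok: true, titles }))
    .catch((err) => sendResponse({ ok: false, error: String((err && err.message) || err) }));
  return true; // keep the message channel open for the asynchronous reply
}

// Register the listener only when running inside an extension.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(handleMessage);
}

// Content script side (shown as a comment, since it runs in a different context):
// const reply = await chrome.runtime.sendMessage({ type: "SCAN_VIDEO", videoUrl: location.href });
```

Because the reply always carries an `ok` flag, the overlay can show a friendly fallback message instead of failing silently, which fits the "fail-soft" approach.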

What's next for CineScan

The prototype is just the beginning.

Platform Expansion: We plan to bring CineScan to TikTok and streaming platforms like Netflix to provide a unified discovery layer.

Deep Integration: Our next major milestone is implementing full OAuth2 integration with Letterboxd and IMDb to allow for actual, one-click database entries.

Social Discovery: We envision a "Scan History" feature where users can see what their friends have been scanning and discovering through the extension.
