Inspiration

We’ve all been there: scrolling through social media, seeing a clip, or hearing a quote that sparks a vague memory, but we just can't place the movie. Comment sections are often unhelpful or full of false leads. Inspired by how Shazam revolutionized music discovery, we set out to solve the "What movie is that?" problem using advanced multimodal AI. We wanted to build a tool that didn't just recognize titles but understood context: one that could identify a film from a blurry screenshot, a description of a specific scene, or even a fragment of dialogue.

What it does

Cazem is an intelligent movie identification platform that acts as your cinematic detective.

Visual Recognition: Upload a screenshot or use your device's camera to snap a photo of a movie scene. Cazem analyzes the visual elements (actors, setting, costumes, and cinematography) to pinpoint the exact film.

Dialogue Matching: Remember a quote but not the movie? Type it in, and Cazem identifies the source, even if the quote is slightly misremembered.

Rich Metadata: Once a film is identified, Cazem instantly pulls up high-quality posters, backdrops, release years, cast lists, and director info.

Smart Recommendations: Cazem doesn't stop at identification; it understands the vibe of the movie you found and suggests similar films you might enjoy.

How we built it

We built Cazem as a modern, high-performance web application:

Frontend: Developed with React 19 and Vite for a fast, responsive user experience.

Styling: Tailwind CSS v4 gives Cazem a premium, "cinematic" dark-mode aesthetic with glassmorphism effects and smooth micro-interactions.

The Brain (Gemini 3.0 Integration): We chose Google's Gemini 3.0 models for their strong multimodal capabilities.

Vision Analysis: We send raw image data (captured with react-webcam or uploaded) directly to Gemini 3.0's vision capabilities. The model doesn't just "match pixels"; it reasons about scene composition, lighting, and actors to deduce the film with high accuracy, even from non-standard frames.

Contextual Reasoning: For dialogue and text queries, Gemini 3.0's large context window lets it disambiguate common phrases and pinpoint the specific movie script a line comes from.
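Below is a minimal sketch of how a captured frame could be packaged for the vision request. It rests on two assumptions:

- The frame arrives as a base64 data URL, which is what react-webcam's `getScreenshot()` returns.
- The request uses the Gemini API's inline-data part format.

The helper names `dataUrlToInlinePart` and `buildIdentifyParts` and the prompt wording are hypothetical. They are not Cazem's actual code.

```typescript
// Hypothetical helpers; the part shapes are assumed to follow the Gemini API's
// inline-data format ({ inlineData: { mimeType, data } } and { text }).
interface InlineDataPart {
  inlineData: { mimeType: string; data: string };
}

interface TextPart {
  text: string;
}

type Part = InlineDataPart | TextPart;

// Split "data:<mime>;base64,<payload>" into its MIME type and base64 payload.
function dataUrlToInlinePart(dataUrl: string): InlineDataPart {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64 data URL");
  }
  return { inlineData: { mimeType: match[1], data: match[2] } };
}

// Image first, then the instruction, then the user's optional scene description.
function buildIdentifyParts(dataUrl: string, userHint?: string): Part[] {
  const parts: Part[] = [
    dataUrlToInlinePart(dataUrl),
    { text: "Identify the movie this frame is from and explain which details you used." },
  ];
  const hint = userHint?.trim();
  if (hint) {
    parts.push({ text: `User's description of the scene: ${hint}` });
  }
  return parts;
}
```

With Google's `@google/genai` SDK, an array like this can be passed as `contents` to `ai.models.generateContent({ model, contents })`; the model ID is deliberately left out here. Keeping the optional text description in the same request as the image is what allows a photo and a scene description to be combined in one query.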

Challenges we ran into

API Rate Limiting: One of our biggest headaches was scene upload. At first it didn't work at all, because every upload immediately hit the API rate limit. We had to add more robust error handling and optimize our API usage so the feature could work within the quotas.
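The write-up doesn't say exactly which fixes were applied. The sketch below shows only one common pattern for staying within a quota: retry requests that fail with HTTP 429 (Too Many Requests), using exponential backoff with jitter. It assumes the API client's error object exposes the HTTP status as a `status` field. The helpers `isRateLimited`, `backoffDelayMs` and `withBackoff`, and their default delays, are hypothetical.

```typescript
// Assumption: rate-limit errors carry an HTTP `status` field equal to 429.
function isRateLimited(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 429
  );
}

// Delay before retry `attempt` (0-based): baseMs * 2^attempt, capped at maxMs,
// then scaled by a random factor in [0.5, 1) so clients don't retry in lockstep.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const capped = Math.min(maxMs, baseMs * 2 ** attempt);
  return capped * (0.5 + random() / 2);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `call` on HTTP 429 up to maxRetries times; any other error is rethrown at once.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      await wait(backoffDelayMs(attempt));
      attempt++;
    }
  }
}
```

Backoff spreads retries out, but it does not reduce how many requests a single upload sends. If each upload fires several calls, cutting those down matters just as much.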

Hallucinations vs. Reality: Early on, the AI would sometimes confidently guess the wrong movie for generic scenes (e.g., "two people in a coffee shop"). We refined our system prompts to push the model to analyze specific details (background props, lighting style) and cross-reference them with known film data.
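One way to encode "analyze specific details" is to ask for structured output and reject any answer that doesn't cite evidence. The prompt text, the JSON shape and the 0.6 threshold below are illustrative assumptions, not Cazem's real prompts. The model's self-reported confidence is a rough filter, not a calibrated probability.

```typescript
// Illustrative system prompt and response shape (assumptions, not Cazem's real prompt).
const SYSTEM_PROMPT = `You identify movies from stills and quotes.
Base your answer on specific details such as props, set design, lighting and actors.
Reply only with JSON: {"title": string | null, "year": number | null,
"evidence": string[], "confidence": number from 0 to 1}.
If the scene is too generic to identify, set "title" to null.`;

interface Identification {
  title: string;
  year: number | null;
  evidence: string[];
  confidence: number;
}

type Verdict =
  | { kind: "match"; result: Identification }
  | { kind: "unsure"; reason: string };

// Accept a guess only if it names a title, cites evidence, and clears the threshold.
function judgeIdentification(raw: string, minConfidence = 0.6): Verdict {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { kind: "unsure", reason: "Response was not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { kind: "unsure", reason: "Response was not a JSON object" };
  }
  const r = data as { title?: unknown; year?: unknown; evidence?: unknown; confidence?: unknown };
  if (typeof r.title !== "string" || r.title.trim() === "") {
    return { kind: "unsure", reason: "Scene too generic to identify" };
  }
  if (
    !Array.isArray(r.evidence) ||
    r.evidence.length === 0 ||
    !r.evidence.every((e) => typeof e === "string")
  ) {
    return { kind: "unsure", reason: "No supporting evidence given" };
  }
  if (typeof r.confidence !== "number" || r.confidence < minConfidence) {
    return { kind: "unsure", reason: "Confidence below threshold" };
  }
  return {
    kind: "match",
    result: {
      title: r.title,
      year: typeof r.year === "number" ? r.year : null,
      evidence: r.evidence as string[],
      confidence: r.confidence,
    },
  };
}
```

Returning "unsure" lets the UI ask the user for more detail instead of showing a confident wrong title. The `evidence` list can also double as the user-facing explanation of why a film was chosen.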

Real-time Camera Integration: Handling video streams and capturing high-quality stills in the browser across different devices required careful optimization of react-webcam and canvas rendering.
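Part of capturing stills that are sharp but cheap to upload is choosing a sensible capture size. This sketch caps the longer edge while preserving the aspect ratio. The `fitWithin` helper and its 1024 px default are arbitrary examples, not values taken from Cazem.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale `source` down so its longer edge is at most `maxEdge`, keeping the aspect ratio.
function fitWithin(source: Size, maxEdge = 1024): Size {
  // A <video> element reports 0x0 until its metadata has loaded.
  if (source.width <= 0 || source.height <= 0) {
    throw new Error("Frame not ready: zero dimensions");
  }
  const longest = Math.max(source.width, source.height);
  if (longest <= maxEdge) {
    return { width: source.width, height: source.height }; // never upscale
  }
  const scale = maxEdge / longest;
  return {
    width: Math.round(source.width * scale),
    height: Math.round(source.height * scale),
  };
}
```

In the browser, the source size would come from the video element's `videoWidth` and `videoHeight`. The result could then be used in either of two ways:

- Pass it to react-webcam's `getScreenshot({ width, height })`. Recent versions accept optional dimensions.
- Use it to size a canvas before calling `drawImage` and then `toDataURL("image/jpeg", quality)`.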

Accomplishments that we're proud of

Seamless "Snap-to-Result" Flow: The application feels instant; going from taking a photo to seeing the movie poster takes just seconds.

High Accuracy with Gemini 3.0: The upgrade to the latest Gemini models significantly improved our hit rate on obscure and older films compared with our earlier tests.

Premium UI/UX: We avoided the "bootstrapped" look. Cazem looks and feels like a production-ready app, with smooth transitions, loading states, and a cohesive design system.

What we learned

Multimodal is Powerful: Treating images and text as a single, fluid input stream for the AI opened up possibilities we didn't expect; for example, describing a scene and providing a photo works better than either alone.

User Trust is Key: Showing why a movie was identified (e.g., "Recognized the specific cafe scene from Inception") builds much more trust than showing a title alone.

Frontend Performance Matters: AI apps often feel slow because of model processing time. Masking that latency with engaging UI animations keeps users happy.

What's next for Cazem

"Scene-Specific" Details: Using Gemini's long context window to identify not just the movie, but the exact timestamp or chapter of the scene.

Watchlist Integration: A "Save for Later" feature to build your watchlist directly from your discoveries.

Social Challenges: Daily "Guess the Movie" challenges generated by the AI from its vast knowledge base.

Optimizing the Scene Upload Pipeline: Our top priority is a complete refactor of the scene upload feature so that it works reliably for every user. In particular, it needs to handle the API limit edge cases so that image analysis is seamless.
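A "single-flight" guard is one possible piece of such a refactor; the team does not state this as its plan. It keeps repeated triggers, such as double taps or a React effect that re-runs, from firing duplicate analysis requests. While one call is pending, later calls share its promise. `singleFlight` is a hypothetical helper name.

```typescript
// Wrap an async task so concurrent calls share one pending request.
function singleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight) return inFlight; // a request is already pending: reuse it
    const pending = task().then(
      (value) => {
        inFlight = null; // settled: the next call may start a new request
        return value;
      },
      (err) => {
        inFlight = null; // clear on failure too, so the user can retry
        throw err;
      },
    );
    inFlight = pending;
    return pending;
  };
}
```

Because the guard is cleared on both success and failure, a failed upload never blocks the next attempt. It only suppresses duplicates while a request is actually in flight.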
