Inspiration
Content creation is an asymmetric war. Top creators like MrBeast spend upwards of $10,000 on a single thumbnail because they understand one truth: CTR (Click-Through Rate) is everything. If nobody clicks, the video dies. We wanted to democratize this power. We asked ourselves: Can we condense a team of expert photographers, psychologists, and Photoshop artists into a single AI agent? The goal was to give every creator access to a "viral engine" that doesn't just generate images, but engineers attention.
What it does
Beastify.ai is a comprehensive viral engineering suite powered by Google Gemini. It doesn't just make pictures; it understands YouTube psychology.
- Viral Generation: Creates high-contrast, high-emotion thumbnails using the "MrBeast Formula" (Rule of Thirds, extreme expressions, vivid lighting).
- Surgical Editing: Users can swap faces with famous personas or their own, and use in-painting to add objects while maintaining lighting consistency.
- The War Room (Analysis): Uses Gemini 3 Pro as "The General" to audit thumbnails. It provides a ruthless, data-driven critique, scoring images on eye contact, emotion, and storytelling.
- Strategic Audio Briefings: Converts complex analysis into a short, authoritative audio briefing using Gemini 2.5 Flash TTS, acting as a personal strategist.
- Viral Titles: Generates title variations optimized for shock, curiosity, and greed.
How we built it
We built a modern, responsive web application using React and Tailwind CSS, deeply integrated with the Google GenAI SDK.
- Visuals: We used gemini-2.5-flash-image for fast text-to-image generation and complex image-to-image transformations (face swaps).
- Intelligence: We deployed gemini-3-pro-preview for the "War Room" logic. We used detailed system instructions to make the model act as a "Ruthless General," and JSON schemas to make it return structured scores and tactical advice.
- Audio: We used gemini-2.5-flash-preview-tts to generate raw PCM audio data, which we decode directly in the browser using the Web Audio API for low-latency playback.
- State Management: We built a custom history and comparison slider to let users A/B test their creations against original inputs.
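To illustrate the structured "War Room" output, here is a minimal sketch of how a JSON critique could be described and validated on the client. The field names (`eyeContact`, `emotion`, `storytelling`, `advice`), the 0 to 10 scale, and the averaging in `overallScore` are assumptions made for this example; the three scored dimensions come from the feature description, but the actual schema is not shown in this write-up.

```typescript
// Hypothetical shape of one critique; names and the 0-10 scale are assumptions.
interface Critique {
  eyeContact: number;
  emotion: number;
  storytelling: number;
  advice: string[];
}

// JSON-Schema-style description of that shape, of the kind that could be
// supplied to the model as a response schema (assumed, not the real one).
const critiqueSchema = {
  type: "object",
  properties: {
    eyeContact: { type: "number" },
    emotion: { type: "number" },
    storytelling: { type: "number" },
    advice: { type: "array", items: { type: "string" } },
  },
  required: ["eyeContact", "emotion", "storytelling", "advice"],
} as const;

// Parse the model's JSON text and reject anything that does not match.
function parseCritique(raw: string): Critique {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Critique must be a JSON object");
  }
  const record = data as Record<string, unknown>;

  const score = (key: string): number => {
    const value = record[key];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0 || value > 10) {
      throw new Error(`Invalid score for "${key}"`);
    }
    return value;
  };

  const advice = record["advice"];
  if (!Array.isArray(advice) || !advice.every((item) => typeof item === "string")) {
    throw new Error("advice must be an array of strings");
  }

  return {
    eyeContact: score("eyeContact"),
    emotion: score("emotion"),
    storytelling: score("storytelling"),
    advice: advice as string[],
  };
}

// Simple unweighted mean of the three scores (an illustrative choice).
function overallScore(critique: Critique): number {
  return (critique.eyeContact + critique.emotion + critique.storytelling) / 3;
}
```

Validating on the client, even when the model is given a schema, means a malformed response surfaces as a clear error instead of a broken score card in the UI.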
Challenges we ran into
- Prompt Engineering for Consistency: Getting the AI to consistently follow the "Rule of Thirds" and maintain specific facial expressions (like the "MrBeast Scream") required iterative refinement of our system prompts and "Agent Workflow" logic.
- Raw Audio Decoding: The Gemini TTS API returns raw PCM data, not a standard MP3 file. We had to write a custom audio buffer decoder to play the voice of "The General" seamlessly in the browser.
- Context Retention: Ensuring the AI remembered the "Persona" across different edits and generations was tricky, requiring us to manage the context window effectively.
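The raw audio decoding step can be sketched roughly as follows. This is not our exact decoder: it assumes the audio arrives as a base64 string of 16-bit signed little-endian mono PCM, and the helper names `base64ToBytes` and `pcm16ToFloat32` are made up for this example. The base64 decoder is written by hand only to keep the sketch self-contained; in a browser, `atob` does the same job.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode a standard base64 string into raw bytes.
function base64ToBytes(b64: string): Uint8Array {
  const clean = b64.replace(/=+$/, ""); // strip trailing padding
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let acc = 0;  // bit accumulator
  let bits = 0; // number of unread bits currently in acc
  let o = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value < 0) throw new Error("Invalid base64 character");
    acc = ((acc << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[o++] = (acc >> bits) & 0xff;
    }
  }
  return out;
}

// Convert 16-bit signed little-endian PCM bytes into floats in [-1, 1),
// the sample format the Web Audio API works with.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return samples;
}
```

In the browser, the resulting samples can be copied into an `AudioBuffer` created with `AudioContext.createBuffer(1, samples.length, sampleRate)` and played through an `AudioBufferSourceNode`. The sample rate has to match what the TTS model produced (we believe 24000 Hz, but treat that value as an assumption and check the API response).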
Accomplishments that we're proud of
- The "War Room" Logic: We are incredibly proud of the analysis engine. It doesn't just describe the image; it gives strategic advice (e.g., "Fix the lighting on the left," "Force eye contact").
- Seamless Face Swapping: We achieved a commercial-grade face swap that matches skin tone and lighting conditions within seconds.
- The UI/UX: We created a "Dark Mode" interface that feels like professional software (Sony A7S III aesthetic) rather than a simple form.
What we learned
We learned that multimodal AI is the future of creative tools. It's not enough to just generate text or images separately. By combining Vision (analyzing the user's upload), Reasoning (critiquing it), and Audio (speaking the advice), we created an experience that feels like working with a real human expert. We also learned how much the large context windows of the Gemini Pro and Flash models help with complex creative tasks.
What's next for Beastify
Beastify is just the beginning of the "Viral Engineering" era. Our roadmap for 2026 is focused on closing the loop between creation and distribution:
- The Retention Engine (Video): We plan to integrate Google Veo to generate high-retention video intros (the first 5 seconds) that match the promise of the generated thumbnail.
- Real-Time A/B Simulation: Instead of relying on LLM reasoning alone, we aim to train a custom vision model on 10 million high-performing YouTube thumbnails to estimate the probable CTR before you upload.
- Direct YouTube Integration: A "One-Click Deploy" feature that uploads the thumbnail and runs live A/B testing on the user's channel via the YouTube API.
- Beastify Mobile: Putting "The General" in every creator's pocket, allowing real-time thumbnail audits by simply pointing the phone camera at a screen.
Built With
- gemini-2.5-flash
- gemini-2.5-flash-tts
- gemini-3-pro
- google-gemini-api
- google-genai-sdk
- react
- tailwind-css
- typescript
- youtube-data-api