The Mitate Story: Making Research Accessible Through AI-Generated Visuals

Inspiration

The academic research world has a paradox: groundbreaking discoveries are published every day on platforms like arXiv, yet most of these papers remain inaccessible to anyone outside a narrow circle of domain experts. Dense mathematical notation, specialized jargon, and complex concepts create barriers that prevent students, professionals in adjacent fields, and curious learners from engaging with cutting-edge research.

We've all experienced that moment of excitement when stumbling upon an interesting paper title, only to be met with impenetrable walls of technical language. What if AI could bridge this gap? What if we could automatically transform these complex papers into beautiful, knowledge-level-appropriate visual explainers that make research accessible to everyone?

That's where Mitate was born—from the vision of democratizing research knowledge through AI-powered visual storytelling.

Our commitment to open source: From the start, we wanted Mitate to be built on open source models. We believe in democratizing access to AI—not just the end applications, but the underlying technology itself. By leveraging open source LLMs and image generation models like FIBO, we're making cutting-edge research accessible using tools that are themselves accessible. This alignment of values—open research explained through open AI—felt essential to our mission.

What It Does

Mitate is an AI-powered research paper visual explainer that transforms complex arXiv papers into beautiful, educational visualizations tailored to your knowledge level.

Here's how it works:

Paper Discovery: Users search by topic or paste an arXiv URL
AI Summarization: DigitalOcean's Gradient AI (llama3.3-70b-instruct) analyzes the paper and creates knowledge-level-appropriate summaries (beginner, intermediate, or advanced)
Visual Generation: FIBO/Bria AI generates professional illustrations based on the structured summary
Adaptive Content: Both the text explanations and visual metaphors adapt to the user's expertise level
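For the Paper Discovery step, a pasted link first has to be reduced to an arXiv identifier. The TypeScript below is a minimal sketch of that normalization, not Mitate's actual code. The function name parseArxivInput is hypothetical, and the sketch assumes new-style identifiers (four digits, a dot, four or five digits, optionally followed by a version suffix such as v2). Old-style identifiers like hep-th/9901001 are deliberately rejected.

```typescript
// Hypothetical helper (not Mitate's actual code): turn a pasted arXiv URL or bare ID
// into a normalized identifier. Only new-style IDs (e.g. 1706.03762, 2401.12345v2)
// are recognized; old-style IDs such as hep-th/9901001 return null.
export interface ArxivRef {
  id: string;             // e.g. "1706.03762"
  version: number | null; // e.g. 7 for "v7", null when no version is given
}

// Matches https://arxiv.org/abs/<id> and https://arxiv.org/pdf/<id>[.pdf]
const ARXIV_URL =
  /^https?:\/\/(?:www\.)?arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5})(v\d+)?(?:\.pdf)?\/?$/i;
// Matches a bare identifier such as "2401.12345v2"
const BARE_ID = /^(\d{4}\.\d{4,5})(v\d+)?$/;

export function parseArxivInput(input: string): ArxivRef | null {
  const text = input.trim();
  const match = ARXIV_URL.exec(text) ?? BARE_ID.exec(text);
  if (match === null) return null;
  return {
    id: match[1],
    // Strip the leading "v" and parse the version number, if one was captured.
    version: match[2] !== undefined ? Number(match[2].slice(1)) : null,
  };
}
```

The returned id could then be passed to the public arXiv API, whose query endpoint accepts an id_list parameter, to fetch the title and abstract before summarization.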

Mitate supports two visualization modes:

Infographic Mode: A single comprehensive image with text overlays explaining all key concepts
Simple Visuals Mode: 3-7 carousel images with pure visual metaphors, where text appears in the UI below each image

The result? Research papers that used to take hours to decipher can now be understood in minutes through engaging visual stories.

How We Built It

The Journey from Idea to Production

Our journey began in mid-December 2025 with a simple goal: make research papers accessible. Here's how we brought Mitate to life over an intense few days of development.

Phase 1: Foundation and Architecture

We started with the fundamentals, building a solid foundation:

Frontend Setup: Built a React SPA using TanStack Router for modern, type-safe routing
Backend Infrastructure: Chose Appwrite Functions for a serverless architecture, with no servers to manage and automatic scaling
Database Design: Created a clean schema with requests and results collections to track generation workflows
Initial Testing: Developed offline test scripts to validate our AI integration approach before deploying

Phase 2: AI Integration and Open Source Models

The real challenge began with AI service integration. We needed two distinct capabilities:

Text Summarization: Converting dense academic papers into digestible summaries
Image Generation: Creating professional visual representations

The Open Source LLM Journey: We committed early to using open source models. This led us to DigitalOcean's Gradient AI Platform, which hosts powerful open source models like llama3.3-70b-instruct. Along the way, we also learned an important lesson about collaborating with AI coding assistants: be specific in your requirements.

When we initially outlined our vision to our AI coding assistants, we weren't explicit enough about which AI providers to use. The assistants made assumptions and started implementing integrations with different vendors than we intended. This taught us that when working with AI coding tools, clarity in specifications is crucial—vague requirements lead to misaligned implementations. Once we clarified our commitment to open source models hosted on DigitalOcean's platform, the implementation aligned perfectly with our vision.

FIBO Integration - A Game Changer: For image generation, we chose FIBO/Bria AI, and it proved to be the perfect foundation for Mitate. What sets FIBO apart is its structured prompt system—instead of relying on free-form text descriptions, FIBO accepts detailed JSON specifications that define layouts, objects, text layers, color schemes, and aesthetic parameters.

This structured approach was transformative for our use case. We could programmatically encode:

Knowledge-level specific visual styles: Different color palettes and complexity levels for beginner vs. advanced users
Extracted concepts from papers: Map each key concept to specific visual objects and metaphors
Precise layout control: Position elements consistently across generations
Aesthetic scoring: Fine-tune the professional quality of outputs

FIBO's flexibility allowed us to turn academic concepts into visual language algorithmically. Rather than hoping a text prompt would work, we could engineer exact specifications that reliably produced high-quality educational visuals. This level of control and predictability is exactly what production applications need.
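To illustrate what engineering an exact specification means in code, here is a hedged TypeScript sketch that turns extracted concepts and a knowledge level into a deterministic JSON specification. The keys (style, objects, slot, background), the hex palettes, and the three-slot layout are placeholders invented for this example. They are not FIBO's published structured-prompt schema, which a real integration would have to follow.

```typescript
// Illustrative sketch only: the field names below are placeholders chosen for this
// example and are NOT FIBO's documented structured-prompt schema.
export type KnowledgeLevel = "beginner" | "intermediate" | "advanced";

export interface Concept {
  title: string;    // e.g. "Self-attention"
  metaphor: string; // e.g. "a roundtable where every speaker listens to all others"
}

type Slot = "left" | "center" | "right";

interface VisualStyle {
  palette: string[];                     // hex colors
  complexity: "low" | "medium" | "high"; // how much visual detail to request
}

export interface StructuredPrompt {
  style: VisualStyle;
  objects: { description: string; slot: Slot }[];
  background: string;
}

// One fixed style per knowledge level, so a given level always gets the same look.
const LEVEL_STYLES: Record<KnowledgeLevel, VisualStyle> = {
  beginner: { palette: ["#FFB703", "#219EBC", "#8ECAE6"], complexity: "low" },
  intermediate: { palette: ["#264653", "#2A9D8F", "#E9C46A"], complexity: "medium" },
  advanced: { palette: ["#1D3557", "#457B9D", "#A8DADC"], complexity: "high" },
};

// A fixed slot order gives a consistent layout across generations.
const SLOTS: readonly Slot[] = ["left", "center", "right"];

export function buildStructuredPrompt(concepts: Concept[], level: KnowledgeLevel): StructuredPrompt {
  return {
    style: LEVEL_STYLES[level],
    // At most one object per slot; extra concepts are dropped in this sketch.
    objects: concepts.slice(0, SLOTS.length).map((c, i) => ({
      description: `${c.metaphor}, representing ${c.title}`,
      slot: SLOTS[i],
    })),
    background: "clean, uncluttered, soft gradient",
  };
}
```

Because the mapping is a pure function, the same concepts and level always serialize (via JSON.stringify) to the same specification, which is the property that makes layouts repeatable across generations.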

Why DigitalOcean's Platform? Beyond hosting open source models, DigitalOcean democratizes access to cutting-edge AI. Their platform makes powerful models like Llama 3.3 available to developers worldwide—not just those with massive infrastructure budgets. This resonated deeply with our mission: using democratized AI to democratize research knowledge.

Phase 3: Embracing FIBO's Strengths

As we tested image generation, we discovered an important insight: text rendering in diffusion models is challenging. While FIBO's structured prompts gave us unprecedented control over visual elements, we realized we could achieve even better results by leaning into FIBO's core strength—generating stunning visual imagery—rather than asking it to also handle text rendering.

This insight led to a critical architectural decision. We developed a dual-mode approach:

Infographic Mode: Comprehensive single-image visualizations with text overlays
Simple Visuals Mode: Pure visual metaphors where FIBO shines brightest—text-free icon-style images with explanations rendered in the UI

The Simple Visuals mode became our flagship feature. By generating 3-7 separate concept images, each focusing purely on visual storytelling, we unleashed FIBO's true potential. Each image became a carefully crafted visual metaphor, with FIBO's structured prompt system allowing us to specify exact aesthetic qualities, object compositions, and visual styles that matched the complexity level of the content.
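The split between image and text in Simple Visuals mode can be expressed as a small data transformation. In this sketch, which assumes a hypothetical ConceptSummary shape produced by the summarizer, each concept becomes one carousel slide. The image prompt carries only the visual metaphor plus an explicit no-text instruction, while the title and explanation are kept separately for the UI. The 3-7 bounds come from this writeup; rejecting fewer than three concepts and dropping any beyond seven are assumptions.

```typescript
// Hypothetical types; the real shape of Mitate's summaries is not described in the writeup.
export interface ConceptSummary {
  title: string;
  explanation: string;    // rendered as text in the UI, below the image
  visualMetaphor: string; // sent to the image model
}

export interface CarouselSlide {
  imagePrompt: string;
  caption: { title: string; body: string };
}

// The 3-7 range comes from the description of Simple Visuals mode.
const MIN_SLIDES = 3;
const MAX_SLIDES = 7;

export function toCarouselSlides(concepts: ConceptSummary[]): CarouselSlide[] {
  // Assumption: too few concepts is treated as an error (how the real app handles
  // this is not stated); concepts beyond the maximum are simply dropped.
  if (concepts.length < MIN_SLIDES) {
    throw new Error(`Need at least ${MIN_SLIDES} concepts, got ${concepts.length}`);
  }
  return concepts.slice(0, MAX_SLIDES).map((c) => ({
    // The image prompt carries only the metaphor and an explicit no-text instruction...
    imagePrompt: `${c.visualMetaphor}. Icon-style illustration with no text, letters, or labels.`,
    // ...while all words travel separately, to be rendered by the UI.
    caption: { title: c.title, body: c.explanation },
  }));
}
```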

Phase 4: Deployment and Function Architecture

Deploying to Appwrite Functions taught us valuable lessons about serverless architecture:

Function Separation: We split the backend into generate-poster (the entry-point API) and process-generation (a background worker) for better scalability
Timeout Handling: Long AI processing jobs required careful timeout management and status tracking
Resource Allocation: Upgraded to 1 vCPU and 1 GB of RAM for AI workloads

Phase 5: Carousel Revolution - Unleashing FIBO's Full Potential

A breakthrough emerged with the multi-image carousel mode. Instead of cramming everything into one image, we generated 3-7 separate concept images displayed in an interactive carousel. Each image became a pure visual metaphor—no text, just evocative imagery—with explanations rendered cleanly in the UI below.

This approach let us fully leverage FIBO's sophisticated structured prompting:

Per-concept visual optimization: Each concept got its own dedicated FIBO generation with tailored aesthetics
Dynamic visual adaptation: We could adjust FIBO's object specifications, color schemes, and composition parameters based on knowledge level
Consistent quality: FIBO's JSON-based interface meant we could reliably generate professional-grade visuals for each concept
Engaging user experience: Users could pace their learning, exploring one concept at a time

The carousel mode transformed Mitate from a simple infographic generator into a sophisticated visual storytelling platform, all powered by FIBO's remarkable flexibility and control.

Phase 6: Polish and Documentation

The final phase focused on production readiness. We cleaned up our codebase, moved experimental code to a dedicated directory for reference, and documented our learnings. We also raised the token limit on our DigitalOcean Gradient AI requests to 4,000 tokens to handle longer papers.

The Tech Stack

Frontend: React 19 with TanStack Router (type-safe routing), Tailwind CSS (modern styling), shadcn/ui components
Backend: Appwrite Functions (serverless Node.js), Appwrite Database (NoSQL storage)
AI Services:
- DigitalOcean Gradient AI (Llama 3.3-70B-Instruct open source model) for intelligent summarization
- FIBO/Bria AI (open source diffusion model) for professional image generation
Deployment: Appwrite Static Sites for frontend, Appwrite Functions for backend

Our entire AI pipeline runs on open source models, making the technology stack as accessible as the research we're explaining.

Challenges We Ran Into

1. Optimizing for FIBO's Strengths

Our initial approach tried to do everything in one image—visuals and text overlays. While FIBO's structured prompting gave us incredible control, we realized we could achieve even better results by focusing on what FIBO excels at: pure visual generation.

Solution: We embraced FIBO's strengths. Simple Visuals mode generates text-free visual metaphors, rendering all text in the UI. This architectural choice let us maximize FIBO's capabilities—each image became a showcase of FIBO's ability to create compelling, concept-specific visuals through its structured JSON prompt system. The result? Cleaner visuals, better UX, and more reliable generation quality.

2. Serverless Timeout Management

AI processing can take 30-60 seconds per paper. Appwrite Functions have execution time limits, and API requests can't wait that long.

Solution: Asynchronous architecture. The generate-poster function immediately returns a request ID, while process-generation runs in the background. The frontend then polls for status updates, a familiar pattern in modern web apps.
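On the client, the polling half of this pattern might look like the following sketch. The status names and the StatusResponse shape are hypothetical (the writeup only says requests are tracked and polled), and the 2-second interval and 2-minute cap are arbitrary defaults. fetchStatus and sleep are injected so the loop can be exercised without a network or real timers.

```typescript
// Status values are hypothetical; the writeup only says that generation requests
// are tracked and that the client polls for updates.
export type GenerationStatus = "pending" | "processing" | "completed" | "failed";

export interface StatusResponse {
  status: GenerationStatus;
  resultId?: string; // set once the background worker has stored a result
  error?: string;    // set when processing failed
}

export interface PollOptions {
  intervalMs?: number; // delay between status checks
  timeoutMs?: number;  // give up after this much total sleeping
}

// Poll until the background job reaches a terminal state. `fetchStatus` and `sleep`
// are injected so the loop can be tested without a network or real timers.
export async function pollUntilDone(
  fetchStatus: (requestId: string) => Promise<StatusResponse>,
  requestId: string,
  sleep: (ms: number) => Promise<void>,
  { intervalMs = 2000, timeoutMs = 120_000 }: PollOptions = {},
): Promise<StatusResponse> {
  if (intervalMs <= 0) throw new RangeError("intervalMs must be positive");
  // `waited` counts only time spent sleeping, not request latency.
  let waited = 0;
  while (true) {
    const res = await fetchStatus(requestId);
    if (res.status === "completed" || res.status === "failed") return res;
    if (waited >= timeoutMs) {
      throw new Error(`Request ${requestId} did not finish within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
    waited += intervalMs;
  }
}
```

In a browser, sleep could be (ms) => new Promise((resolve) => setTimeout(resolve, ms)), and fetchStatus would read the request record that generate-poster created. A fixed interval is the simplest choice; the WebSocket-based updates listed under future work would remove the loop entirely.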

3. Knowledge Level Adaptation

How do you make an AI explain quantum mechanics to a beginner versus an advanced physicist? The same content needs completely different language, analogies, and visual metaphors.

Solution: Comprehensive prompting. We engineered detailed system prompts for DigitalOcean's Llama 3.3 model that include examples for each knowledge level. For beginners: everyday metaphors ("Think of transformers like a team of translators..."). For advanced: technical precision ("Multi-head self-attention with scaled dot-product..."). The open source nature of Llama 3.3 gave us confidence that this capability would remain accessible and improve over time.
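A minimal sketch of level-specific prompting, assuming a chat-style message list. The beginner and advanced examples are quoted from the paragraph above, while the intermediate guidance, the JSON reply format, and the function name buildSummaryMessages are assumptions for illustration, not Mitate's actual prompts.

```typescript
// Hypothetical sketch of level-specific prompting. The beginner and advanced
// examples are quoted from this writeup; the intermediate guidance and the JSON
// output format are assumptions made for illustration.
export type KnowledgeLevel = "beginner" | "intermediate" | "advanced";

export interface ChatMessage {
  role: "system" | "user";
  content: string;
}

const LEVEL_GUIDANCE: Record<KnowledgeLevel, string> = {
  beginner:
    'Avoid jargon and use everyday metaphors, for example: "Think of transformers like a team of translators..."',
  intermediate:
    "Assume basic familiarity with the field and briefly define specialized terms on first use.",
  advanced:
    'Use precise technical language, for example: "Multi-head self-attention with scaled dot-product..."',
};

export function buildSummaryMessages(paperText: string, level: KnowledgeLevel): ChatMessage[] {
  // The system prompt combines the audience, the level-specific guidance, and a reply format.
  const system = [
    `You explain research papers to a ${level} reader.`,
    LEVEL_GUIDANCE[level],
    'Reply with JSON only: {"concepts": [{"title": string, "explanation": string, "visualMetaphor": string}]}',
  ].join("\n");
  return [
    { role: "system", content: system },
    { role: "user", content: paperText },
  ];
}
```

The resulting messages would be sent to llama3.3-70b-instruct as a chat-completion request; the exact endpoint, authentication, and parameters should be taken from DigitalOcean's Gradient AI documentation rather than from this sketch.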

4. Token Limits and Paper Length

Some research papers are long. Very long. We initially hit token limits trying to process 50-page papers.

Solution: Incremental increases (now at 4000 tokens) and graceful degradation. If a paper exceeds limits, we generate a basic summary rather than failing completely. Future enhancement: chunking strategies for very long papers.
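Graceful degradation for oversized papers can be as simple as trimming the input to a token budget and flagging that the summary is partial. The sketch below assumes a rough heuristic of about four characters per token, which only approximates what Llama 3.3's tokenizer actually produces, and keeps the beginning of the paper (title, abstract, introduction) on the assumption that it carries the main ideas. The function names are hypothetical.

```typescript
// Rough heuristic (assumption): about 4 characters per token for English text.
// Real counts depend on the model's tokenizer.
const CHARS_PER_TOKEN = 4;

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

export interface PreparedInput {
  text: string;
  truncated: boolean; // true when only part of the paper fits the budget
}

// Hypothetical helper: fit paper text into an input token budget instead of failing.
export function prepareInput(paperText: string, inputTokenBudget: number): PreparedInput {
  if (estimateTokens(paperText) <= inputTokenBudget) {
    return { text: paperText, truncated: false };
  }
  // Keep the start of the paper, where the title, abstract, and introduction usually are.
  const head = paperText.slice(0, inputTokenBudget * CHARS_PER_TOKEN);
  // Prefer cutting at the last paragraph break so the cut falls between paragraphs.
  const lastBreak = head.lastIndexOf("\n\n");
  return { text: lastBreak > 0 ? head.slice(0, lastBreak) : head, truncated: true };
}
```

The truncated flag lets the UI label the result as a basic summary; the chunking strategy mentioned above would replace the single cut with several budget-sized pieces.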

5. Handling Peak Load

During peak usage, we occasionally encountered timeouts—a natural challenge when generating high-quality images on demand.

Solution: Retry logic and graceful fallback patterns. We track generation attempts and handle edge cases elegantly, ensuring users always get informative feedback. FIBO's consistent API responses made implementing this error handling straightforward.
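Retry logic of this kind is commonly implemented as exponential backoff. The following is a generic TypeScript sketch, not Mitate's code: the attempt count, delays, and the decision to retry on every error are assumptions, and sleep is injected so tests run instantly. Passing the attempt number to the task is one way to support the attempt tracking described above.

```typescript
// Generic retry-with-backoff sketch; Mitate's actual retry policy (attempt counts,
// delays, which errors are retried) is not described, so all values are assumptions.
export interface RetryOptions {
  maxAttempts: number;                  // total tries, including the first
  baseDelayMs: number;                  // delay after the first failed attempt
  sleep: (ms: number) => Promise<void>; // injected so tests need no real timers
}

// Delay after failed attempt n is base * 2^(n - 1): base, 2 * base, 4 * base, ...
export function backoffDelay(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

// Runs `task`, retrying on any rejection. The attempt number is passed to the task
// so callers can record it (e.g. on the request record).
export async function withRetry<T>(
  task: (attempt: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  if (opts.maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // Wait before the next try, but not after the final failure.
      if (attempt < opts.maxAttempts) await opts.sleep(backoffDelay(attempt, opts.baseDelayMs));
    }
  }
  throw lastError;
}
```

A production version would likely retry only transient failures such as timeouts and add random jitter to the delays so that many clients do not retry in lockstep; both are left out here for brevity.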

Accomplishments That We're Proud Of

1. Production-Ready in Under a Week

From first commit to production deployment in just a few days. We moved quickly through ideation, implementation, and deployment—proving that modern serverless and AI infrastructure enables rapid iteration.

2. Innovative Dual-Mode Architecture

Rather than settling on a single approach, we created two complementary modes that showcase different aspects of FIBO's capabilities. Users can choose based on their preferences, and we can toggle the mode via an environment variable. The Simple Visuals mode particularly highlights FIBO's structured prompting system: each concept gets its own carefully engineered JSON specification that produces consistently stunning results.
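The environment-variable toggle could be a small, validated lookup like the sketch below. The variable name VISUAL_MODE, the mode strings, and defaulting to Simple Visuals are all assumptions; the writeup does not name the variable.

```typescript
// Hypothetical sketch: the variable name VISUAL_MODE and the mode strings are
// assumptions; the writeup only says the mode can be toggled via an environment variable.
export type VisualMode = "infographic" | "simple-visuals";

// Taking the environment as a parameter (instead of reading process.env directly)
// keeps the function easy to test.
export function resolveVisualMode(
  env: Record<string, string | undefined>,
  fallback: VisualMode = "simple-visuals",
): VisualMode {
  const raw = env["VISUAL_MODE"]?.trim().toLowerCase();
  // Only accept known values; anything else (including unset) uses the fallback.
  return raw === "infographic" || raw === "simple-visuals" ? raw : fallback;
}
```

Inside an Appwrite Function (Node.js), this would be called as resolveVisualMode(process.env). Validating against known values means a configuration typo falls back to a working mode instead of an undefined one.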

3. Intelligent Knowledge Adaptation Powered by FIBO

The AI doesn't just summarize—it adapts. Same paper, three completely different experiences based on user expertise. This is where FIBO's structured prompting truly shines: we can programmatically adjust color palettes, visual complexity, object selections, and aesthetic parameters through FIBO's JSON interface. A beginner might get bright, simple visual metaphors, while an advanced user gets more sophisticated, layered compositions—all controlled through FIBO's flexible specification system.

4. Clean, Maintainable Codebase

Despite rapid development, we maintained code quality:

TypeScript throughout for type safety
Modular service architecture
Comprehensive documentation
Preserved experimental code for future reference

5. Serverless-First Design

No servers to manage, automatic scaling, pay-per-use pricing. Appwrite Functions proved perfect for this use case—we can handle one request or one thousand without infrastructure changes.

6. 100% Open Source AI Stack with Production-Grade Tools

We're proud to have built Mitate entirely on open source AI models. From Llama 3.3 for text summarization to FIBO for image generation, every AI component is open source. FIBO proved that open source image generation models can be production-ready—its structured JSON prompt system offers the kind of precision and reliability that professional applications demand. This means Mitate's core capabilities aren't locked behind proprietary APIs—they're built on technology that's accessible to developers worldwide. As these open source models improve, Mitate improves with them.

What We Learned

Technical Insights

FIBO's Structured Prompting is a Game Changer: FIBO's JSON-based structured prompt system was the key to Mitate's success. Instead of hoping free-form text prompts would produce consistent results, we could engineer exact specifications—layouts, objects, colors, aesthetic scores—that reliably generated professional visuals. This programmatic control is exactly what production applications need. We could map academic concepts to visual metaphors algorithmically, adjust complexity based on user knowledge level, and produce consistently high-quality outputs. FIBO proved that structured prompting is the future of production image generation.
Serverless Patterns for AI Workloads: Long-running AI tasks need asynchronous patterns, status tracking, and graceful timeout handling. Synchronous request-response doesn't work at AI timescales.
Visual Specialization Beats One-Size-Fits-All: By using FIBO to generate 3-7 separate concept images (carousel mode) rather than one complex infographic, we achieved better results. Each FIBO generation could be optimized for a single concept, with tailored JSON specifications that produced more focused, higher-quality visuals.
Token Economics Matter: LLM token limits significantly impact product design. We had to balance paper length, summary depth, and API costs.
Open Source Models Are Production-Ready: Llama 3.3 and FIBO aren't experimental—they're powerful enough to build real products. FIBO's reliability, consistent API, and structured interface made it feel more dependable than many proprietary alternatives. The gap between proprietary and open source AI is narrowing fast.

Design Philosophy

Play to Your Tools' Strengths: Rather than fighting limitations, we leaned into what FIBO does brilliantly—pure visual generation. This led to Simple Visuals mode, which showcases FIBO's capabilities beautifully and delivers a better user experience.
Progressive Enhancement: Start with core functionality (single infographic), then layer on improvements (carousel, multi-image, knowledge adaptation). Each iteration revealed new possibilities in FIBO's structured prompting system.
Document as You Go: Our documentation evolved alongside the code. Future us (and contributors) will thank present us.
Test Offline First: The testWithoutAPIs.ts scripts let us validate logic before burning API credits on bad prompts. FIBO's consistent API behavior meant our offline tests translated reliably to production.
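Offline testing like this works best when the pipeline depends on interfaces rather than concrete API clients. The sketch below is hypothetical, since the contents of testWithoutAPIs.ts are not shown in this writeup. Summarizer and ImageGenerator stand in for the Gradient AI and FIBO clients, and deterministic stubs replace them so pipeline logic can be checked without network calls or API credits. Issuing the image requests in parallel is also an assumption.

```typescript
// Hypothetical service interfaces: pipeline logic depends on these, so tests can
// swap in stubs that cost nothing to call.
export interface Concept {
  title: string;
  visualMetaphor: string;
}

export interface Summarizer {
  summarize(paperText: string): Promise<Concept[]>;
}

export interface ImageGenerator {
  generate(prompt: string): Promise<string>; // resolves to an image URL
}

export async function generateExplainer(
  paperText: string,
  summarizer: Summarizer,
  images: ImageGenerator,
): Promise<{ title: string; imageUrl: string }[]> {
  const concepts = await summarizer.summarize(paperText);
  // One image request per concept, issued in parallel (an assumption; sequential works too).
  const urls = await Promise.all(concepts.map((c) => images.generate(c.visualMetaphor)));
  return concepts.map((c, i) => ({ title: c.title, imageUrl: urls[i] }));
}

// Offline stub: treats each non-empty line of input as one concept.
export const stubSummarizer: Summarizer = {
  async summarize(paperText) {
    return paperText
      .split("\n")
      .filter((line) => line.trim() !== "")
      .map((line) => ({ title: line.trim(), visualMetaphor: `metaphor for ${line.trim()}` }));
  },
};

// Offline stub: records prompts and returns fake, numbered image URLs.
export const recordedPrompts: string[] = [];
export const stubImages: ImageGenerator = {
  async generate(prompt) {
    recordedPrompts.push(prompt);
    return `stub://image/${recordedPrompts.length}`;
  },
};
```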

Collaborating with AI Coding Assistants

Lesson: Specificity is critical. When we weren't explicit about which AI providers to use, our coding assistants made different assumptions and started building integrations we didn't intend. Once we learned to be precise in our specifications—not just "use an LLM" but "use DigitalOcean's Gradient AI with Llama 3.3"—the collaboration became far more effective.

This taught us that AI coding assistants are powerful collaborators, but they need clear requirements. Treat them as you would a junior developer: the more specific your instructions, the better the output.

Research Accessibility

Working with arXiv papers gave us deep appreciation for the accessibility gap in academic research. Even intermediate-level explanations can make cutting-edge research approachable to motivated learners. The fact that we could build this accessibility tool using open source AI felt particularly meaningful—democratizing research with democratized technology.

What's Next for Mitate

Near-Term Enhancements

User Authentication and History: Let users save their generated explainers and build personal research libraries
Multiple Visual Styles: Beyond professional infographics—sketch style, comic book style, minimalist diagrams
PDF Export: Download explainers for offline reading or printing
Social Media Sharing: Auto-generate shareable cards optimized for Twitter, LinkedIn, etc.

Medium-Term Features

Batch Processing: Upload a reading list, get a full set of explainers
Custom Branding: Researchers and educators could white-label explainers for their institutions
Interactive Explanations: Click on concepts for deeper dives, related papers, or definitions
Multi-Language Support: Translate explainers to make research globally accessible

Long-Term Vision

Video Explainers: Generate narrated video summaries using AI voice and animations
Collaborative Learning: Users can annotate, ask questions, and discuss papers within Mitate
Citation Networks: Visualize how papers connect and influence each other
Custom Knowledge Graphs: Build personal mind maps of research domains

Technical Improvements

Smarter Chunking: Handle 100+ page papers by intelligently extracting key sections
Multiple AI Providers: Fallback between DigitalOcean, OpenAI, Anthropic for reliability
Image Optimization: CDN integration, responsive images, WebP/AVIF formats
Real-Time Updates: WebSocket-based status updates instead of polling
A/B Testing Framework: Systematically improve prompts based on user feedback

Reflection

Building Mitate taught us that the real challenge in AI products isn't the AI itself—it's the thoughtful integration of AI capabilities into human-centered experiences. We learned to work with AI limitations rather than against them, to embrace asynchronous patterns for long-running tasks, and to prioritize accessibility above technical sophistication.

Research should be accessible to everyone with the curiosity to learn. Mitate is our contribution to that vision—using AI not to replace human understanding, but to bridge the gap between expert knowledge and eager learners.

Open source at every level: We're particularly proud that Mitate embodies the principles of democratization at every layer. Open research papers (arXiv) are explained by open source AI models (Llama 3.3 on DigitalOcean's Gradient AI, FIBO via Bria AI) in an app deployed on an open source platform (Appwrite). This alignment of values, openness explaining openness, makes Mitate more than a tool; it's a statement about how accessible knowledge creation should be.

FIBO: The Perfect Partner: Building Mitate taught us that FIBO/Bria AI isn't just another image generation model—it's a thoughtfully designed tool for developers who need reliability and control. The structured JSON prompt system, consistent API behavior, and production-grade quality made FIBO the ideal foundation for our visual storytelling platform. We're excited to see what else we can build with FIBO's capabilities.

The journey from idea to production-ready application proves that with open source AI like FIBO, serverless infrastructure, and clear architecture, developers worldwide can build sophisticated AI applications without massive budgets or proprietary lock-in. The democratization of AI tooling enables the democratization of human knowledge.

Here's to making research accessible, one visual explainer at a time. 🚀

Mitate: From complex papers to clear understanding.

Built With

appwrite-database
appwrite-functions
appwrite-static-sites
appwrite-storage
arxiv-api
digitalocean-gradient-ai-platform
fibo/bria-ai
llama-3.3-70b-instruct
node.js
npm
react-19
shadcn/ui
tailwind-css-4
tanstack-router
typescript
vite
zod