Inspiration

Inspired by HackUTA 2025’s Spider-Verse theme, our team thought about how comic universes connect people through visuals and storytelling. We wanted to build something that blends AI and visual art: a lens that can recognize and explain a comic scene or cover in seconds.

That idea became Comicverse Lens, a web app that analyzes a comic image, identifies the universe it belongs to, and extracts the text or dialogue inside it.

🧠 What It Does

Comicverse Lens lets users upload an image or paste an image link (a comic page, poster, or photo), and the AI automatically:

Detects whether the image is comic-related.

Analyzes the scene and characters and provides a summary.

Identifies the comic universe (e.g., Marvel, DC, etc.).

Extracts text inside the image (OCR) and displays it separately.

Saves all results into a catalog for users to view later.
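The analysis steps above imply one structured result per image. A minimal sketch, assuming the prompt asks Gemini to reply with JSON containing the hypothetical fields `isComic`, `summary`, `universe`, and `funFact` (the actual schema is not given here), shows how such a reply could be parsed defensively before it is saved to the catalog:

```typescript
// Hypothetical result schema; the field names are illustrative, not the project's.
interface ComicAnalysis {
  isComic: boolean;
  summary: string;
  universe: string | null; // e.g. "Marvel" or "DC"; null for non-comic images
  funFact: string | null;
}

// Parse Gemini's text reply into a ComicAnalysis, tolerating a Markdown
// code-fence wrapper around the JSON.
function parseAnalysis(raw: string): ComicAnalysis {
  const trimmed = raw.trim();
  const fenced = /^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/.exec(trimmed);
  const parsed: unknown = JSON.parse(fenced?.[1] ?? trimmed);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object from the model");
  }
  const obj = parsed as Record<string, unknown>;

  // Accept only non-empty strings; anything else becomes null.
  const str = (v: unknown): string | null =>
    typeof v === "string" && v.trim() !== "" ? v.trim() : null;

  const isComic = obj.isComic === true;
  return {
    isComic,
    summary: str(obj.summary) ?? "",
    universe: isComic ? str(obj.universe) : null,
    funFact: str(obj.funFact),
  };
}
```

Models sometimes wrap JSON in a code fence even when asked not to, so the parser strips one if present. Forcing `universe` to `null` for non-comic images keeps the catalog from labeling a random photo as Marvel or DC.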

Whether it’s a Spider-Man cover or a random photo, the AI explains what it “sees” — combining computer vision, text recognition, and generative reasoning.

🧩 How We Built It

Frontend: Next.js (React + TypeScript)

Styling: Tailwind CSS

AI Model: Google Gemini 2.5 Flash (Generative + Vision)

OCR Integration: Tesseract.js for extracting text from uploaded images

Deployment: Vercel (serverless Next.js deployment)

Version Control: GitHub

We used Google’s Gen AI SDK (@google/genai) to send both image URLs and local uploads to Gemini, through two Next.js API routes: /describe for images given by link and /describe-upload for files uploaded from the user’s device.
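For the upload route, the image has to reach Gemini as base64 inline data. A minimal sketch, assuming the browser sends the file as a data URL (for example from `FileReader.readAsDataURL`); `parseDataUrl` and `buildDescribeRequest` are hypothetical helper names, and the request shape follows the `generateContent` parameters of `@google/genai`:

```typescript
// Model id for Gemini 2.5 Flash, as named in the stack above.
const MODEL = "gemini-2.5-flash";

// Base64 image payload in the shape Gemini's inlineData part expects.
interface InlineImage {
  mimeType: string;
  data: string; // raw base64, without the "data:...;base64," prefix
}

// Split an image data URL (assumed to come from FileReader.readAsDataURL
// on the client) into its MIME type and base64 payload.
function parseDataUrl(dataUrl: string): InlineImage {
  const match = /^data:(image\/[\w.+-]+);base64,([A-Za-z0-9+/]+={0,2})$/.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64-encoded image data URL");
  }
  return { mimeType: match[1], data: match[2] };
}

// Build the argument object for ai.models.generateContent() in @google/genai:
// one inline image part followed by one text prompt part.
function buildDescribeRequest(image: InlineImage, prompt: string) {
  return {
    model: MODEL,
    contents: [
      { inlineData: { mimeType: image.mimeType, data: image.data } },
      { text: prompt },
    ],
  };
}
```

Inside the route handler, the request object would be passed to `ai.models.generateContent(...)` on a `GoogleGenAI` client, and the model's reply read from `response.text`. The /describe route could reuse `buildDescribeRequest` after downloading the linked image and base64-encoding its bytes.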

⚙️ How It Works (Workflow)

User uploads an image (or gives a link).

The API route converts the image to base64 and sends it to Gemini 2.5 Flash for vision analysis.

Gemini analyzes the image’s visual details and context.

OCR runs in parallel to extract embedded text from the image.

The frontend displays both:

Comic analysis (summary, universe, fun fact)

Extracted text (if any) under a “Text Analyzation” section.
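The workflow runs the vision model and OCR side by side. A minimal sketch of that orchestration, assuming both steps are wrapped as async functions that take the base64 image (`describe` and `ocr` are hypothetical parameters, injected so the logic can be tested without network calls):

```typescript
// Result handed to the frontend; field names are hypothetical.
interface LensResult {
  analysis: string | null; // Gemini's description, or null if that call failed
  extractedText: string | null; // cleaned OCR text, or null if none was found
  errors: string[];
}

type Analyzer = (base64: string, mimeType: string) => Promise<string>;

// Collapse OCR noise: squeeze whitespace, drop empty lines, and treat text
// with no letters or digits as "no text found".
function cleanOcrText(raw: string): string | null {
  const text = raw
    .split("\n")
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
  return /[A-Za-z0-9]/.test(text) ? text : null;
}

// Run the vision model and OCR concurrently. allSettled keeps one failure
// (e.g. an OCR timeout) from discarding the other result.
async function analyzeImage(
  base64: string,
  mimeType: string,
  describe: Analyzer,
  ocr: Analyzer,
): Promise<LensResult> {
  const [vision, text] = await Promise.allSettled([
    describe(base64, mimeType),
    ocr(base64, mimeType),
  ]);
  const errors: string[] = [];
  if (vision.status === "rejected") errors.push(`vision: ${String(vision.reason)}`);
  if (text.status === "rejected") errors.push(`ocr: ${String(text.reason)}`);
  return {
    analysis: vision.status === "fulfilled" ? vision.value : null,
    extractedText: text.status === "fulfilled" ? cleanOcrText(text.value) : null,
    errors,
  };
}
```

In the deployed app, `ocr` might wrap something like `Tesseract.recognize(image, "eng")` and read `data.text` from the result, while `describe` calls Gemini. `Promise.allSettled` is used instead of `Promise.all` so that an OCR failure still lets the comic analysis render, and vice versa.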

🚀 What We Learned

How to integrate multimodal AI models with real-world apps.

Working with Next.js API routes for server-side Gemini requests.

Handling CORS, env files, and Vercel deployment pipelines.

Using OCR to add real utility to visual recognition apps.

💥 Challenges We Faced

Setting up the Google API key and Gemini SDK connection initially caused multiple authorization errors.

Handling image uploads from local devices instead of just links was tricky.

Vercel builds surfaced multiple TypeScript and ESLint warnings, which we had to suppress safely.

Running OCR and the vision model together without slowing the app down was one of the hardest parts.
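Many of the authorization errors mentioned above can be surfaced earlier with a server-side check on the API key. A minimal sketch, assuming the key lives in an environment variable named `GEMINI_API_KEY` (a hypothetical name here; the project's actual variable is not stated):

```typescript
// Hypothetical variable name; the project's actual key name is not stated.
const KEY_NAME = "GEMINI_API_KEY";

// Read the Gemini API key on the server and fail fast with a readable message,
// instead of letting a missing key surface later as an authorization error.
function getGeminiApiKey(env: Record<string, string | undefined>): string {
  // Next.js inlines NEXT_PUBLIC_* variables into client bundles, so a key
  // defined under that prefix would be visible in the browser.
  if (env[`NEXT_PUBLIC_${KEY_NAME}`]) {
    throw new Error(`Remove NEXT_PUBLIC_${KEY_NAME}; the key must stay server-side`);
  }
  // Trim stray whitespace, e.g. a trailing newline pasted into a dashboard.
  const key = env[KEY_NAME]?.trim();
  if (!key) {
    throw new Error(`${KEY_NAME} is not set (check .env.local and the Vercel project's environment variables)`);
  }
  return key;
}
```

A route handler would call `getGeminiApiKey(process.env)` before creating the Gemini client, so a misconfigured deployment fails with a message that names the missing variable.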

🕶️ What’s Next

Add user authentication with Auth0 so each user has their own catalog.

Implement emotion and art-style detection (e.g., “retro Marvel style” vs. “modern manga”).

Expand to video frame analysis — comic trailers or fan animations.

🌐 Try It Live

Deployed App: comicverse-lens.vercel.app
