Inspiration

Small businesses and solo marketers face a painful contradiction: social media is their most cost-effective marketing channel, yet producing professional-quality, platform-optimized content consistently is expensive and time-consuming. Coming up with ideas, creating images, and writing captions eat up 30–60 minutes per post. Outsourcing to an agency costs $1,000–$5,000 per month. Many businesses simply give up and stop posting.

We built Adpicto because we saw Gemini 3's native image generation as a breakthrough opportunity. Unlike previous workflows that required stitching together separate text and image AI services, Gemini 3 Pro Image can generate marketing visuals directly from a multimodal prompt — including brand assets, color palettes, and detailed creative direction — in a single API call. This meant we could build a tool where a user describes their product once, and the system handles everything: market positioning, creative direction, image generation, platform-specific copy, and even scheduled publishing.

The personal motivation was clear: we wanted to put the work of a full creative agency into a single tool that anyone can use.

How Adpicto differs from existing tools:

Canva is a great design tool — but you still decide what to create, write the copy separately, and publish manually. Buffer and Hootsuite excel at scheduling — but you still need to produce all the content yourself. In other words, existing tools only cover part of the workflow.

Adpicto is different. The moment you enter a URL, it analyzes your service, builds a marketing strategy, generates images, writes copy, and publishes to social networks (SNS), all in one flow. The user's job is three steps: paste a URL, pick a style, and hit publish.

What it does

Enter your service URL and Adpicto analyzes your website, generates pro-level marketing images and platform-optimized copy, and publishes directly to Instagram and LinkedIn. No design skills needed. No copywriting. No manual posting — AI handles everything.

Core Features:

  • One-Click Project Setup: Enter your service URL — Gemini 3 Flash with URL Context automatically extracts your product name, description, target audience, key features, brand colors, and competitive differentiators from your website
  • AI Marketing Strategy: When creating a post, generates 3 distinct marketing appeal axes (awareness, interest, acquisition) from the extracted features, each with a recommended visual style
  • Image Generation: Creates professional marketing images in 13 distinct visual styles (infographic, 4-panel comic, isometric 3D, app store mockup, hero banner, quote card, and more) using Gemini 3 Pro Image
  • Image Editing: Mask-free image editing powered by Imagen 3 Edit API — describe changes in natural language
  • Platform-Optimized Copy: Generates SNS text with proper character limits, hashtag counts, and tone (casual/professional/playful/informative) for each platform
  • Brand Consistency: Upload brand assets (logos, characters, icons) as reference images — the AI integrates them with role-based strategies (logo → corner placement, character → hero positioning)
  • SNS Integration & Publishing: Direct OAuth-connected publishing to Instagram and LinkedIn with immediate and scheduled posting support
  • Analytics Dashboard: Track post views, referrers, devices, and geographic breakdown

By the Numbers:

  • 4 supported platforms for content generation (Instagram, Twitter/X, Facebook, LinkedIn) — SNS publishing currently live for Instagram & LinkedIn
  • 13 image styles with platform-specific aspect ratios
  • 9 aspect ratio options (1:1, 4:3, 3:4, 16:9, 9:16, 4:5, 2:3, 3:2, 21:9)
  • 2 languages (English, Japanese) with full i18n
  • 4 tone presets for text generation
  • 2,089 automated tests

How we built it

Architecture Overview:

User → Next.js 15 (React 19, Server Actions)
         ↓
       DDD Service Layer (23 domains, Result<T,E> pattern)
         ↓                          ↓
   Supabase (DB/Auth/Storage)    pgmq Job Queue → Vercel Cron Workers
                                    ↓
                              Gemini 3 Flash  (text / prompt gen)
                              Gemini 3 Pro Image  (image gen)
                              Imagen 3  (image editing)
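
The Result<T,E> pattern named in the service layer can be illustrated with a minimal sketch. The write-up does not show Adpicto's own definition, so the `ok`/`err` helpers, the `UrlError` type, and `parseServiceUrl` below are hypothetical names chosen for illustration.

```typescript
// Hypothetical names throughout; Adpicto's actual Result type is not shown in this write-up.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type UrlError = { kind: "invalid_url"; input: string };

// A service-layer function returns a Result instead of throwing,
// so callers must handle the failure branch explicitly.
function parseServiceUrl(input: string): Result<URL, UrlError> {
  try {
    const url = new URL(input);
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      return err({ kind: "invalid_url", input });
    }
    return ok(url);
  } catch {
    // new URL() throws on strings that are not absolute URLs
    return err({ kind: "invalid_url", input });
  }
}
```

A caller checks `result.ok` before touching `result.value`, which lets the compiler enforce that errors are handled at each layer boundary.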

Gemini 3 Integration (Core of the Application):

Adpicto uses three distinct Gemini/Google AI models, each for a specific purpose:

Model | Purpose | SDK / API | File
Gemini 3 Flash (gemini-3-flash-preview) | Appeal axis generation, image prompt engineering, post text generation | @ai-sdk/google-vertex (Vercel AI SDK) | ai.gateway.ts, text.gateway.ts
Gemini 3 Flash (gemini-3-flash-preview) | URL Context: website analysis & project info extraction | @google/genai SDK (direct) | ai.gateway.ts
Gemini 3 Pro Image (gemini-3-pro-image-preview) | Marketing image generation with brand asset references | Vertex AI REST API (generateContent) | image.gateway.ts
Imagen 3 (imagen-3.0-capability-001) | Mask-free image editing via natural language | Vertex AI REST API (:predict) | image.gateway.ts

1. URL Context (Website Analysis) — ai.gateway.ts:1785-1822

When a user enters their service URL, Gemini 3 Flash with the URL Context tool fetches the page and extracts structured project information:

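// `genai` is a client from the @google/genai SDK, authenticated with GOOGLE_GENAI_API_KEY
const response = await genai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: [this.buildExtractProjectInfoPrompt(url, language)],
  // Enable the URL Context tool so the model can read the page at `url`
  config: { tools: [{ urlContext: {} }] },
});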

This uses the @google/genai SDK (not @ai-sdk/google-vertex) because the Vercel AI SDK does not yet support URL Context (vercel/ai#7753). Extracted fields include: service name, description, target audience, pain points, competitor differentiators, brand colors, and key features/appeal points.
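
Since the model returns text, the extracted fields need to be validated before they are stored. The following is a minimal sketch of that step, assuming the prompt asks for a JSON object. The field names, the `parseProjectInfo` helper, and the hex-color filter are all hypothetical; the write-up lists the extracted fields but not their schema.

```typescript
// Field names are hypothetical; they mirror the extracted fields listed above.
interface ExtractedProjectInfo {
  serviceName: string;
  description: string;
  targetAudience: string;
  painPoints: string[];
  differentiators: string[];
  brandColors: string[]; // hex codes such as "#1A73E8"
  keyFeatures: string[];
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

const HEX_COLOR = /^#[0-9a-fA-F]{6}$/;

// Returns null when the text is not valid JSON or a required field is missing.
function parseProjectInfo(modelText: string): ExtractedProjectInfo | null {
  // Models sometimes wrap JSON in a Markdown fence; strip one if present.
  const stripped = modelText
    .replace(/^\s*`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}\s*$/, "");
  let data: unknown;
  try {
    data = JSON.parse(stripped);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const { serviceName, description, targetAudience } = d;
  const { painPoints, differentiators, brandColors, keyFeatures } = d;
  if (
    typeof serviceName !== "string" ||
    typeof description !== "string" ||
    typeof targetAudience !== "string" ||
    !isStringArray(painPoints) ||
    !isStringArray(differentiators) ||
    !isStringArray(brandColors) ||
    !isStringArray(keyFeatures)
  ) {
    return null;
  }
  return {
    serviceName,
    description,
    targetAudience,
    painPoints,
    differentiators,
    // Drop anything that is not a six-digit hex color
    brandColors: brandColors.filter((c) => HEX_COLOR.test(c)),
    keyFeatures,
  };
}
```

The project's "Built With" list includes zod, so a schema library would likely replace the hand-written checks in practice; the manual version is used here only to keep the sketch dependency-free.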

2. Marketing Strategy Generation — ai.gateway.ts:162-370

During post creation, Gemini 3 Flash generates 3 distinct marketing appeal axes from the project's extracted features, with carefully engineered prompts that enforce fact-based, number-rich, visualizable marketing angles. The prompt includes few-shot examples distinguishing good vs. bad outputs:

Good: "Auto-report feature reduces weekly report creation from 3 hours to 10 minutes"
Bad: "Improve efficiency and make better use of time"
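
As a rough illustration of how few-shot examples like these can be embedded in a prompt, here is a small sketch. The `buildAppealAxisPrompt` function and its instruction wording are hypothetical and are not the production prompt; only the two example sentences and the three axis names come from this write-up.

```typescript
// Hypothetical prompt builder; the wording is illustrative, not Adpicto's production prompt.
const GOOD_EXAMPLE =
  "Auto-report feature reduces weekly report creation from 3 hours to 10 minutes";
const BAD_EXAMPLE = "Improve efficiency and make better use of time";

const AXES = ["awareness", "interest", "acquisition"] as const;

function buildAppealAxisPrompt(features: string[]): string {
  // Number the features so the model can refer back to them
  const featureList = features.map((f, i) => `${i + 1}. ${f}`).join("\n");
  return [
    `Generate ${AXES.length} marketing appeal axes (${AXES.join(", ")}), one per axis.`,
    "Each axis must be fact-based, include concrete numbers, and be easy to visualize.",
    "",
    "Product features:",
    featureList,
    "",
    `Good: "${GOOD_EXAMPLE}"`,
    `Bad: "${BAD_EXAMPLE}"`,
  ].join("\n");
}
```

Keeping the good and bad examples as named constants makes it easy to swap them per language or per industry without touching the rest of the template.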

3. Image Prompt Engineering — ai.gateway.ts:748-1093

A prompt generation pipeline creates bilingual (Japanese + English) structured prompts with:

  • Source-detection logic (research-centric vs. creative content → different layouts)
  • 13 image-type-specific structure guides (e.g., infographic → data hierarchy, comic → 4-panel story)
  • Quality keyword injection per image type (e.g., app_store → "realistic device mockup, depth of field bokeh")
  • Pixel-accurate dimension specifications for each aspect ratio
  • Negative prompt sections to avoid common artifacts
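
The dimension specification in the list above can be sketched as a small helper. This is only an illustration: it assumes the longer edge is fixed at 2048 px and derives the shorter edge by rounding, which may not match the pixel sizes the image model actually returns for each ratio. `dimensionsFor` and `dimensionLine` are hypothetical names.

```typescript
type AspectRatio =
  | "1:1" | "4:3" | "3:4" | "16:9" | "9:16" | "4:5" | "2:3" | "3:2" | "21:9";

// Assumption: the longer edge is fixed and the shorter edge is derived and rounded.
function dimensionsFor(
  ratio: AspectRatio,
  longEdge = 2048,
): { width: number; height: number } {
  const [w, h] = ratio.split(":").map(Number);
  if (w >= h) {
    return { width: longEdge, height: Math.round((longEdge * h) / w) };
  }
  return { width: Math.round((longEdge * w) / h), height: longEdge };
}

// One line of the structured prompt that states the canvas size explicitly.
function dimensionLine(ratio: AspectRatio): string {
  const { width, height } = dimensionsFor(ratio);
  return `Canvas: ${width}x${height} px (aspect ratio ${ratio})`;
}
```

Under these assumptions, `dimensionLine("16:9")` yields `Canvas: 2048x1152 px (aspect ratio 16:9)`.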

4. Image Generation — image.gateway.ts:327-523

Gemini 3 Pro Image generates marketing images via the Vertex AI REST API with:

  • Multimodal prompts: Text instructions + base64-encoded brand assets as reference images
  • Role-based asset integration: Each reference image is tagged with a role (logo → 10-15% corner placement with drop shadow; character → 20-40% hero positioning; style → extract visual treatment)
  • Brand color distribution: Enforced 60-30-10 rule (primary-secondary-accent)
  • Generation presets: creative (topP=0.95), brand_consistent (topP=0.8), precise (topP=0.7) — all at temperature=1.0 per Google recommendation
  • 2K resolution output with configurable aspect ratios
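
To make the presets and the multimodal prompt concrete, here is a hedged sketch of how a generateContent request body could be assembled. The topP and temperature values come from the list above; the `PRESETS` object, the `ReferenceImage` type, and `buildImageRequest` are hypothetical, and the resolution and aspect-ratio settings are deliberately left out because their exact field names are not given in this write-up.

```typescript
type PresetName = "creative" | "brand_consistent" | "precise";

// Values from the write-up; the object shape is an assumption.
const PRESETS: Record<PresetName, { temperature: number; topP: number }> = {
  creative: { temperature: 1.0, topP: 0.95 },
  brand_consistent: { temperature: 1.0, topP: 0.8 },
  precise: { temperature: 1.0, topP: 0.7 },
};

interface ReferenceImage {
  mimeType: string;
  base64: string; // brand asset, already base64-encoded
}

type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface ImageRequestBody {
  contents: { role: "user"; parts: Part[] }[];
  generationConfig: {
    temperature: number;
    topP: number;
    responseModalities: string[];
  };
}

// Text instructions come first, followed by each reference image as inline data.
function buildImageRequest(
  prompt: string,
  refs: ReferenceImage[],
  preset: PresetName,
): ImageRequestBody {
  const parts: Part[] = [
    { text: prompt },
    ...refs.map((r): Part => ({
      inlineData: { mimeType: r.mimeType, data: r.base64 },
    })),
  ];
  return {
    contents: [{ role: "user", parts }],
    generationConfig: { ...PRESETS[preset], responseModalities: ["TEXT", "IMAGE"] },
  };
}
```

Because temperature is pinned at 1.0 in every preset, switching presets changes only topP, which keeps the diversity control in one place.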

5. Image Editing — image.gateway.ts:806-920

Imagen 3 Edit API enables mask-free editing where users describe changes in natural language:

editMode: "EDIT_MODE_DEFAULT", // Mask-free editing
referenceImages: [{ referenceType: 2, referenceImage: { bytesBase64Encoded: base64 } }]

6. Post Text Generation — text.gateway.ts

Gemini 3 Flash generates platform-optimized copy with per-platform constraints:

  • Instagram: 2200 chars, 5 hashtags, story-driven
  • Twitter/X: 280 chars, 2 hashtags, concise
  • Facebook: 80 chars optimal, 2 hashtags
  • LinkedIn: 3000 chars, 2 hashtags, professional tone
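
The per-platform constraints above lend themselves to a small post-generation check. The sketch below is an assumption about how such a check might look: `LIMITS` and `checkPost` are hypothetical names, Facebook's "optimal" 80 characters is treated as a hard cap for simplicity, and hashtags are counted with a simple pattern.

```typescript
type Platform = "instagram" | "twitter" | "facebook" | "linkedin";

const LIMITS: Record<Platform, { maxChars: number; maxHashtags: number }> = {
  instagram: { maxChars: 2200, maxHashtags: 5 },
  twitter: { maxChars: 280, maxHashtags: 2 },
  facebook: { maxChars: 80, maxHashtags: 2 }, // "optimal" length, treated here as a cap
  linkedin: { maxChars: 3000, maxHashtags: 2 },
};

// Returns a list of constraint violations; an empty list means the post passes.
function checkPost(platform: Platform, text: string): string[] {
  const { maxChars, maxHashtags } = LIMITS[platform];
  const problems: string[] = [];
  // Count code points rather than UTF-16 units so most emoji count once
  const length = Array.from(text).length;
  if (length > maxChars) {
    problems.push(`too long: ${length}/${maxChars}`);
  }
  // A hashtag is "#" followed by non-space characters (works for Japanese tags too)
  const hashtags = text.match(/#[^\s#]+/g) ?? [];
  if (hashtags.length > maxHashtags) {
    problems.push(`too many hashtags: ${hashtags.length}/${maxHashtags}`);
  }
  return problems;
}
```

A failed check could trigger a regeneration with the violation appended to the prompt, instead of truncating the text.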

Authentication:

  • Production: GCP Service Account via GOOGLE_SERVICE_ACCOUNT_KEY environment variable
  • Local development: Application Default Credentials (ADC)
  • URL Context: Google AI API key via GOOGLE_GENAI_API_KEY

Other Technology Stack:

  • Frontend: React 19, Next.js 15 (App Router), Radix UI, Tailwind CSS v4
  • Backend: Bun runtime, Server Actions (no REST API routes)
  • Database: Supabase (PostgreSQL) with Row-Level Security
  • Queue: pgmq (PostgreSQL Message Queue) with Vercel Cron workers
  • Billing: Stripe via Supabase Sync Engine (no custom webhooks)
  • Image Processing: Sharp (watermarking, resizing)
  • Testing: Vitest (2,089 tests), Playwright (E2E), fast-check (property-based)
  • Deployment: Vercel (serverless functions, cron jobs)
  • i18n: next-intl (English + Japanese)

Challenges we ran into

1. Serverless Timeout Management

Vercel functions have a 300-second hard limit. Image generation via Vertex AI can take 30-90 seconds per request, and prompt generation adds another 30-120 seconds. We implemented a multi-layered timeout architecture (image.gateway.ts:181-284):

  • Per-request timeout: 90s via AbortSignal.timeout()
  • In-process retry budget: 180s total to leave buffer before Vercel cutoff
  • Budget checking before each retry: remaining < waitSeconds * 1000 + perRequestTimeoutMs * 0.5
  • Queue-level retry with exponential backoff (30s → 60s → 120s) as final safety net
  • Heartbeat-based visibility extension (every 30s) to prevent duplicate processing
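
The budget check and the queue backoff from the list above can be sketched as two pure functions. This is an interpretation, not the code in image.gateway.ts: it assumes a retry is skipped when the listed condition holds, and it assumes the queue delay stays at 120s after the third attempt. The constant and function names are hypothetical.

```typescript
const PER_REQUEST_TIMEOUT_MS = 90_000; // per-request timeout from the list above
const RETRY_BUDGET_MS = 180_000; // total in-process retry budget

// Assumption: an in-process retry is skipped when
// remaining < waitSeconds * 1000 + perRequestTimeoutMs * 0.5.
function canRetry(elapsedMs: number, waitSeconds: number): boolean {
  const remaining = RETRY_BUDGET_MS - elapsedMs;
  const needed = waitSeconds * 1000 + PER_REQUEST_TIMEOUT_MS * 0.5;
  return remaining >= needed;
}

// Queue-level delay for the nth failed attempt (1-based): 30s, 60s, 120s.
// Assumption: later attempts stay capped at 120s.
function queueBackoffSeconds(attempt: number): number {
  return Math.min(30 * 2 ** (attempt - 1), 120);
}
```

For example, 95 seconds into the budget with a 5-second wait there are 85 seconds left against 50 seconds needed, so another attempt is allowed; at 140 seconds only 40 remain, so the job is handed back to the queue.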

2. Prompt Engineering for Consistent Brand Integration

Getting Gemini 3 Pro Image to consistently integrate brand assets (logos, characters) into generated images required extensive prompt engineering. We developed a role-based reference image system (image.gateway.ts:386-431) where each uploaded asset is tagged with its role and the AI receives specific integration instructions:

  • Logos/Icons: "Extract colors, position in corner at 10-15% scale, add drop shadow"
  • Characters/Mascots: "Feature as hero at 20-40% frame, match lighting, natural integration"
  • Style References: "Extract visual style, match texture and color treatment"
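
A minimal sketch of the role-based system described above: each asset's role selects an instruction string that is added to the prompt next to the corresponding reference image. The instruction texts are the ones quoted in the list; the `BrandAsset` type, the role keys, and `buildReferenceSection` are hypothetical.

```typescript
type AssetRole = "logo" | "character" | "style";

// Instruction texts quoted from the list above; the mapping shape is an assumption.
const ROLE_INSTRUCTIONS: Record<AssetRole, string> = {
  logo: "Extract colors, position in corner at 10-15% scale, add drop shadow",
  character: "Feature as hero at 20-40% frame, match lighting, natural integration",
  style: "Extract visual style, match texture and color treatment",
};

interface BrandAsset {
  role: AssetRole;
  label: string; // human-readable name, e.g. the uploaded file name
}

// Produces the prompt section that tells the model how to use each reference image.
// Images are assumed to be attached in the same order as the assets array.
function buildReferenceSection(assets: BrandAsset[]): string {
  if (assets.length === 0) return "";
  const lines = assets.map(
    (a, i) =>
      `Reference image ${i + 1} (${a.role}, "${a.label}"): ${ROLE_INSTRUCTIONS[a.role]}`,
  );
  return ["Reference images:", ...lines].join("\n");
}
```

Numbering the references in the text gives the model an explicit link between each instruction and the image it applies to.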

3. URL Context SDK Gap

The Vercel AI SDK (@ai-sdk/google-vertex) does not yet support Gemini's URL Context tool. We worked around this by using the @google/genai SDK directly for the URL context feature while keeping the Vercel AI SDK for all other text generation. This dual-SDK approach required separate authentication flows (API key vs. service account).

4. Source-Aware Prompt Generation

Research-backed content (academic citations, statistics) needs fundamentally different visual layouts than creative marketing content. We built a source detection system (ai.gateway.ts:716-746) that analyzes content for patterns like "出典:" (Japanese for "Source:"), "Research Finding", journal names, and university references, then switches between hierarchical top-to-bottom layouts (for research) and radial mind-map layouts (for creative content).
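
The detection step can be sketched with a few regular expressions. Only the "出典:" and "Research Finding" patterns are named in the text; the journal and university patterns below are hypothetical stand-ins, as are the `chooseLayout` function and the layout labels.

```typescript
// Only the first two patterns are named in the write-up; the rest are illustrative.
const RESEARCH_PATTERNS: RegExp[] = [
  /出典[:：]/, // accepts both the ASCII and the full-width colon
  /Research Finding/i,
  /\bJournal of\b/i, // hypothetical stand-in for journal-name detection
  /\bUniversity\b|大学/i, // hypothetical stand-in for university references
];

type Layout = "hierarchical" | "radial";

// Research-style content gets a top-to-bottom hierarchy; everything else a radial mind map.
function chooseLayout(content: string): Layout {
  const isResearch = RESEARCH_PATTERNS.some((pattern) => pattern.test(content));
  return isResearch ? "hierarchical" : "radial";
}
```

The patterns omit the global flag on purpose, so repeated `test` calls carry no `lastIndex` state between inputs.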

5. Cross-Platform Text Optimization

Each social media platform has wildly different constraints (Instagram: 2200 chars with 5 hashtags; Twitter: 280 chars with 2 hashtags; LinkedIn: 3000 chars, professional tone). We built platform-specific prompt configurations (text.gateway.ts:67-95) with per-platform style instructions, character limits, and hashtag counts, ensuring the AI generates genuinely different content for each platform rather than truncated versions of the same text.

Accomplishments that we're proud of

  • End-to-end AI pipeline: From URL input to published SNS post, the entire flow is AI-driven: website analysis (URL Context), marketing strategy (Gemini 3 Flash), image generation (Gemini 3 Pro Image), image editing (Imagen 3), and copy generation (Gemini 3 Flash), all running on Google's Gemini 3 and Imagen 3 models
  • Production-grade reliability: 2,089 automated tests with domain layer coverage above 90%, comprehensive error handling with the Result pattern, and multi-layer retry mechanisms
  • DDD architecture at scale: 23 domain entities, 36 port interfaces, 40+ adapters — clean separation of concerns that allows swapping AI providers without touching business logic
  • Bilingual prompt engineering: All prompts generate content in both Japanese and English, with language-specific quality keywords and cultural context adjustments
  • 13 distinct image styles: Each style has its own structure guide, quality keywords, and layout instructions — not just different prompts, but fundamentally different visual compositions
  • Zero-downtime background processing: pgmq-based queue with heartbeat visibility extension, dead-letter handling, and stale job cleanup ensures no generation request is lost

What we learned

  • Gemini 3 Pro Image temperature should stay at 1.0: In our experience, lower temperatures can cause generation loops. Google explicitly recommends 1.0, and we control output diversity through topP instead (0.7–0.95)
  • Global endpoints avoid rate limiting: Using aiplatform.googleapis.com/v1/projects/{id}/locations/global/ instead of regional endpoints prevents IP-based rate limiting on shared hosting like Vercel
  • URL Context requires a separate SDK: The Vercel AI SDK doesn't support Gemini's URL Context tool yet, necessitating the @google/genai SDK for this specific feature
  • Multimodal prompts with reference images are powerful: Passing brand assets as inline base64 images alongside detailed text instructions produces remarkably brand-consistent outputs
  • Structured prompt engineering pays off: The bilingual prompt structure with technical specs, color hex codes, typography instructions, and negative prompts dramatically improved output quality compared to simple text descriptions
  • Queue-based architecture is essential for AI generation: Direct API calls from user requests are unreliable due to variable latency (10-90s). Background processing with progress tracking provides a much better UX
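
For the global endpoint mentioned in the list above, a URL builder might look like the sketch below. Only the prefix up to `locations/global/` appears in this write-up; the `publishers/google/models/...` suffix and the regional host pattern are assumptions based on the usual Vertex AI URL layout, and `vertexEndpoint` is a hypothetical helper.

```typescript
type VertexMethod = "generateContent" | "predict";

// Assumption: the path after "locations/<location>/" follows the common
// "publishers/google/models/<model>:<method>" layout.
function vertexEndpoint(
  projectId: string,
  model: string,
  method: VertexMethod,
  location = "global",
): string {
  // The global endpoint has no region prefix on the host name
  const host =
    location === "global"
      ? "aiplatform.googleapis.com"
      : `${location}-aiplatform.googleapis.com`;
  const project = encodeURIComponent(projectId);
  return `https://${host}/v1/projects/${project}/locations/${location}/publishers/google/models/${model}:${method}`;
}
```

Defaulting `location` to `"global"` keeps the regional form available for models that are not served from the global endpoint.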

What's next

  • Video Generation (Veo 3.1): We have already built the infrastructure for AI video generation using Veo 3.1 — including prompt templates for 4 video types (product showcase, lifestyle, social short, explainer), a queue-based worker pipeline, and marketing-framework-aligned prompts. The feature is currently disabled as we continue to evaluate generation quality and cost-efficiency. We plan to enable it incrementally as the model matures
  • A/B Testing: Generate multiple content variations and track which performs best per platform using SNS engagement insights
  • Multi-Image Carousels: Support Instagram carousel posts with thematically connected image sequences
  • Brand Voice Learning: Fine-tune text generation based on a brand's historical posts and engagement data
  • Team Collaboration: Multi-user workspaces with approval workflows for enterprise marketing teams
  • Expanding SNS Publishing: Facebook and Twitter/X publishing integration, plus TikTok, Pinterest, YouTube Shorts
  • AI-Powered Analytics: Use Gemini to analyze post performance and recommend content optimization strategies

Built With

  • bun
  • domain-driven-design
  • gemini
  • gemini-3-flash
  • gemini-3-pro-image
  • google-ai
  • imagen-3
  • next.js
  • pgmq
  • playwright
  • postgresql
  • radix-ui
  • react
  • sharp
  • storybook
  • stripe
  • supabase
  • tailwind-css
  • typescript
  • url-context-api
  • vercel
  • vercel-ai-sdk
  • vertex-ai
  • vitest
  • zod