Chat-based AI Image Generator | Pixparkle

Inspiration

Most AI image generators still feel like command-line tools: type a prompt, hit enter, get a result, start over if it's wrong. There is no conversation, no iteration, no memory of what came before. If you want to tweak the lighting or adjust the composition, you copy-paste a revised prompt and hope the model interprets it the same way.

I wanted to build something different. A tool where you talk to the AI the way you would talk to a designer: "make the background warmer," "crop in tighter," "now turn this into a video." The kind of back-and-forth that creative work actually requires. Pixparkle was born from that frustration with single-turn, form-based generators, and from the belief that AI image creation should feel like collaboration, not a slot machine.

What it does

Pixparkle is a chat-based AI image and video generator. Instead of filling out prompt fields, you describe what you want in plain English and refine through multi-turn conversation. Every edit builds on the last without regenerating from scratch.

Key capabilities:

Six frontier models in one subscription. Switch between GPT Image 2, Nano Banana, Nano Banana 2, Nano Banana Pro, Seedream 5, and Flux Fast per turn without losing conversation context. Each model has different strengths in speed, resolution, text rendering, and photorealism.
Multi-turn conversational editing. Say "make the lighting warmer," "swap the background to a beach," or "add a logo in the top left" and the AI applies the change while preserving everything else. The chat remembers full history, including uploaded reference images, across dozens of turns.
Image-to-video without leaving the chat. Generate a static image, then ask Pixparkle to animate it with Veo 3.1 Lite. No switching platforms, no re-uploading assets.
Up to 4K output with platform presets. Export at up to 4K resolution with aspect ratio presets for Amazon listings, Shopify stores, Instagram, YouTube thumbnails, and more. Commercial use included on paid plans.
Production-grade multilingual text rendering. Logos, headlines, and labels render cleanly in Chinese, Japanese, Korean, and Latin characters. Multilingual text is a known failure mode for most AI image generators.
Image-to-image and text-to-image in one chat. Upload a reference photo, describe changes in plain language, and the AI preserves unmentioned elements.

The workflow is simple: Describe, Generate, Refine in Chat, Export in 4K. Free to start, no credit card required.

How I built it

Frontend: Next.js 15 with App Router and React 19. Tailwind CSS v4 for styling with a custom CSS variable theme system supporting light and dark modes. Radix UI and Shadcn UI for accessible components. Motion (Framer Motion) for animations. The chat interface uses SWR for real-time message streaming and Embla Carousel for horizontal model selection.

Backend: Deployed on Cloudflare Pages via OpenNext, running as a single Cloudflare Worker. The architecture uses Cloudflare D1 (SQLite with Drizzle ORM) for user data, R2 for media storage, KV for key-value caching, and Durable Objects for managing per-message prediction lifecycle state with TTL-based cleanup.
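
To make the TTL-based cleanup idea concrete, here is a minimal sketch, not the production code. It uses an in-memory stand-in for Durable Object storage with an injectable clock. The names `PredictionStore`, `PredictionRecord`, and the status values are assumptions made for illustration.

```typescript
// Hypothetical shape of per-message prediction state; field names are assumptions.
type PredictionStatus = "queued" | "running" | "succeeded" | "failed";

interface PredictionRecord {
  messageId: string;
  status: PredictionStatus;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for Durable Object storage. The clock is injectable
// so expiry can be exercised without waiting.
class PredictionStore {
  private records: { [messageId: string]: PredictionRecord } = {};
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Create or update a record; every write pushes the expiry forward.
  put(messageId: string, status: PredictionStatus): void {
    this.records[messageId] = {
      messageId: messageId,
      status: status,
      expiresAt: this.now() + this.ttlMs,
    };
  }

  // Return the status, or undefined if the record is missing or expired.
  get(messageId: string): PredictionStatus | undefined {
    const record = this.records[messageId];
    if (!record || record.expiresAt <= this.now()) return undefined;
    return record.status;
  }

  // Delete expired records and report how many were removed.
  sweep(): number {
    let removed = 0;
    for (const id of Object.keys(this.records)) {
      if (this.records[id].expiresAt <= this.now()) {
        delete this.records[id];
        removed++;
      }
    }
    return removed;
  }
}
```

In a real Durable Object the sweep would more likely run from a scheduled wake-up than from an explicit call. The point of the sketch is only that every record carries its own expiry, so abandoned predictions do not accumulate.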

AI pipeline: Image and video generation runs through the Replicate API, with a Gemini 3-powered evaluator layer that translates conversational intent into model-specific prompts. The evaluator handles multi-turn context by maintaining full conversation memory, understanding both text instructions and uploaded reference images. Each model has a dedicated adapter that maps the evaluator's output to model-specific parameter schemas, including resolution tiers, quality presets, negative prompts, aspect ratio constraints, and safety filters.
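
The adapter idea can be sketched roughly as follows. This is an illustration under stated assumptions: `EvaluatedRequest`, `ModelAdapter`, and the payload keys `aspect_ratio` and `negative_prompt` are hypothetical names, not the actual parameter schema of any model on Replicate.

```typescript
// Hypothetical model-agnostic output of the evaluator layer.
interface EvaluatedRequest {
  prompt: string;
  negativePrompt?: string;
  aspectRatio: string; // e.g. "16:9"
}

// Hypothetical per-model constraints an adapter needs to know about.
interface ModelAdapter {
  supportsNegativePrompt: boolean;
  allowedAspectRatios: string[];
  fallbackAspectRatio: string;
}

// Map the shared request onto one model's input payload.
// Unsupported fields are dropped; unsupported ratios fall back to a default.
function toModelInput(
  req: EvaluatedRequest,
  adapter: ModelAdapter
): { [key: string]: string } {
  const ratioAllowed = adapter.allowedAspectRatios.indexOf(req.aspectRatio) >= 0;
  const input: { [key: string]: string } = {
    prompt: req.prompt,
    aspect_ratio: ratioAllowed ? req.aspectRatio : adapter.fallbackAspectRatio,
  };
  if (adapter.supportsNegativePrompt && req.negativePrompt) {
    input["negative_prompt"] = req.negativePrompt;
  }
  return input;
}

// Two made-up adapters with different constraints.
const draftAdapter: ModelAdapter = {
  supportsNegativePrompt: false,
  allowedAspectRatios: ["1:1", "16:9"],
  fallbackAspectRatio: "1:1",
};
const detailAdapter: ModelAdapter = {
  supportsNegativePrompt: true,
  allowedAspectRatios: ["1:1", "4:5", "16:9"],
  fallbackAspectRatio: "1:1",
};
```

Keeping the evaluator output model-agnostic means a per-turn model switch only changes which adapter runs, not how the conversation is interpreted.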

Authentication and payments: NextAuth v5 with Google OAuth plus Google One Tap. Stripe handles subscription management with four tiers: Free (10 credits, no card), Starter ($9.90/month, 500 credits), Pro ($29.90/month, 1,600 credits), and Premium ($99.90/month, 6,500 credits). Credit consumption varies by model and settings, from 1 credit for a quick Flux Fast draft to 50 credits for a Veo 3.1 Lite video.
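
The credit arithmetic is easy to check with a small helper. The plan sizes and the two per-generation costs come from the figures above. The names `PLAN_CREDITS`, `CREDIT_COST`, and `maxGenerations` are made up for this sketch, and costs for other models and settings are not listed here, so they are left out.

```typescript
// Credits included in each plan, as listed above.
const PLAN_CREDITS = { free: 10, starter: 500, pro: 1600, premium: 6500 };

// The two per-generation costs quoted above (cheapest and most expensive).
const CREDIT_COST = { fluxFastDraft: 1, veoLiteVideo: 50 };

type Plan = keyof typeof PLAN_CREDITS;

// How many generations of a given cost fit into a plan's credit allowance.
function maxGenerations(plan: Plan, creditsPerGeneration: number): number {
  if (creditsPerGeneration <= 0) {
    throw new Error("creditsPerGeneration must be positive");
  }
  return Math.floor(PLAN_CREDITS[plan] / creditsPerGeneration);
}

const starterVideos = maxGenerations("starter", CREDIT_COST.veoLiteVideo);
const starterDrafts = maxGenerations("starter", CREDIT_COST.fluxFastDraft);
```

With these numbers, a Starter plan covers 10 Veo 3.1 Lite videos or 500 Flux Fast drafts.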

Internationalization: next-intl with 16 languages, using [locale] dynamic route segments and RTL layout support.
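
For the RTL part, a helper along these lines is one way to derive the layout direction from the `[locale]` segment. The list of right-to-left locales below is an assumption for illustration; the actual set of 16 languages is not listed here.

```typescript
// Assumed right-to-left languages; adjust to the locales actually shipped.
const RTL_LANGUAGES = ["ar", "he", "fa", "ur"];

// Derive the text direction from a locale tag such as "ar" or "ar-EG".
function textDirection(locale: string): "rtl" | "ltr" {
  const language = locale.toLowerCase().split("-")[0];
  return RTL_LANGUAGES.indexOf(language) >= 0 ? "rtl" : "ltr";
}
```

The result could then be passed to the `dir` attribute of the root `html` element in the locale layout.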

Challenges I ran into

The hardest problem was making multi-turn editing actually work across different models. Each model has its own prompt format, parameter schema, resolution limits, and behavior around preserving unmentioned content. Building a unified evaluator layer that translates conversational intent correctly for six different models, while maintaining quality across turns, took months of iteration. The evaluator has to decide not just what the user wants, but which model best fits that specific request and how to express the instruction in the model's native dialect.

Multilingual CJK text rendering was another major challenge. Most AI image models produce garbled characters for Chinese, Japanese, and Korean text. Getting clean, production-grade text output required building a multi-pass pipeline that detects non-Latin characters in user requests, switches to models with stronger text capabilities (GPT Image 2 in particular), and applies post-generation text rectification where needed.
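
The detection and routing step can be sketched like this. It is a simplified illustration: it only checks the most common Unicode blocks for Han, kana, and Hangul, and `containsCjk` and `pickModelForText` are hypothetical names. The rectification pass is not shown.

```typescript
// Return true if the text contains characters from the common CJK blocks.
// Only Basic Multilingual Plane ranges are checked in this sketch.
function containsCjk(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const isHan = code >= 0x4e00 && code <= 0x9fff; // CJK Unified Ideographs
    const isKana = code >= 0x3040 && code <= 0x30ff; // Hiragana and Katakana
    const isHangul = code >= 0xac00 && code <= 0xd7af; // Hangul Syllables
    if (isHan || isKana || isHangul) return true;
  }
  return false;
}

// Route requests with CJK text to a model with stronger text rendering,
// otherwise keep whatever model the user is currently on.
function pickModelForText(
  requestedText: string,
  currentModel: string,
  textCapableModel: string
): string {
  return containsCjk(requestedText) ? textCapableModel : currentModel;
}
```

A fuller version would cover other non-Latin scripts as well, since the pipeline described above is not limited to CJK.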

Running a multi-model AI pipeline on Cloudflare Workers presented significant architectural constraints. Workers have CPU time limits, memory caps, and no persistent filesystem. I built a Durable Objects layer to manage long-running prediction polling with retry logic, and an async queue system for video generation jobs that need minutes rather than seconds. Debugging prediction state across DO instances was particularly painful, since local development does not fully replicate the production topology.
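
One way to keep polling logic friendly to short-lived Worker invocations is to make each wake-up a pure decision: given the latest status and the attempt count, either finish, give up, or schedule another poll with exponential backoff. The sketch below assumes status names modeled on Replicate's prediction statuses as I understand them. `PollPolicy` and `nextPollAction` are hypothetical names.

```typescript
// Status values assumed for the remote prediction.
type RemoteStatus = "starting" | "processing" | "succeeded" | "failed" | "canceled";

interface PollPolicy {
  baseDelayMs: number; // delay before the first re-poll
  maxDelayMs: number; // upper bound for the backoff
  maxAttempts: number; // polls allowed before timing out
}

type PollAction =
  | { kind: "done" }
  | { kind: "give_up"; reason: string }
  | { kind: "poll_again"; delayMs: number };

// Decide what to do after a poll. `attempt` counts polls already made (0-based).
function nextPollAction(
  status: RemoteStatus,
  attempt: number,
  policy: PollPolicy
): PollAction {
  if (status === "succeeded") return { kind: "done" };
  if (status === "failed" || status === "canceled") {
    return { kind: "give_up", reason: status };
  }
  if (attempt >= policy.maxAttempts) {
    return { kind: "give_up", reason: "timeout" };
  }
  // Exponential backoff: base * 2^attempt, capped at maxDelayMs.
  const delayMs = Math.min(
    policy.maxDelayMs,
    policy.baseDelayMs * Math.pow(2, attempt)
  );
  return { kind: "poll_again", delayMs: delayMs };
}
```

Because the function holds no state of its own, the attempt counter can live in whatever storage the Durable Object uses, and the decision logic can be tested without a network or a timer.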

The 4K export pipeline had to handle variable model capabilities. Not all models support 4K natively, so I built a resolution escalation path that upscales via higher-tier models when the current model cannot reach the target resolution, all transparent to the user.
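
The escalation decision can be sketched as a small planning function. The model names and pixel limits below are placeholders rather than real capabilities, and `planExport` is a hypothetical name. The only fixed number is 3840, the long edge of 4K UHD.

```typescript
// Hypothetical capability record: the longest edge a model can output.
interface ModelCapability {
  name: string;
  maxLongEdgePx: number;
}

interface ExportPlan {
  generateWith: string;
  upscaleWith?: string; // set only when escalation is needed
}

// Use the current model if it reaches the target; otherwise escalate to the
// smallest-capacity model that does, so larger models are not used needlessly.
function planExport(
  current: ModelCapability,
  upscalers: ModelCapability[],
  targetLongEdgePx: number
): ExportPlan {
  if (current.maxLongEdgePx >= targetLongEdgePx) {
    return { generateWith: current.name };
  }
  let best: ModelCapability | undefined;
  for (const candidate of upscalers) {
    const reachesTarget = candidate.maxLongEdgePx >= targetLongEdgePx;
    if (reachesTarget && (!best || candidate.maxLongEdgePx < best.maxLongEdgePx)) {
      best = candidate;
    }
  }
  if (!best) {
    throw new Error("no available model reaches " + targetLongEdgePx + "px");
  }
  return { generateWith: current.name, upscaleWith: best.name };
}

// Placeholder models for illustration only.
const draftModel: ModelCapability = { name: "draft-model", maxLongEdgePx: 1024 };
const hdModel: ModelCapability = { name: "hd-model", maxLongEdgePx: 2048 };
const uhdModel: ModelCapability = { name: "uhd-model", maxLongEdgePx: 4096 };
const UHD_LONG_EDGE_PX = 3840;
```

Returning a plan instead of performing the upscale directly keeps the choice inspectable, which helps when the user-facing behavior is meant to be invisible.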

Accomplishments that I'm proud of

Building a true conversational editing experience for AI image generation. Most tools in this space are glorified form fields with a chat bubble skin. Pixparkle actually maintains semantic context across turns, understands when the user is refining versus starting fresh, and preserves unmentioned elements between edits. Watching someone go from "generate a coffee shop interior" to "make it cozier" to "add a cat on the counter" to "now turn it into a short video" in a single continuous thread is exactly the experience I set out to build.

Shipping six frontier models under a single subscription, with per-turn model switching, is something I have not seen another tool in this space offer. Users get access to the best models from OpenAI, Google, ByteDance, and Black Forest Labs without managing multiple vendor accounts or subscriptions.

Getting CJK text to render cleanly was a technical challenge that paid off as a real differentiator. This single capability opened up East Asian e-commerce use cases, where sellers need product images with accurate Chinese, Japanese, and Korean labels and cannot use tools that produce garbled characters.

Launching a working free tier with no credit card. Ten free credits is enough for someone to go through the full workflow, generate a real result, and understand the value before paying. This was a deliberate product choice, not a growth hack.

What I learned

Building a multi-model AI product means accepting that you are not in control of your core dependency. Models change, deprecate, introduce regressions, and sometimes return completely different output for the same prompt. The evaluator layer became not just a translation system but a stability layer: it absorbs model-level variance so users get consistent results even as underlying APIs shift.

Cloudflare Workers are powerful but opinionated. The constraints around execution time, memory, and filesystem access forced simpler architecture choices that turned out to be correct anyway. Durable Objects as a state management layer for async prediction pipelines is a pattern I would use again, but it requires careful TTL design to avoid runaway storage costs.

Credit-based pricing for AI products is deceptively hard to get right. You need to balance per-generation cost (which varies 20x across models), perceived value, and user psychology around "spending" credits. Free users need enough credits to experience the full product; paid users need enough volume that they do not feel nickel-and-dimed per generation. I iterated on the credit table at least a dozen times during development.

The chat interface itself is a UX challenge that most AI products underestimate. Streaming partial responses, handling mid-generation cancellations, showing progress for long video jobs, and gracefully degrading when a model returns an error all require careful state machine design. Users do not read status codes; they just need to know if the thing is working.
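
A minimal sketch of such a state machine for a single chat message is shown below. The state and event names are assumptions, and the real interface likely tracks more detail, such as progress percentages for video jobs.

```typescript
type GenState = "idle" | "queued" | "streaming" | "done" | "error" | "canceled";
type GenEvent = "submit" | "first_chunk" | "complete" | "fail" | "cancel" | "retry";

// Allowed transitions. Any event not listed for a state is ignored.
const TRANSITIONS: { [S in GenState]: { [E in GenEvent]?: GenState } } = {
  idle: { submit: "queued" },
  queued: { first_chunk: "streaming", fail: "error", cancel: "canceled" },
  streaming: { complete: "done", fail: "error", cancel: "canceled" },
  done: {},
  error: { retry: "queued" },
  canceled: {},
};

// Apply an event; invalid events leave the state unchanged.
function transition(state: GenState, event: GenEvent): GenState {
  const next = TRANSITIONS[state][event];
  return next === undefined ? state : next;
}

// User-facing wording instead of status codes.
const STATUS_LABELS: { [S in GenState]: string } = {
  idle: "Ready",
  queued: "Starting...",
  streaming: "Generating...",
  done: "Done",
  error: "Something went wrong. Try again?",
  canceled: "Canceled",
};

function statusLabel(state: GenState): string {
  return STATUS_LABELS[state];
}
```

Ignoring invalid events, rather than throwing, means a late `complete` arriving after a `cancel` cannot flip a canceled message back to done.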

What's next for Chat-based AI Image Generator | Pixparkle

I am focused on a few areas:

More model integrations. The model landscape moves fast, and I want Pixparkle to be the place where users get first access to new capabilities without switching tools. Video generation quality in particular is improving rapidly, and I plan to support longer clips and higher resolutions as the underlying models advance.
Team and collaboration features. Shared workspaces, brand asset libraries, and style guides so that teams can maintain visual consistency across members. A marketing team should be able to define a brand palette and have every generation from every team member respect it automatically.
Batch generation and templates. Generate product photo variants across multiple aspect ratios, backgrounds, and languages in one request. Define a template once and apply it to an entire product catalog.
API access. A developer API so that e-commerce platforms, CMS tools, and automation workflows can integrate Pixparkle generation directly into their pipelines.
Mobile app. The chat-based UX is a natural fit for mobile. On-device generation for quick drafts, with cloud models for final high-res output.

Updates

horus he started this project — May 26, 2026 03:33 AM EDT
