StickerSmith: A Pixel Art Generator Fusing Faces and Logos

Overview

StickerSmith combines two distinct visual inputs to generate a unified pixel-art avatar:

  • Face: A selfie or a character portrait
  • Symbol: A company logo, a team crest, or an abstract icon

Using the multimodal capabilities of Gemini 3 Pro Image (Nano Banana Pro), the tool fuses the two inputs into a single character. The output is a high-resolution sprite sheet laid out as a 4×4 grid: 16 unique poses and expressions of the newly generated pixel avatar.
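
Because the sheet holds 16 sprites in a regular grid, individual stickers can be recovered with simple arithmetic. The sketch below is illustrative only: the `CellRect` type and `gridCells` function are hypothetical names, and it assumes equally sized cells with no padding between them, which the write-up does not confirm.

```typescript
// One sprite's bounding box inside the sheet, in pixels.
interface CellRect {
  index: number; // 0..(rows * cols - 1), row-major order
  x: number;
  y: number;
  width: number;
  height: number;
}

// Splits a sheet into rows x cols equal cells (4 x 4 by default).
// Assumption: no padding or gutters between sprites.
function gridCells(
  sheetWidth: number,
  sheetHeight: number,
  rows = 4,
  cols = 4
): CellRect[] {
  const width = Math.floor(sheetWidth / cols);
  const height = Math.floor(sheetHeight / rows);
  const cells: CellRect[] = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      cells.push({
        index: row * cols + col,
        x: col * width,
        y: row * height,
        width,
        height,
      });
    }
  }
  return cells;
}
```

Each rectangle can then serve as the source region when cropping a sprite, for example with the canvas `drawImage` call in the browser.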


Technical Implementation

StickerSmith was built using a modern, lightweight frontend stack powered by Google’s latest generative models.

  • Frontend: React (TypeScript) with Vite for a fast development cycle
  • Styling: Tailwind CSS to achieve a clean, dark-mode "developer-aesthetic" UI
  • AI Engine: Integrated via the @google/genai SDK to communicate with the Gemini API
  • Model: gemini-3-pro-image-preview (Nano Banana Pro)

While Flash models are faster, the Pro model was essential for capturing the nuance of fusion: blending the color palette and shape language of a logo with a face without losing the identity of either.
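
As a rough sketch of how the two images and a prompt might be packaged for the model, the helper below builds a request object with two inline images followed by a text part. The `ImageInput` type and `buildFusionRequest` function are hypothetical, and the assumption is that the resulting object is handed to the SDK's `generateContent` call; the write-up does not show the actual request code.

```typescript
// Model id taken from the stack list above.
const MODEL_ID = "gemini-3-pro-image-preview";

// Hypothetical container for an uploaded image.
interface ImageInput {
  mimeType: string; // e.g. "image/png"
  base64: string; // raw base64 data, without a "data:" URL prefix
}

// A request part is either text or an inline base64 image.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface FusionRequest {
  model: string;
  contents: Part[];
}

// Orders the parts as face, symbol, prompt so the prompt can refer
// to "the first image" and "the second image".
function buildFusionRequest(
  face: ImageInput,
  symbol: ImageInput,
  prompt: string
): FusionRequest {
  return {
    model: MODEL_ID,
    contents: [
      { inlineData: { mimeType: face.mimeType, data: face.base64 } },
      { inlineData: { mimeType: symbol.mimeType, data: symbol.base64 } },
      { text: prompt },
    ],
  };
}
```

Keeping the request construction in a pure function like this makes it easy to unit test without calling the API.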


Challenges Encountered

Prompt Adherence

Early iterations often produced:

  • A single large image instead of a grid
  • Randomly scattered sprites

We refined the prompt until it strictly enforced the "4 rows × 4 columns" constraint.
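
One way to keep such a constraint explicit is to generate the prompt from the grid dimensions, so the row, column, and cell counts always agree. The wording below is illustrative and is not the prompt StickerSmith actually uses; `buildSpriteSheetPrompt` is a hypothetical helper.

```typescript
// Builds a layout-focused prompt. The phrasing is an assumption,
// written to counter the two failure modes listed above.
function buildSpriteSheetPrompt(rows = 4, cols = 4): string {
  const total = rows * cols;
  return [
    "Create a pixel-art sprite sheet of ONE character that fuses the face in the first image with the symbol in the second image.",
    `Layout: exactly ${rows} rows x ${cols} columns, ${total} equally sized cells, aligned to a strict grid.`,
    "Each cell shows the same character in a different pose or expression.",
    "Do not output a single large image, and do not scatter sprites outside the grid.",
  ].join("\n");
}
```

Stating the layout as both "rows x columns" and a total cell count repeats the constraint in two forms, which is a common prompting habit rather than a guaranteed fix.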

API Key Management

To let users bring their own paid API keys for the Pro model, we implemented a key selection flow through window.aistudio. This required careful UX design so that choosing a key feels seamless and the key is handled securely.
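
The flow can be sketched as "check for a key, prompt if missing, then re-check". In the sketch below, the `KeyBridge` interface and its method names `hasSelectedApiKey` and `openSelectKey` are assumptions about what window.aistudio exposes; the write-up does not list them. The bridge is passed in as a parameter so the logic can be tested outside the browser.

```typescript
// Assumed shape of the key-selection bridge (method names are an
// assumption, not confirmed by the original write-up).
interface KeyBridge {
  hasSelectedApiKey(): Promise<boolean>;
  openSelectKey(): Promise<void>;
}

// Resolves to true once a key is selected, prompting the user if needed.
// Resolves to false when no bridge is available or the user declines.
async function ensureApiKeySelected(
  bridge: KeyBridge | undefined
): Promise<boolean> {
  if (!bridge) return false; // environment does not provide the bridge
  if (await bridge.hasSelectedApiKey()) return true;
  await bridge.openSelectKey(); // opens the selection dialog
  return bridge.hasSelectedApiKey(); // re-check in case it was dismissed
}
```

In the app, the argument would be the window.aistudio object, and generation would be gated on the returned boolean so that no request is sent without a selected key.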


Key Takeaways

Gemini 3 Pro’s Instruction Following

Flash and Pro models differ significantly in how well they follow complex spatial instructions, such as generating a 4×4 grid. In our experience, the Pro model followed layout constraints far more reliably.

Prompting for Sprite Sheets

The term "Sprite Sheet" acts as a trigger for the model, steering it toward:

  • Consistent character design
  • Structured multi-variation output

The Joy of Fusion

AI excels at combinatorial creativity: finding a visual middle ground between two distinct concepts, something humans find difficult to visualize instantly.
