Inspiration

Last December, my mum and I were planning to attend an event, but she was struggling to decide between two dresses she really liked. She found both designs on Pinterest and planned to give one of them to our tailor to recreate. However, she couldn’t decide which one to choose because she loved the overall design of one dress, but preferred the neckline and sleeves of the other.

I suggested that we use the Gemini app to combine the two designs. After several attempts and prompt refinements, Gemini finally generated the exact combination she had in mind. That moment sparked an idea — instead of relying on repeated prompting, why not build an app specifically for this purpose? That was how the idea for ImageFuse was born.

What it does

ImageFuse is a web application designed for people who have creative ideas but struggle to visualize them clearly. It allows users, especially fashion designers, to combine specific elements from different images using natural language.

By bringing these ideas to life digitally, ImageFuse eliminates the need for manual editing tools and helps users visualize their concepts before creating them in real life.

How I built it

ImageFuse was built as a lightweight web application that integrates Gemini’s multimodal reasoning to process both images and text prompts.

The frontend handles image uploads and user input, while the backend, built using Next.js API routes, sends the images and prompts to the Gemini API. Gemini then reasons about the visual content and generates the fused result. The goal was to keep the interface simple while showcasing Gemini’s ability to understand and manipulate visual information through natural language.
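As a rough illustration of the backend step, here is a minimal sketch of how an API route might assemble the images and the prompt into a single multimodal request body before sending it to Gemini. It is not the actual ImageFuse code: the `UploadedImage` type and the `buildFusionRequest` helper are hypothetical names, and the body shape (a `contents` array whose `parts` mix `inlineData` images with a `text` instruction) is assumed from the Gemini `generateContent` request format. The network call itself is left out, since the write-up does not say which model or endpoint was used.

```typescript
// Hypothetical shape for an image received from the frontend upload form.
interface UploadedImage {
  mimeType: string;   // e.g. "image/png"
  base64Data: string; // image bytes, base64-encoded
}

// A request part is either a text instruction or an inline image.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface GenerateContentBody {
  contents: { role: "user"; parts: Part[] }[];
}

// Hypothetical helper: combines the uploaded images and the user's
// natural-language instruction into one multimodal request body.
function buildFusionRequest(
  images: UploadedImage[],
  instruction: string
): GenerateContentBody {
  if (images.length < 2) {
    throw new Error("Fusion needs at least two images");
  }
  const prompt = instruction.trim();
  if (prompt === "") {
    throw new Error("An instruction is required");
  }

  // Images go first, in upload order, so the prompt can refer to
  // "the first dress" and "the second dress".
  const imageParts: Part[] = images.map((img) => ({
    inlineData: { mimeType: img.mimeType, data: img.base64Data },
  }));

  return {
    contents: [{ role: "user", parts: [...imageParts, { text: prompt }] }],
  };
}

// Usage with placeholder image data (not real images).
const exampleBody = buildFusionRequest(
  [
    { mimeType: "image/jpeg", base64Data: "AAAA" },
    { mimeType: "image/jpeg", base64Data: "BBBB" },
  ],
  "Keep the overall design of the first dress, but use the neckline and sleeves of the second."
);
console.log(JSON.stringify(exampleBody, null, 2));
```

Keeping the request assembly in a pure function like this makes it easy to check the payload without calling the API; the route handler would then only need to POST the result and return Gemini's response to the frontend.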

Challenges I ran into


Accomplishments that I am proud of

Successfully using Gemini to merge visual elements through natural language

Building a simple and intuitive interface for a complex AI process

Creating a practical tool with real-world creative use cases

What I learned

This project deepened my understanding of:

Multimodal AI workflows

Prompt design for image-based reasoning

Designing simple interfaces around powerful AI systems

Most importantly, I learned how AI can turn traditionally complex creative workflows into natural, conversational experiences.

What's next for ImageFuse

Future versions of ImageFuse could include refinement controls, multiple fusion passes, and creative presets, but this project focuses on proving that conversational image editing is possible today with Gemini.

