Inspiration
Last December, my mum and I were planning to attend an event, but she was struggling to decide between two dresses she really liked. She found both designs on Pinterest and planned to give one of them to our tailor to recreate. However, she couldn’t decide which one to choose because she loved the overall design of one dress, but preferred the neckline and sleeves of the other.
I suggested that we use the Gemini app to combine the two designs. After several attempts and prompt refinements, Gemini finally generated the exact combination she had in mind. That moment sparked an idea — instead of relying on repeated prompting, why not build an app specifically for this purpose? That was how the idea for ImageFuse was born.
What it does
ImageFuse is a web application designed for people who have creative ideas but struggle to visualize them clearly. It allows users, especially fashion designers, to combine specific elements from different images using natural language.
By bringing these ideas to life digitally, ImageFuse eliminates the need for manual editing tools and helps users visualize their concepts before creating them in real life.
How I built it
ImageFuse was built as a lightweight web application that integrates Gemini’s multimodal reasoning to process both images and text prompts.
The frontend handles image uploads and user input, while the backend, built using Next.js API routes, sends the images and prompts to the Gemini API. Gemini then reasons about the visual content and generates the fused result. The goal was to keep the interface simple while showcasing Gemini’s ability to understand and manipulate visual information through natural language.
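As a rough illustration of the backend step described above, here is a minimal sketch of how an API route might assemble the payload it sends to Gemini. It assumes the REST `generateContent` request shape, in which images travel as base64 `inline_data` parts alongside a text part. The names `SourceImage`, `FusionRequestBody`, and `buildFusionRequest` are hypothetical and not taken from the ImageFuse source, and placing the images before the prompt is an assumption rather than a requirement.

```typescript
// Hypothetical types for illustration; not taken from the ImageFuse source.
interface SourceImage {
  mimeType: string;   // e.g. "image/png"
  base64Data: string; // image bytes, base64-encoded (no "data:" URL prefix)
}

interface RequestPart {
  text?: string;
  inline_data?: { mime_type: string; data: string };
}

interface FusionRequestBody {
  contents: { role: string; parts: RequestPart[] }[];
}

// Builds a JSON body in the assumed generateContent shape: a single user
// turn whose parts are the uploaded images followed by the text instruction.
function buildFusionRequest(prompt: string, images: SourceImage[]): FusionRequestBody {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new Error("A fusion prompt is required.");
  }
  if (images.length < 2) {
    throw new Error("At least two images are needed to fuse.");
  }

  // One inline_data part per uploaded image, in upload order.
  const imageParts: RequestPart[] = images.map((img) => ({
    inline_data: { mime_type: img.mimeType, data: img.base64Data },
  }));

  return {
    contents: [{ role: "user", parts: [...imageParts, { text: trimmed }] }],
  };
}
```

In a Next.js API route, the returned object would be serialized with `JSON.stringify` and posted to the Gemini endpoint using a server-side API key. The model name and any image-output settings are left out here because the write-up does not specify them.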
Challenges I ran into
Accomplishments that I am proud of
- Successfully using Gemini to merge visual elements through natural language
- Building a simple and intuitive interface for a complex AI process
- Creating a practical tool with real-world creative use cases
What I learned
This project deepened my understanding of:
- Multimodal AI workflows
- Prompt design for image-based reasoning
- Designing simple interfaces around powerful AI systems
Most importantly, I learned how AI can turn traditionally complex creative workflows into natural, conversational experiences.
What's next for ImageFuse
Future versions of ImageFuse could include refinement controls, multiple fusion passes, and creative presets, but this project focuses on proving that conversational image editing is possible today with Gemini.
Built With
- canvasapi
- geminiai
- javascript
- nextjs
- typescript