Inspiration
Our inspiration came from the rapid advancements in generative AI, especially diffusion models, which have reshaped how we think about image creation and manipulation. We wanted to push the boundaries further: could we teach these models new, highly specific objects and styles? Could we train diffusion models to not just replicate but innovate by combining these new elements?
What it does
This project enables a diffusion model to go beyond its standard training by learning unique, custom objects and styles. Through a series of experiments, the model can first understand and recreate a new object through fine-tuning. Then, using a hard-prompt approach, it learns an entirely new style, combining both learned components to produce custom images that didn’t exist in its original training set.
How we built it
We trained Stable Diffusion to learn Romeo the dog from just three images, personalizing the model with our own inputs. Our approach fine-tunes the entire model rather than a lightweight adapter, which can improve fidelity to the new subject, though it means each learned concept costs a full copy of the model's weights.
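As a rough illustration, here is a minimal sketch of this kind of full-model fine-tuning loop, assuming the Hugging Face diffusers/transformers stack; the model id, image paths, placeholder token "sks", and hyperparameters are illustrative, not our exact values.

```python
# Minimal sketch of full-model fine-tuning on a few subject photos,
# assuming the Hugging Face diffusers/transformers stack. The model id,
# image paths, rare token "sks", and hyperparameters are illustrative.
import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

model_id, device = "runwayml/stable-diffusion-v1-5", "cuda"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

vae.requires_grad_(False)            # latents come from a frozen VAE
unet.train(); text_encoder.train()   # full model (UNet + text encoder) is trained

# A rare token ("sks") stands in for Romeo so the prompt stays unambiguous.
ids = tokenizer("a photo of sks dog", padding="max_length",
                max_length=tokenizer.model_max_length,
                return_tensors="pt").input_ids.to(device)

preprocess = transforms.Compose([
    transforms.Resize(512), transforms.CenterCrop(512),
    transforms.ToTensor(), transforms.Normalize([0.5], [0.5]),
])
images = [preprocess(Image.open(p).convert("RGB"))
          for p in ["romeo1.jpg", "romeo2.jpg", "romeo3.jpg"]]

opt = torch.optim.AdamW(
    list(unet.parameters()) + list(text_encoder.parameters()), lr=2e-6)

for step in range(400):
    pixels = images[step % len(images)].unsqueeze(0).to(device)
    latents = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (1,), device=device)
    noisy = scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=text_encoder(ids)[0]).sample
    loss = F.mse_loss(pred, noise)   # standard epsilon-prediction objective
    loss.backward(); opt.step(); opt.zero_grad()
```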
Next, we used images of Impressionist artworks by Claude Monet, optimizing a hard prompt with the PEZ algorithm and CLIP encoders: PEZ searches for a sequence of real vocabulary tokens whose CLIP text embedding lands close to the CLIP image embeddings of the target style. Finally, we combined Romeo the dog with the optimized prompt to generate new images of Romeo in the Impressionist style!
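For readers curious about the mechanics, here is a compressed sketch of PEZ-style optimization against CLIP. It assumes Hugging Face transformers' CLIP implementation and reaches into text_model internals (the public API only encodes token ids, not raw embeddings), so details may shift between library versions; the image paths, prompt length, and optimizer settings are illustrative.

```python
# Sketch of PEZ-style hard-prompt optimization against CLIP. Reaches into
# transformers' CLIP text-tower internals; paths and settings illustrative.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda"
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
clip.eval().requires_grad_(False)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# Target style: averaged CLIP image features of a few Monet paintings.
monet = [Image.open(p).convert("RGB") for p in ["monet1.jpg", "monet2.jpg"]]
with torch.no_grad():
    pix = proc(images=monet, return_tensors="pt").pixel_values.to(device)
    feats = F.normalize(clip.get_image_features(pix), dim=-1)
    target = F.normalize(feats.mean(0), dim=-1)

tm = clip.text_model
vocab = tm.embeddings.token_embedding.weight        # (V, d) token embeddings
n_tok = 8

# Learnable continuous embeddings, initialized from random vocab rows.
init = torch.randint(0, vocab.shape[0], (n_tok,), device=device)
emb = vocab[init].clone().detach().requires_grad_(True)
opt = torch.optim.AdamW([emb], lr=0.1)

def encode(prompt_emb):
    # Re-run CLIP's text tower on raw embeddings: BOS + tokens + EOS
    # (49406/49407 are CLIP's BOS/EOS ids).
    seq = torch.cat([vocab[49406][None], prompt_emb, vocab[49407][None]])[None]
    L = seq.shape[1]
    h = seq + tm.embeddings.position_embedding(torch.arange(L, device=device))
    causal = torch.full((L, L), float("-inf"), device=device).triu(1)[None, None]
    h = tm.final_layer_norm(tm.encoder(inputs_embeds=h,
                                       causal_attention_mask=causal)[0])
    return F.normalize(clip.text_projection(h[0, L - 1]), dim=-1)  # pool at EOS

for _ in range(500):
    # PEZ trick: project onto the nearest real token embeddings, evaluate
    # the gradient at the projected (hard) point, but apply the update to
    # the continuous copy (a straight-through estimator).
    with torch.no_grad():
        nn_idx = (F.normalize(emb, dim=-1)
                  @ F.normalize(vocab, dim=-1).T).argmax(dim=-1)
    hard = vocab[nn_idx] + (emb - emb.detach())
    loss = 1 - encode(hard) @ target
    opt.zero_grad(); loss.backward(); opt.step()

hard_prompt = "".join(proc.tokenizer.convert_ids_to_tokens(nn_idx.tolist())).replace("</w>", " ")
print("optimized hard prompt:", hard_prompt)
```

The recovered token string can then be appended to the subject prompt and run through the fine-tuned pipeline; the checkpoint path below is a placeholder:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("path/to/romeo-finetuned").to("cuda")
image = pipe("a photo of sks dog, " + hard_prompt).images[0]
image.save("romeo_impressionist.png")
```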
Challenges we ran into
We initially struggled with brainstorming ideas, sourcing images, and uploading them to the model correctly. Technical issues with Google Colab added to the complexity: we had to restart our sessions multiple times, costing us valuable time and energy. But one of our biggest challenges was selecting images for the PEZ algorithm that would yield accurate, desired results; often the output failed to match the prompt or aligned poorly with our intended direction.
Accomplishments that we're proud of
We’re proud of creating a model that can generate images with both learned object and style attributes. Each generated image reflects the fine-tuned object’s characteristics combined with the aesthetic qualities of the hard-prompt style—a complex task that speaks to the adaptability of diffusion models.
What we learned
Through this project, we deepened our understanding of diffusion models’ versatility and how different training techniques—like fine-tuning for specific objects and prompt-based style adaptation—can enhance their creative potential.
What's next for Diffusion Models for Image Generation and Classification
Next, we aim to refine the model's ability to generalize across more diverse objects and styles. We're also interested in exploring applications in fields like custom art and design, where personalized image generation can add immense value.