Luminate Project Documentation

Inspiration

Our journey began with a fascination for the transformative potential of artificial intelligence in image generation. We wanted to transcend traditional boundaries and introduce an innovative approach to how we interact with and manipulate digital imagery. We discussed the project idea's feasibility (and only that) with a Stanford professor the day before. Inspired by the existing capabilities of stable diffusion models, we envisioned a system that not only generates images but offers unprecedented control over the lighting conditions within them, making the process more dynamic and adaptable.

What it does

Luminate is a cutting-edge platform that integrates advanced image and text generation capabilities. At its core, it leverages a modified stable diffusion model, enhanced with our own training techniques, to let users alter the direction of lighting sources within an image. This enables the adjustment of lighting exposure and shadows, dramatically transforming the visual impact of the generated imagery. In addition, Luminate incorporates a text generation component that translates user inputs into precise coordinates, making interaction with the system intuitive and efficient. In the future, the product could have powerful applications in photo-editing software such as Photoshop, model photography, animation, and more.
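The text-to-coordinate step can be illustrated with a minimal sketch. In Luminate this is done by a trained text generation model; the keyword lookup below, including the function name and direction vocabulary, is a hypothetical stand-in, not our actual model:

```python
# Hypothetical sketch: map a lighting-direction phrase to a unit-axis
# (x, y, z) vector that the image model could consume. Luminate uses a
# fine-tuned LLM for this step; a keyword lookup stands in for it here.

LIGHT_DIRECTIONS = {
    "left":   (-1.0, 0.0, 0.0),
    "right":  (1.0, 0.0, 0.0),
    "above":  (0.0, 1.0, 0.0),
    "below":  (0.0, -1.0, 0.0),
    "front":  (0.0, 0.0, 1.0),
    "behind": (0.0, 0.0, -1.0),
}

def prompt_to_light_coords(prompt: str) -> tuple[float, float, float]:
    """Return an (x, y, z) light-direction vector for a user prompt."""
    words = prompt.lower().split()
    # Average every direction keyword found, so mixed phrases blend axes.
    hits = [LIGHT_DIRECTIONS[w] for w in words if w in LIGHT_DIRECTIONS]
    if not hits:
        return (0.0, 0.0, 1.0)  # default: frontal lighting
    x = sum(h[0] for h in hits) / len(hits)
    y = sum(h[1] for h in hits) / len(hits)
    z = sum(h[2] for h in hits) / len(hits)
    return (x, y, z)
```

The real system replaces this lookup with an LLM so that free-form phrasing ("harsh afternoon sun from the west") still resolves to coordinates.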

How we built it

The development of Luminate involved integrating several key technologies: a ControlNet diffusion model and CLIP for image generation, coupled with a custom text generation model trained on generated data. This multi-faceted approach allowed us to create a robust platform capable of understanding and executing complex user inputs. Our infrastructure leverages RTX 3090 GPUs and Stanford's computing clusters, though we faced significant challenges in scaling our computational resources to meet the demands of our large dataset.

Challenges we ran into

One of the primary hurdles we encountered was the sheer size of our dataset, which posed substantial computational resource requirements. Despite having access to high-performance GPUs and university computing clusters, we struggled to achieve the desired speed and efficiency in model training given the time constraint. This was somewhat inevitable, as even fine-tuning stable diffusion requires a large amount of data. The limitation forced us to explore alternative solutions and optimizations to keep the project moving forward.

Accomplishments that we're proud of

Despite the challenges, we are immensely proud of what we have achieved with Luminate. We successfully established a comprehensive product pipeline that includes a user-friendly interface, an LLM, and a nearly finalized model capable of generating black-and-white images. Furthermore, we devised a novel approach: the LLM's output is passed to a separate pretrained text-to-image model (running on Intel) that generates black-and-white images based on the specified coordinate directions, and ControlNet then generates the final image under this lighting constraint, bringing us close to our vision for the final product.

What we learned

This project has been a tremendous learning opportunity for our team. We gained valuable insights into the complexities of training AI models on large datasets and the importance of computational resources in deep learning. We also gained a full understanding of the entire ControlNet architecture, as we were doing very base-level development, down to the matrix calculations during training. We learned to navigate the challenges of integrating multiple AI technologies, having used Monster API, Fetch.ai, together.ai, Intel, and others to create a cohesive and functional system. You can see our model, fine-tuned with Monster API, on Hugging Face: leelandzhang/moster-api-LLM-light-change (model weights).

What's next for Luminate

Looking ahead, we are first focused on overcoming the remaining challenges to fully realize our vision for Luminate, including completing the training of our primary light-modification generation model. Physics priors can be added to impose physical boundary constraints, which can greatly increase the spatial precision of generated images; coupled with our lighting control, this would let us generate more stable and accurate images than existing tools on the market. We also plan to improve the user experience with more developed UI/UX and better product usability. As we continue to innovate and expand the platform's capabilities, we are excited about Luminate's potential to redefine the landscape of AI-powered image generation.

Built With

  • clip
  • control-net
  • intel
  • mixtral
  • pytorch
  • relightable-nerf
  • text-to-pictures
  • together.ai