Inspiration

Established tools such as Adobe Photoshop, GIMP and Adobe Lightroom are the industry standard for photo editing, but their interfaces tend to be cluttered and geared towards technical users, which makes them unintuitive for the average user.

What it does

PixelPal gives users a chat interface and a graphical workflow tool for editing their photos intuitively through an iterative process. First, the user provides a natural language input (e.g., "give the picture a 90s film retro vibe") and some toggle options, which the Gemini API parses into up to five discrete instructions. These instructions are then presented to the user as a step-by-step workflow, so the user can comment on or change any step during the editing process with further natural language inputs.
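The parsing and revision steps described above can be sketched roughly as follows. This is a minimal illustration, not our production code: it assumes the model is asked to reply with a JSON array of strings, and the names `parseInstructions` and `reviseStep` are hypothetical.

```javascript
// Hypothetical helper: turn a model reply into at most `maxSteps` instructions.
// Assumes the model was asked to reply with a JSON array of strings.
function parseInstructions(replyText, maxSteps = 5) {
  // Tolerate extra text around the array by parsing only the bracketed part.
  const start = replyText.indexOf("[");
  const end = replyText.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let parsed;
  try {
    parsed = JSON.parse(replyText.slice(start, end + 1));
  } catch {
    return []; // Not valid JSON: report no usable steps.
  }
  if (!Array.isArray(parsed)) return [];

  return parsed
    .filter((step) => typeof step === "string" && step.trim() !== "")
    .map((step) => step.trim())
    .slice(0, maxSteps); // Enforce the five-step cap.
}

// Hypothetical helper: replace one step after a user comment,
// returning a new array so the earlier workflow state is left untouched.
function reviseStep(steps, index, newText) {
  if (index < 0 || index >= steps.length) return steps.slice();
  return steps.map((step, i) => (i === index ? newText.trim() : step));
}
```

For example, a reply containing six steps would be cut down to the first five, and `reviseStep(steps, 1, "add light film grain")` would swap out only the second step.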

How we built it

The app is built on the Next.js full-stack framework. The frontend uses React along with npm libraries such as Framer Motion and anime.js. The backend is written in JavaScript and calls the Google Gemini API through the @google/generative-ai and @google/genai npm packages. We used the Gemini 2.5 Flash model for image editing and generation, and Gemini 2.5 Flash-Lite for parsing natural language input into distinct, iterative instructions for the workflow. Version control was handled with Git and a GitHub repository.
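As a rough sketch of how the planning call to the lighter model might look with the @google/genai package: the prompt wording, the shape of the `toggles` object, and the function names `buildPlanRequest` and `requestEditPlan` are assumptions made for illustration. The client is passed in as a parameter so the request-building logic can be exercised without network access.

```javascript
// Model ID for the parsing step (Gemini 2.5 Flash-Lite).
const PARSER_MODEL = "gemini-2.5-flash-lite";

// Build the request object for the planning call.
// The prompt text and the `toggles` shape ({ name: boolean }) are assumptions.
function buildPlanRequest(userPrompt, toggles = {}) {
  const enabled = Object.entries(toggles)
    .filter(([, on]) => on)
    .map(([name]) => name);

  return {
    model: PARSER_MODEL,
    contents:
      "Split this photo-editing request into at most five short, ordered steps. " +
      "Reply with a JSON array of strings only.\n" +
      "Request: " + userPrompt + "\n" +
      "Enabled options: " + (enabled.join(", ") || "none"),
    // Ask the API to return JSON rather than free-form text.
    config: { responseMimeType: "application/json" },
  };
}

// `ai` is expected to be a GoogleGenAI client, created elsewhere, e.g.:
//   import { GoogleGenAI } from "@google/genai";
//   const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
async function requestEditPlan(ai, userPrompt, toggles) {
  const response = await ai.models.generateContent(
    buildPlanRequest(userPrompt, toggles)
  );
  return response.text; // Raw reply text, still to be parsed and validated.
}
```

Keeping the request builder separate from the network call is one possible design; it makes the prompt easy to adjust and test on its own.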

Challenges we ran into

The most complicated component to implement was the workflow graphical user interface (GUI). The design is fairly novel and complex, so integrating the backend logic into it was challenging, and this is where we spent the most debugging time.

Accomplishments that we're proud of

We implemented the GUI successfully, with smooth rendering and animations, despite the logical complexity of the workflow. This was also the first time anyone in our group had worked with the Google Gemini API, and we were able to pick it up and apply it effectively to this project.

What we learned

We gained first-hand insight into the benefits of integrating AI models into Software-as-a-Service (SaaS) applications, namely the added functionality and the potential for creative workflow implementations like the one we built for PixelPal.

What's next for PixelPal

Language-driven creativity is a new frontier for innovation in the era of AI-integrated workflows. Our workflow design is currently applied only to image editing, but it could be extended to other creative processes, such as video editing and 3D modeling. We can also explore integrations with other tools, particularly social media applications, to complement PixelPal's purpose as a creative media generator and editor. With these expansions, PixelPal has the potential to democratize complicated media editing and redefine the creative process for independent creators.
