Inspiration

The inspiration for LiveEnhancer came from a simple observation: photo editing is often too technical. Most people have a clear vision for their photos: "make it warmer," "remove the background," or "make it look like a painting." But they get lost in complex sliders and hidden menus.

I wanted to bridge the gap between human intent and digital execution. With the release of Gemini Live, I saw an opportunity to create a "Magic Mirror" for photos: an assistant you can actually talk to, one that understands your creative vision the way a professional editor would.

What it does

LiveEnhancer is an AI-powered photo editing assistant that puts the power of professional retouching into a conversational interface.

  • Voice-Guided Editing: Use real-time voice commands via Gemini Live to describe changes.
  • Magic Auto-Fix: A one-tap intelligent enhancement that analyzes and repairs lighting, color, and focus.
  • Artistic Transformations: Instantly apply complex styles, from Van Gogh’s swirling brushstrokes to gritty Cyberpunk neon.
  • Pro Tools: Includes 4x upscaling, background replacement, and a split-view comparison slider to see your progress.

How I built it

I built LiveEnhancer using Google AI Studio with minor manual edits. The app boasts a cutting-edge stack designed for speed and beauty:
  • Frontend: React with TypeScript for a robust, type-safe user interface.
  • Styling: Tailwind CSS with a premium "glassmorphism" aesthetic, featuring vibrant gradients and iOS-inspired blurs.
  • AI Engine: The core is powered by the @google/genai SDK. I utilized Gemini 3.1 Flash for high-speed image edits and Gemini Live (Native Audio) for the real-time conversational layer.
  • Animations: motion/react (Framer Motion) provides the fluid transitions and "pulsing" AI states that make the app feel alive.
  • Real-time Audio: I implemented a custom Web Audio API handler to stream PCM audio, enabling low-latency, natural dialogue with the AI.
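
To give a sense of what that audio handler involves, here is a minimal sketch of the sample conversion step, assuming the live stream expects 16-bit little-endian PCM (the helper names are mine, not the project's actual code):

```typescript
// Convert Float32 samples from the Web Audio API (range [-1, 1])
// into 16-bit signed PCM, a common wire format for live audio streams.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp to the valid range, then scale to the 16-bit integer range.
    const s = Math.max(-1, Math.min(1, input[i]));
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}

// Encode the PCM buffer as base64 for transport over a JSON/WebSocket channel.
function pcmToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}
```

In practice a handler like this would run per audio buffer (e.g. inside an AudioWorklet or ScriptProcessor callback) and push each encoded chunk to the live session.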

Challenges I ran into

The biggest hurdle was multimodal synchronization. Managing a live audio stream while simultaneously providing image context to the model required precise state management to avoid lag or race conditions.
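
One common remedy for this kind of race condition, sketched here rather than taken from the project's actual code, is to serialize all model-facing work through a single promise chain so an image-context update can never interleave with an in-flight audio turn:

```typescript
// Minimal async queue: each task waits for the previous one to settle,
// so state updates (audio turns, image-context refreshes) apply in order.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task, task);
    // Swallow errors on the tail so one failed task doesn't block the queue.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

A caller would then wrap every state-mutating operation, e.g. `queue.run(() => sendImageContext(img))`, guaranteeing first-in, first-out execution.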

I also spent significant time on prompt engineering. Translating subjective human requests like "make it more moody" into consistent, high-quality image transformations required deep testing. Perfecting the "Van Gogh" style, for instance, required fine-tuning the balance between the original photo's structure and the AI's artistic expression.
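
To illustrate the kind of balance involved, here is a hypothetical prompt builder; the wording and the intensity scale are illustrative, not the prompts LiveEnhancer actually ships:

```typescript
// Hypothetical style-prompt builder: pairs the requested style with an
// explicit intensity and a structure-preservation clause, so the model
// balances artistic expression against the original photo's geometry.
function buildStylePrompt(style: string, strength: number): string {
  const pct = Math.round(Math.max(0, Math.min(1, strength)) * 100);
  return [
    `Repaint this photo in the style of ${style}.`,
    `Apply the style at roughly ${pct}% intensity.`,
    "Preserve the original composition, subject geometry, and face likeness.",
  ].join(" ");
}
```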

Accomplishments that I'm proud of

I am incredibly proud of the Gemini Live integration. Speaking to your photo and seeing it transform in real-time feels like magic.

I also successfully implemented a "Magic Auto-Fix" feature that doesn't just apply a filter—it uses Gemini's vision capabilities to actually understand what's wrong with a photo (like a dark subject or a blurry background) and fixes it intelligently. Finally, achieving a "consumer-ready" look and feel that hides the complexity of the underlying tech is a major win for me.
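
The analyze-then-repair idea can be sketched as a pure mapping from a vision diagnosis to targeted edit instructions. The schema and wording below are my assumptions, not the app's real data model:

```typescript
// Hypothetical shape of the vision model's diagnosis of a photo.
interface PhotoDiagnosis {
  underexposedSubject?: boolean;
  colorCast?: "warm" | "cool" | null;
  softFocus?: boolean;
}

// Turn a diagnosis into concrete edit instructions for the image model,
// instead of applying a one-size-fits-all filter.
function diagnosisToInstructions(d: PhotoDiagnosis): string[] {
  const steps: string[] = [];
  if (d.underexposedSubject) {
    steps.push("Lift shadows on the main subject without blowing out highlights.");
  }
  if (d.colorCast === "warm" || d.colorCast === "cool") {
    steps.push(`Neutralize the ${d.colorCast} color cast toward accurate white balance.`);
  }
  if (d.softFocus) {
    steps.push("Sharpen fine detail on the subject; keep background bokeh intact.");
  }
  return steps;
}
```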

What I learned

Building this project taught me the true potential of Multimodal AI. I learned that AI is no longer just a chatbot; it's a collaborative partner. I gained deep experience in handling raw audio data in the browser and discovered that the most powerful user interface is often the one you don't have to touch—you just have to talk to it.

What's next for LiveEnhancer - Your AI Photo Editing Assistant

The journey is just beginning. My roadmap includes:

  • Object-Specific Editing: Commands like "make just the car red" or "change my shirt color."
  • Generative Expansion: Using Gemini to "outpaint" and expand the borders of your photos.
  • Collaborative Live Rooms: Multiple creators talking to the same image in a shared creative session.
  • Mobile Integration: Bringing the voice-first editing experience to mobile for creators on the move.

How Google Cloud services were used

This application leverages Google Cloud's AI ecosystem through the Gemini 2.5 Flash Live API, which powers a real-time "Voice Studio" where users interact via low-latency multimodal voice commands. Those verbal instructions are translated into actions through AI function calling, which triggers the Gemini 3.1 Flash Image model to perform complex, high-resolution photo edits, upscaling, and artistic transformations. The full-stack experience is hosted on Google Cloud Run, providing a scalable environment where generative AI models directly manipulate application state to deliver a professional-grade image editing assistant.
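
Function calling in this pattern works by declaring tools to the model in a JSON-schema style; the tool name and parameters below are hypothetical, shown only to indicate the shape of such a declaration:

```typescript
// A hypothetical tool declaration: the model can emit a call to
// "apply_photo_edit" instead of free text, and the app executes it.
const applyEditTool = {
  name: "apply_photo_edit",
  description: "Apply a described edit to the currently loaded photo.",
  parameters: {
    type: "object",
    properties: {
      instruction: {
        type: "string",
        description: "Natural-language edit, e.g. 'make it warmer'.",
      },
      strength: {
        type: "number",
        description: "Edit intensity from 0 to 1.",
      },
    },
    required: ["instruction"],
  },
};
```

When the live session returns a call matching this declaration, the app would run the edit against the image model and send the result back as a function response, closing the voice-to-pixels loop.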

Built With

  • google-ai-studio