Variantlab

About the Project

Variantlab is a real-time React/TSX component development environment that pairs AI-driven code generation with an intuitive visual canvas. It lets developers create, edit, and manage UI components instantly, with dynamic live previews. With the recent integration of the Gemini Live API, Variantlab also offers hands-free voice control, transforming how users interact with their development workflow.

Inspiration

The primary inspiration behind Variantlab was to bridge the gap between ideation and implementation in UI development. We wanted to create a tool that not only accelerated the prototyping process but also made it more natural and conversational. Drawing from the potential of large language models, we envisioned a world where developers could simply describe a component or even show an image, and the code would materialize, ready for immediate iteration. The introduction of voice control further pushes this boundary, aiming for a truly seamless, hands-free creative flow.

What it does

Variantlab offers a comprehensive set of features designed to enhance component development:

  • Real-time Code Editor & Live Preview: Write React/TSX code in an integrated editor and see your components render instantly on the canvas.
  • AI-Powered Code Generation: Leverage the Google Gemini API to generate or modify components using natural language prompts, supporting both text and image inputs.
  • Canvas-based Component Management: Organize, move, and visualize all your components as interactive, draggable nodes on a flexible canvas.
  • "Vary" for Rapid Iteration: Easily create variations (forks) of any component's state, allowing for quick experimentation with different design and code alternatives without losing previous work.
  • Hands-Free Voice Control (New!): Control the entire application using natural voice commands via the Gemini Live API. This includes:
    • Creating new components.
    • Deleting components by their title.
    • Creating variations of existing components.
    • Opening chat or code panels for specific components.
    • Sending code modification prompts directly to the active chat.
    • Receiving spoken confirmations from the AI assistant.
  • Virtual File System (VFS): Components are managed within an in-browser VFS, providing a familiar file structure for editing.
  • On-the-fly Bundling: esbuild-wasm compiles and bundles your TSX code directly in the browser for instant feedback.
  • Theming: Toggle between light and dark modes for a comfortable coding environment.
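The VFS and on-the-fly bundling features above hinge on keeping component sources in memory. A minimal sketch of such a file store, assuming a simple path-keyed map (the class and method names here are illustrative, not Variantlab's actual API):

```typescript
// Minimal in-browser virtual file system sketch (hypothetical API,
// not Variantlab's actual implementation).
class VirtualFileSystem {
  private files = new Map<string, string>();

  writeFile(path: string, contents: string): void {
    this.files.set(path, contents);
  }

  readFile(path: string): string {
    const contents = this.files.get(path);
    if (contents === undefined) throw new Error(`VFS: file not found: ${path}`);
    return contents;
  }

  exists(path: string): boolean {
    return this.files.has(path);
  }

  // List every file under a directory prefix, e.g. "/components/".
  list(prefix = "/"): string[] {
    return [...this.files.keys()].filter((p) => p.startsWith(prefix));
  }
}

const vfs = new VirtualFileSystem();
vfs.writeFile("/components/Button.tsx", "export const Button = () => <button>Hi</button>;");
console.log(vfs.list("/components/")); // ["/components/Button.tsx"]
```

A map like this is enough to back both the editor (read/write on the open file) and the bundler (lookup by resolved path).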

How we built it

Variantlab is built with a modern web stack and leverages powerful AI services:

  • Frontend Framework: React and TypeScript for a robust and type-safe user interface.
  • Styling: Tailwind CSS for utility-first styling, ensuring a clean and responsive design.
  • AI Integration: The core AI capabilities are powered by the @google/genai SDK, specifically:
    • ai.models.generateContent for text-to-code and image-to-code generation.
    • ai.live.connect for real-time voice interaction, including audio input/output and function calling.
  • In-browser Bundling: esbuild-wasm is utilized for client-side compilation of TSX files, enabling real-time code execution and previews within the browser sandbox. A custom esbuild plugin handles VFS resolution.
  • 3D Graphics: The initial "Welcome Component" renders a 3D scene using Three.js directly, with no additional React wrapper, demonstrating the flexibility of the in-browser preview sandbox.
  • Web Audio API: Essential for handling real-time audio input from the microphone and playing back AI-generated speech responses in the voice control feature.
  • Canvas Interaction: Custom React hooks and state management handle node dragging, panning, zooming, and panel resizing.
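The custom esbuild plugin mentioned above has to answer two questions for every import: which VFS path a specifier points to, and what its contents are. A sketch under those assumptions; the `onResolve`/`onLoad` hooks are esbuild's documented plugin API, but the resolution rules (relative `./` imports plus `.tsx`/`.ts` extension probing) and the `vfsPlugin` name are simplified stand-ins for Variantlab's actual plugin:

```typescript
// Sketch of an esbuild plugin that resolves imports against an in-memory
// file map. The plugin shape (name/setup/onResolve/onLoad) is esbuild's
// real plugin API; the resolution rules here are simplified assumptions.
type Files = Record<string, string>;

// Resolve "./Button" imported from "/App.tsx" to "/Button.tsx",
// probing common TypeScript extensions.
function resolveInVfs(files: Files, importer: string, specifier: string): string | null {
  const dir = importer.slice(0, importer.lastIndexOf("/"));
  const base = specifier.startsWith("./")
    ? `${dir}/${specifier.slice(2)}`
    : specifier; // treat anything else as an absolute VFS path
  for (const candidate of [base, `${base}.tsx`, `${base}.ts`]) {
    if (candidate in files) return candidate;
  }
  return null;
}

function vfsPlugin(files: Files) {
  return {
    name: "vfs",
    setup(build: any) {
      build.onResolve({ filter: /.*/ }, (args: any) => {
        const path = args.importer
          ? resolveInVfs(files, args.importer, args.path)
          : args.path; // entry points pass through unchanged
        return path ? { path, namespace: "vfs" } : undefined;
      });
      build.onLoad({ filter: /.*/, namespace: "vfs" }, (args: any) => ({
        contents: files[args.path],
        loader: "tsx",
      }));
    },
  };
}

const demoFiles: Files = {
  "/App.tsx": 'import { Button } from "./Button";',
  "/Button.tsx": "export const Button = () => null;",
};
console.log(resolveInVfs(demoFiles, "/App.tsx", "./Button")); // "/Button.tsx"
```

Returning `undefined` from `onResolve` lets esbuild fall through to other resolvers, so specifiers that aren't in the VFS (e.g. bare package names) can be handled elsewhere.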

Challenges we ran into

Developing Variantlab presented several exciting challenges:

  • Real-time In-Browser Compilation: Integrating esbuild-wasm to work with a dynamic virtual file system and resolving module imports correctly within the browser environment was complex.
  • Gemini Live API Integration: Managing the full lifecycle of a real-time audio session (connecting, streaming, receiving messages, handling disconnections) and ensuring smooth, low-latency audio playback was a significant undertaking.
  • Robust Function Calling: Defining clear FunctionDeclaration objects and implementing the logic to parse Gemini's FunctionCall responses, execute corresponding application actions, and send tool responses back to the model required careful design.
  • Synchronized State Management: Keeping the UI, VFS, chat history, and AI models in sync across various user interactions (typing, AI generation, undo/redo, voice commands) was crucial.
  • Error Handling in Live Previews: Implementing a resilient ErrorBoundary and handling esbuild compilation errors gracefully to prevent the entire application from crashing.
  • Natural Language to UI Actions: Translating ambiguous voice commands into precise application actions required thoughtful prompt engineering and function naming for the AI assistant.
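The function-calling challenge above boils down to a dispatch problem: map each function call the model emits to an application action, then send a tool response back. A minimal sketch; the `{ name, args }` shape mirrors the SDK's function-call objects, but the tool names and handlers here are hypothetical, not Variantlab's actual set:

```typescript
// Hypothetical dispatch layer for model-issued function calls. The
// { name, args } shape mirrors the Gemini SDK's FunctionCall objects;
// the tool names and handlers below are illustrative only.
interface FunctionCall {
  name: string;
  args: Record<string, any>;
}

type Handler = (args: Record<string, any>) => { result: string };

const handlers: Record<string, Handler> = {
  create_component: (args) => ({ result: `Created component "${args.title}"` }),
  delete_component: (args) => ({ result: `Deleted component "${args.title}"` }),
};

// Execute a call and build the tool response to send back to the model.
function handleFunctionCall(call: FunctionCall) {
  const handler = handlers[call.name];
  if (!handler) {
    return { name: call.name, response: { error: `Unknown tool: ${call.name}` } };
  }
  return { name: call.name, response: handler(call.args) };
}

console.log(handleFunctionCall({ name: "create_component", args: { title: "Navbar" } }));
// → { name: "create_component", response: { result: 'Created component "Navbar"' } }
```

Keeping handlers in a single registry also makes the spoken confirmations straightforward: the `result` string doubles as the text the assistant reads back.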

Accomplishments that we're proud of

We are particularly proud of:

  • Seamless Voice Control: The ability to entirely control the application's core functionalities (component creation, deletion, modification, panel management) using natural voice commands, providing an unparalleled hands-free development experience.
  • Iterative AI-Powered Development: The "Vary" feature combined with AI code generation allows for extremely fast iteration and exploration of design ideas.
  • Interactive Canvas: A highly responsive and intuitive canvas where developers can visually arrange and manage their components.
  • Robust Client-Side Tooling: Successfully integrating esbuild-wasm and Three.js to run complex development tools directly in the browser.
  • Clean and Modern UI/UX: A thoughtfully designed interface that prioritizes developer experience and aesthetics.

What we learned

Through building Variantlab, we gained deep insights into:

  • The Power of Multimodal AI: How integrating different AI capabilities (text generation, image understanding, real-time voice) can create truly novel and productive user experiences.
  • Web Audio API and Real-time Streaming: The complexities and best practices for working with browser audio streams for low-latency, real-time communication.
  • Effective Function Calling with LLMs: Strategies for defining clear tool interfaces and handling the back-and-forth between an LLM and an application to achieve complex control flows.
  • Browser-based Development Environments: The opportunities and constraints of building powerful development tools that run entirely in the browser.
  • User Intent vs. Explicit Command: Designing AI interactions that can interpret user intent while also allowing for precise, explicit commands when needed.
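One concrete detail behind the audio-streaming lesson above: browser microphones deliver Float32 samples in [-1, 1], while real-time speech APIs typically expect 16-bit PCM, so every outgoing buffer needs a clamp-and-scale pass. A sketch of that standard conversion (resampling and transport framing omitted):

```typescript
// Convert Web Audio Float32 samples ([-1, 1]) to 16-bit PCM, the format
// real-time speech APIs typically expect. Standard clamp-and-scale; the
// surrounding pipeline (resampling, base64 framing) is omitted.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale asymmetrically
  }
  return pcm;
}

console.log(floatTo16BitPcm(new Float32Array([0, 1, -1])));
// → Int16Array [0, 32767, -32768]
```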

What's next for Variantlab

We envision several exciting directions for Variantlab:

  • Enhanced Conversational Context: Improving the AI's ability to maintain context across longer conversations, enabling more complex multi-step voice commands.
  • Visual-to-Code Interaction: Allowing users to directly manipulate components on the canvas (e.g., resizing, repositioning) with the AI instantly updating the corresponding code.
  • Component Library Integration: Tools to import and export components to common libraries (e.g., Storybook, Material UI, Shadcn UI).
  • Code Refactoring & Optimization: Voice commands to refactor code, optimize performance, or ensure accessibility compliance.
  • Collaborative Features: Enabling real-time, multi-user collaboration on the canvas and in code editing.
  • More Advanced 3D Capabilities: Deeper integration of 3D modeling and animation features, potentially even AI-generated 3D assets.
  • User-defined Tools: Allowing users to register their own custom functions or scripts that the AI can invoke, extending Variantlab's capabilities.
