About the Project

Inspiration

One of the biggest drawbacks we noticed with traditional LLM-based systems is the need to constantly re-prompt, wait, and then check if the model understood the request correctly. This back-and-forth slows down creativity, especially when building visual elements like websites.

We wanted a solution where you could speak normally, without crafting perfect prompts, and immediately see the website take shape in real time. That inspired Ducky: a voice-powered assistant that lets you build and customize websites through natural conversation, with no re-prompting loop in between.

What We Learned

  • How to integrate real-time speech recognition using the OpenAI API and natural language understanding with Google Gemini.
  • How to process messy, casual human speech into structured, reliable web-building commands.
  • How to create dynamic, real-time UI transformations in React, including advanced state management and live DOM updates.
  • How to coordinate a multi-service architecture across FastAPI, Express.js, and Next.js.
  • The importance of keeping backend services lightweight to support ultra-fast feedback loops.
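
To illustrate what turning casual speech into "structured, reliable" commands can involve on the receiving side, here is a minimal sketch of validating a design instruction before it reaches the UI. The `DesignInstruction` shape, the `STYLE_PROPERTIES` list, and `parseInstruction` are hypothetical names chosen for this example; the write-up does not specify Ducky's actual payload format.

```typescript
// Hypothetical set of style properties; not Ducky's actual vocabulary.
const STYLE_PROPERTIES = ["layout", "color", "spacing", "typography", "alignment"] as const;
type StyleProperty = (typeof STYLE_PROPERTIES)[number];

// Hypothetical instruction shape assumed for this sketch.
interface DesignInstruction {
  target: string; // e.g. an element id such as "header"
  property: StyleProperty;
  value: string;
}

// Type guard: true only for strings in the allowed property list.
function isStyleProperty(x: unknown): x is StyleProperty {
  return typeof x === "string" && (STYLE_PROPERTIES as readonly string[]).indexOf(x) !== -1;
}

// Parse a raw JSON string from the NLP service and reject anything malformed,
// so the UI only ever receives well-formed instructions.
function parseInstruction(raw: string): DesignInstruction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { target, property, value } = data as Record<string, unknown>;
  if (typeof target !== "string" || target.length === 0) return null;
  if (!isStyleProperty(property)) return null;
  if (typeof value !== "string") return null;

  return { target, property, value };
}
```

Returning `null` instead of throwing keeps the caller's hot path simple: a rejected instruction can be dropped or sent back for re-interpretation without interrupting the live session.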

How We Built It

  • Frontend: Built with React, Next.js, Tailwind CSS, and Shadcn UI, using the Web Speech API and OpenAI API for capturing and transcribing voice commands instantly.
  • Backend NLP: Developed a FastAPI server that uses Google Gemini API to analyze natural language and generate structured design instructions.
  • Real-Time Server: Created a Node.js Express server to manage live sessions, while Next.js server functions helped efficiently fetch and relay API data.

At the core of Ducky is the Ducky Engine, a custom-built system that listens for incoming design instructions, interprets user intent, and updates the React DOM in real time. The engine adjusts page state across a wide range of styling attributes (layout, colors, spacing, typography, alignment, and more) while applying smooth animations. As users speak, the site changes live in front of them, giving a natural, hands-free way to build a website.
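
One way an engine like this could fold incoming instructions into page state is a pure reducer, which fits React's `useReducer` hook. The sketch below is illustrative only: `PageState`, `StyleInstruction`, and `applyInstruction` are assumed names, and the real Ducky Engine may be structured differently.

```typescript
// Hypothetical state model: element id -> (style property -> value).
type StyleMap = Readonly<Record<string, string>>;
type PageState = Readonly<Record<string, StyleMap>>;

// Hypothetical instruction shape assumed for this sketch.
interface StyleInstruction {
  target: string;
  property: string;
  value: string;
}

// Pure reducer: returns a new state object instead of mutating the old one.
function applyInstruction(state: PageState, instruction: StyleInstruction): PageState {
  const current: StyleMap = state[instruction.target] ?? {};

  // If nothing changes, return the same reference so React can skip the update.
  if (current[instruction.property] === instruction.value) return state;

  return {
    ...state,
    [instruction.target]: { ...current, [instruction.property]: instruction.value },
  };
}

// Usage: in a component this could be passed to useReducer(applyInstruction, initialState).
const initial: PageState = { header: { color: "red" } };
const next = applyInstruction(initial, { target: "header", property: "color", value: "blue" });
console.log(next.header.color); // "blue"
```

Because untouched elements keep their previous object references, memoized components rendering those elements have no reason to re-render, which helps keep updates feeling immediate.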

Challenges

  • Understanding messy speech: Real users speak casually with filler words and unclear references. Making sure Ducky could still correctly understand and execute commands required strong NLP tuning.
  • Maintaining real-time performance: Updates had to appear with no noticeable lag, which meant optimizing both React rendering and backend response times.
  • Balancing customization and simplicity: We had to design Ducky to be powerful enough for detailed website building, but simple enough for anyone to use without memorizing strict commands.
  • Coordinating multiple servers: Keeping the FastAPI, Express, and Next.js services in sync while handling live voice streams was a major architectural challenge.
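
As a small illustration of the messy-speech challenge above, a transcript can be cleaned of filler words before it is sent for interpretation. This is a sketch under assumptions, not Ducky's actual preprocessing: `FILLER_PHRASES` and `normalizeTranscript` are hypothetical, and the filler list is deliberately short (words such as "like" are left out because they often carry meaning in design requests).

```typescript
// Hypothetical filler list; multi-word phrases are listed first for readability.
const FILLER_PHRASES: readonly string[] = ["you know", "i mean", "um", "uh", "er", "basically"];

// Lowercase the transcript, drop basic punctuation, remove filler phrases
// on word boundaries, and collapse the leftover whitespace.
function normalizeTranscript(transcript: string): string {
  let text = transcript.toLowerCase().replace(/[.,!?]/g, " ");
  for (const phrase of FILLER_PHRASES) {
    // \b ensures "er" is not stripped out of words such as "header".
    const pattern = new RegExp(`\\b${phrase}\\b`, "g");
    text = text.replace(pattern, " ");
  }
  return text.replace(/\s+/g, " ").trim();
}

console.log(normalizeTranscript("Um, make the, uh, header blue, you know?"));
// prints "make the header blue"
```

A step like this only handles filler words; unclear references ("make it bigger") still need context from the language model or from session state.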
