Sapiens Manifest transforms AI image generation from a static data-entry task into a dynamic, tactile performance. By fusing real-time voice interaction with a virtual Logitech MX Creative Console interface, it lets creators "conduct" generative AI like an instrument, using their voice to paint the vision and a physical-style dial to tune the chaos. It's not just a tool; it's a jam session between human intent and synthetic imagination.

Inspiration

Sapiens Manifest was born from my desire to create an intuitive, voice-driven interface for AI image generation that mirrors the tactile precision of professional gear like the Logitech MX Creative Console.

As a creative technologist, I wanted to move away from the rigidness of typing prompts into text boxes. I asked myself: What if creating AI art felt less like data entry and more like playing a synthesizer? I wanted to explore how voice commands, combined with physically inspired controls (dials, keypads), could turn the chaotic nature of generative AI into a fluid performance where I am the conductor and the AI is the orchestra.

What it does

Sapiens Manifest is a virtual bridge to the MX Creative Console experience that transforms the browser into a tactile canvas. It allows me to:

πŸŽ™οΈ Generate images using voice (powered by ElevenLabs), turning speech into visual concepts in real-time.

πŸŽ›οΈ Tune creativity parameters with a tactile-inspired dial (controlling the "Chaos" / Temperature from 0 to 100%).

πŸ“· Instant-switch camera styles (Wide, Portrait, Macro, Aerial, Cinematic, Fisheye, Tilt-Shift, Vintage).

🎨 Toggle art styles on the fly (Realistic, Abstract, Anime, Oil, Watercolor, Pixel, Neon, Pencil).

⌨️ Execute pro-level workflows using specific keyboard shortcuts designed for power users.

πŸ”Š Feel the interface through synthesized audio feedback (Web Audio API) for every interaction, compensating for the lack of physical haptics in the web view.
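Under the hood, a workflow like this boils down to folding the voice transcript and the console's state into a single generation request. Here is a minimal sketch of that idea; the function and field names are illustrative, not the project's actual API:

```typescript
// Hypothetical console state captured from the dial and keypads.
interface ConsoleState {
  chaos: number;   // dial position, 0–100
  camera: string;  // e.g. "Macro", "Cinematic"
  artStyle: string; // e.g. "Watercolor", "Pixel"
}

// Map the 0–100 "Chaos" dial onto a 0.0–1.0 sampling temperature.
function chaosToTemperature(chaos: number): number {
  const clamped = Math.min(100, Math.max(0, chaos));
  return clamped / 100;
}

// Fold the spoken transcript and console state into one prompt payload.
function buildRequest(transcript: string, state: ConsoleState) {
  return {
    prompt: `${transcript.trim()}, ${state.camera.toLowerCase()} shot, ${state.artStyle.toLowerCase()} style`,
    temperature: chaosToTemperature(state.chaos),
  };
}

const req = buildRequest("a fox sleeping in moss", {
  chaos: 65,
  camera: "Macro",
  artStyle: "Watercolor",
});
// req.prompt → "a fox sleeping in moss, macro shot, watercolor style"
// req.temperature → 0.65
```

Keeping the mapping pure like this makes it trivial to unit-test and to later rebind the same state to a physical dial.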

How I built it

Next.js 14 + React served as the backbone for the frontend interface.

ElevenLabs API handles the real-time voice-to-prompt generation.

Web Audio API was engineered to provide satisfying, synth-like feedback sounds for UI interactions.
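One common way to get a tactile-sounding click without audio samples is to synthesize a short sine burst with an exponential decay, which reads to the ear as a mechanical detent. A rough sketch of that technique (the frequency, duration, and decay constants here are illustrative, not the project's tuned values):

```typescript
// Render samples for a short percussive "click": a sine burst whose
// amplitude decays exponentially, resembling a switch or detent.
function renderClick(
  sampleRate = 44100,
  freqHz = 1800,      // high pitch → crisp, switch-like sound
  durationSec = 0.03, // 30 ms is plenty for a click
  decay = 200         // larger → faster fade-out
): Float32Array {
  const n = Math.floor(sampleRate * durationSec);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const t = i / sampleRate;
    out[i] = Math.sin(2 * Math.PI * freqHz * t) * Math.exp(-decay * t);
  }
  return out;
}

// In the browser, the samples would be copied into an AudioBuffer and
// played through an AudioBufferSourceNode:
//   const ctx = new AudioContext();
//   const samples = renderClick(ctx.sampleRate);
//   const buf = ctx.createBuffer(1, samples.length, ctx.sampleRate);
//   buf.copyToChannel(samples, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buf;
//   src.connect(ctx.destination);
//   src.start();
```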

Convex manages the real-time database state and user credits.

Generative Backend: Currently testing with Gemini 3 for visual intelligence, while benchmarking against real-time generation models (like Decart or Krea) for near-instant prompt-to-image turnaround.

CSS-in-JS was used to craft the skeuomorphic, high-fidelity console design.

I implemented custom rotary dial physics with drag-to-rotate interactions to mimic the resistance and precision of a physical knob.
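The drag-to-rotate interaction above can be sketched as mapping vertical pointer travel onto a clamped dial value, with a sensitivity constant standing in for the knob's resistance. This is a simplified sketch, not the project's implementation; the constants and names are illustrative:

```typescript
interface DialState {
  value: number;      // current dial reading, 0–100
  dragStartY: number; // pointer Y when the drag began
  startValue: number; // dial reading when the drag began
}

// Sensitivity: pixels of vertical drag per unit of dial travel.
// A higher number feels "heavier", mimicking a resistive physical knob.
const PIXELS_PER_UNIT = 3;

function beginDrag(value: number, pointerY: number): DialState {
  return { value, dragStartY: pointerY, startValue: value };
}

// Dragging upward increases the value; clamp to the dial's 0–100 range.
function updateDrag(state: DialState, pointerY: number): DialState {
  const delta = (state.dragStartY - pointerY) / PIXELS_PER_UNIT;
  const value = Math.min(100, Math.max(0, state.startValue + delta));
  return { ...state, value };
}

// Example: starting at 50 and dragging 60 px upward moves the dial to 70.
let dial = beginDrag(50, 200);
dial = updateDrag(dial, 140);
// dial.value → 70
```

Computing each update from the drag's starting point, rather than accumulating per-event deltas, keeps the dial stable even when pointer events arrive at uneven rates.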

Challenges I ran into

Since this is an MVP in the ideation phase, the technical hurdles were the most exciting part:

Sensory Synchronization: The "final boss" of this phase. Synchronizing voice input with visual feedback in real time without lagging the UI proved complex.

The "Feel" Factor: Creating realistic audio feedback purely in the browser to replace physical tactile clicks required fine-tuning the Web Audio API to avoid it sounding "cheap."

UX Balance: The continuous challenge of designing a UI that conveys the weight and quality of physical hardware while remaining accessible and functional on a 2D screen.

Accomplishments that I'm proud of

Successfully achieving a premium "console-like" experience entirely within a web browser.

Validating that a conversational voice-to-image workflow feels far more natural and creative than transactional typing.

Building a modular architecture of UI components (DialPad, KeyPad, StatusMonitor) that lays the groundwork for future hardware integration.

What I learned

The critical role of audio cues in creating immersive UIs; when you can't feel the click, hearing it is the next best thing.

That the speed of the AI model defines the "musical instrument" feel of the tool; if it's too slow, the magic breaks.

That the interface needs to forgive mistakes and encourage improvisation, in the spirit of maker culture.

What's next for Sapiens Manifest

The Hardware Bridge: Full integration with the actual Logitech MX Creative Console via the Logitech Plugin API, mapping the virtual dial to the physical hardware.

Mother Tongue Expansion: Expanding voice support beyond English and Portuguese. As English is my third language, I realize that "tuning" AI is a visceral experience that hits differently when done in one's native tongue. I want Sapiens Manifest to capture cultural nuance, not just literal translation, allowing creators to manifest in Portuguese, Spanish, and beyond.

Built With

  • convex
  • css-in-js
  • elevenlabs-api
  • gemini-3
  • next.js-14
  • react
  • web-audio-api