Sapiens Manifest transforms AI image generation from a static data-entry task into a dynamic, tactile performance. By fusing real-time voice interaction with a virtual Logitech MX Creative Console interface, it lets creators "conduct" generative AI like an instrument, using their voice to paint the vision and a physical-style dial to tune the chaos. It's not just a tool; it's a jam session between human intent and synthetic imagination.

Inspiration

Sapiens Manifest was born from my desire to create an intuitive, voice-driven interface for AI image generation that mirrors the tactile precision of professional gear like the Logitech MX Creative Console.

As a creative technologist, I wanted to move away from the rigidness of typing prompts into text boxes. I asked myself: What if creating AI art felt less like data entry and more like playing a synthesizer? I wanted to explore how voice commands, combined with physically inspired controls (dials, keypads), could turn the chaotic nature of generative AI into a fluid performance where I am the conductor and the AI is the orchestra.

What it does

Sapiens Manifest is a virtual bridge to the MX Creative Console experience that transforms the browser into a tactile canvas. It allows me to:

πŸŽ™οΈ Generate images using voice (powered by ElevenLabs), turning speech into visual concepts in real-time.

πŸŽ›οΈ Tune creativity parameters with a tactile-inspired dial (controlling the "Chaos" / Temperature from 0 to 100%).

πŸ“· Instant-switch camera styles (Wide, Portrait, Macro, Aerial, Cinematic, Fisheye, Tilt-Shift, Vintage).

🎨 Toggle art styles on the fly (Realistic, Abstract, Anime, Oil, Watercolor, Pixel, Neon, Pencil).

⌨️ Execute pro-level workflows using specific keyboard shortcuts designed for power users.

πŸ”Š Feel the interface through synthesized audio feedback (Web Audio API) for every interaction, compensating for the lack of physical haptics in the web view.
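Under the hood, a workflow like this boils down to folding the voice transcript and the console's state into a single generation request. Here is a minimal sketch of that idea; the function and field names are illustrative, not the project's actual API:

```typescript
// Hypothetical console state captured from the dial and keypads.
interface ConsoleState {
  chaos: number;   // dial position, 0–100
  camera: string;  // e.g. "Macro", "Cinematic"
  artStyle: string; // e.g. "Watercolor", "Pixel"
}

// Map the 0–100 "Chaos" dial onto a 0.0–1.0 sampling temperature.
function chaosToTemperature(chaos: number): number {
  const clamped = Math.min(100, Math.max(0, chaos));
  return clamped / 100;
}

// Fold the spoken transcript and console state into one prompt payload.
function buildRequest(transcript: string, state: ConsoleState) {
  return {
    prompt: `${transcript.trim()}, ${state.camera.toLowerCase()} shot, ${state.artStyle.toLowerCase()} style`,
    temperature: chaosToTemperature(state.chaos),
  };
}

const req = buildRequest("a fox sleeping in moss", {
  chaos: 65,
  camera: "Macro",
  artStyle: "Watercolor",
});
// req.prompt → "a fox sleeping in moss, macro shot, watercolor style"
// req.temperature → 0.65
```

Keeping the mapping pure like this makes it trivial to unit-test and to later rebind the same state to a physical dial.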

How I built it

Next.js 14 + React served as the backbone for the frontend interface.

ElevenLabs API handles the real-time voice-to-prompt generation.

Web Audio API was engineered to provide satisfying, synth-like feedback sounds for UI interactions.
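One common way to get a tactile-sounding click without audio samples is to synthesize a short sine burst with an exponential decay, which reads to the ear as a mechanical detent. A rough sketch of that technique (the frequency, duration, and decay constants here are illustrative, not the project's tuned values):

```typescript
// Render samples for a short percussive "click": a sine burst whose
// amplitude decays exponentially, resembling a switch or detent.
function renderClick(
  sampleRate = 44100,
  freqHz = 1800,      // high pitch → crisp, switch-like sound
  durationSec = 0.03, // 30 ms is plenty for a click
  decay = 200         // larger → faster fade-out
): Float32Array {
  const n = Math.floor(sampleRate * durationSec);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const t = i / sampleRate;
    out[i] = Math.sin(2 * Math.PI * freqHz * t) * Math.exp(-decay * t);
  }
  return out;
}

// In the browser, the samples would be copied into an AudioBuffer and
// played through an AudioBufferSourceNode:
//   const ctx = new AudioContext();
//   const samples = renderClick(ctx.sampleRate);
//   const buf = ctx.createBuffer(1, samples.length, ctx.sampleRate);
//   buf.copyToChannel(samples, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buf;
//   src.connect(ctx.destination);
//   src.start();
```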

Convex manages the real-time database state and user credits.

Generative Backend: Currently testing with Gemini 3 for visual intelligence, while benchmarking against real-time generation models (like Decart or Krea) for near-instant prompt-to-image turnaround.

CSS-in-JS was used to craft the skeuomorphic, high-fidelity console design.

I implemented custom rotary dial physics with drag-to-rotate interactions to mimic the resistance and precision of a physical knob.
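The drag-to-rotate interaction above can be sketched as mapping vertical pointer travel onto a clamped dial value, with a sensitivity constant standing in for the knob's resistance. This is a simplified sketch, not the project's implementation; the constants and names are illustrative:

```typescript
interface DialState {
  value: number;      // current dial reading, 0–100
  dragStartY: number; // pointer Y when the drag began
  startValue: number; // dial reading when the drag began
}

// Sensitivity: pixels of vertical drag per unit of dial travel.
// A higher number feels "heavier", mimicking a resistive physical knob.
const PIXELS_PER_UNIT = 3;

function beginDrag(value: number, pointerY: number): DialState {
  return { value, dragStartY: pointerY, startValue: value };
}

// Dragging upward increases the value; clamp to the dial's 0–100 range.
function updateDrag(state: DialState, pointerY: number): DialState {
  const delta = (state.dragStartY - pointerY) / PIXELS_PER_UNIT;
  const value = Math.min(100, Math.max(0, state.startValue + delta));
  return { ...state, value };
}

// Example: starting at 50 and dragging 60 px upward moves the dial to 70.
let dial = beginDrag(50, 200);
dial = updateDrag(dial, 140);
// dial.value → 70
```

Computing each update from the drag's starting point, rather than accumulating per-event deltas, keeps the dial stable even when pointer events arrive at uneven rates.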

Challenges I ran into

Since this is an MVP in the ideation phase, the technical hurdles were the most exciting part:

Sensory Synchronization: The "final boss" of this phase. Synchronizing voice input with visual feedback in real time without lagging the UI proved complex.

The "Feel" Factor: Creating realistic audio feedback purely in the browser to replace physical tactile clicks required fine-tuning the Web Audio API to avoid it sounding "cheap."

UX Balance: The continuous challenge of designing a UI that conveys the weight and quality of physical hardware while remaining accessible and functional on a 2D screen.

Accomplishments that I'm proud of

Successfully achieving a premium "console-like" experience entirely within a web browser.

Validating that a conversational voice-to-image workflow feels far more natural and creative than transactional typing.

Building a modular architecture of UI components (DialPad, KeyPad, StatusMonitor) that lays the groundwork for future hardware integration.

What I learned

The critical role of audio cues in creating immersive UIs; when you can't feel the click, hearing it is the next best thing.

That the speed of the AI model defines the "musical instrument" feel of the tool; if it's too slow, the magic breaks.

That the interface needs to forgive mistakes and encourage improvisation, in the spirit of maker culture.

What's next for Sapiens Manifest

The Hardware Bridge: Full integration with the actual Logitech MX Creative Console via the Logitech Plugin API, mapping the virtual dial to the physical hardware.

Mother Tongue Expansion: Expanding voice support beyond English and Portuguese. As English is my third language, I realize that "tuning" AI is a visceral experience that hits differently when done in one's native tongue. I want Sapiens Manifest to capture cultural nuance, not just literal translation, allowing creators to manifest in Portuguese, Spanish, and beyond.

Built With

  • convex
  • css-in-js
  • elevenlabs-api
  • gemini-3
  • next.js-14
  • react
  • web-audio-api