Sapiens Manifest transforms AI image generation from a static data-entry task into a dynamic, tactile performance. By fusing real-time voice synthesis with a virtual Logitech MX Creative Console interface, it allows creators to "conduct" generative AI like an instrument, using their voice to paint the vision and a physical-style dial to tune the chaos. It's not just a tool; it's a jam session between human intent and synthetic imagination.
Inspiration
Sapiens Manifest was born from my desire to create an intuitive, voice-driven interface for AI image generation that mirrors the tactile precision of professional gear like the Logitech MX Creative Console.
As a creative technologist, I wanted to move away from the rigidness of typing prompts into text boxes. I asked myself: What if creating AI art felt less like data entry and more like playing a synthesizer? I wanted to explore how voice commands, combined with physical-inspired controls (dials, keypads), could turn the chaotic nature of generative AI into a fluid performance where I am the conductor, and the AI is the orchestra.
What it does
Sapiens Manifest is a virtual bridge to the MX Creative Console experience that transforms the browser into a tactile canvas. It allows me to:
🎙️ Generate images using voice (powered by ElevenLabs), turning speech into visual concepts in real time.
🎛️ Tune creativity parameters with a tactile-inspired dial (controlling the "Chaos" / temperature from 0 to 100%).
📷 Instant-switch camera styles (Wide, Portrait, Macro, Aerial, Cinematic, Fisheye, Tilt-Shift, Vintage).
🎨 Toggle art styles on the fly (Realistic, Abstract, Anime, Oil, Watercolor, Pixel, Neon, Pencil).
⌨️ Execute pro-level workflows using keyboard shortcuts designed for power users.
🔊 Feel the interface through synthesized audio feedback (Web Audio API) on every interaction, compensating for the lack of physical haptics in the web view.
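As a sketch of how the style-switching above could be wired, here is one possible shortcut map for the camera styles; the number-key bindings and function names are my assumptions, not the shipped layout.

```typescript
// Hypothetical bindings: number row 1–8 selects a camera style.
const cameraKeys: Record<string, string> = {
  "1": "Wide", "2": "Portrait", "3": "Macro", "4": "Aerial",
  "5": "Cinematic", "6": "Fisheye", "7": "Tilt-Shift", "8": "Vintage",
};

// Resolve a pressed key to a style, or null if the key is unbound.
function styleForKey(key: string): string | null {
  return cameraKeys[key] ?? null;
}
```

In the React app, a lookup like this would sit behind a window `keydown` listener (e.g. registered in a `useEffect`) that writes the resolved style into state.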
How I built it
Next.js 14 + React served as the backbone for the frontend interface.
ElevenLabs API handles the real-time voice-to-prompt generation.
Web Audio API was engineered to provide satisfying, synth-like feedback sounds for UI interactions.
Convex manages the real-time database state and user credits.
Generative Backend: Currently testing with Gemini 3 for visual intelligence, while benchmarking against real-time generation models (like Decart or Krea) to minimize prompt-to-image latency.
CSS-in-JS was used to craft the skeuomorphic, high-fidelity console design.
I implemented custom rotary dial physics with drag-to-rotate interactions to mimic the resistance and precision of a physical knob.
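The drag-to-rotate math can be sketched as two pure functions: map the pointer's position around the dial centre to an angle, then map that angle onto the 0–100 "Chaos" range. The 270° sweep and the function names here are illustrative assumptions, not the project's actual implementation.

```typescript
// Assumed geometry: the dial sweeps 270 degrees, from -135° (Chaos 0)
// straight through 12 o'clock (Chaos 50) to +135° (Chaos 100).

// Angle of the pointer relative to the dial centre (cx, cy),
// with 0° pointing straight up and clockwise positive.
function angleFromPointer(cx: number, cy: number, px: number, py: number): number {
  return (Math.atan2(px - cx, cy - py) * 180) / Math.PI;
}

// Clamp the angle to the dial's sweep and map it onto 0–100.
function chaosFromAngle(angleDeg: number): number {
  const clamped = Math.max(-135, Math.min(135, angleDeg));
  return Math.round(((clamped + 135) / 270) * 100);
}
```

During a drag, a `pointermove` handler would feed the cursor coordinates through these two functions and write the result into state; easing the rendered value toward the target each frame is one way to fake a physical knob's resistance.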
Challenges I ran into
Since this is an MVP in the ideation phase, the technical hurdles were the most exciting part:
Sensory Synchronization: The "final boss" of this phase. Synchronizing voice input with visual feedback in real-time without lagging the UI proved complex.
The "Feel" Factor: Creating realistic audio feedback purely in the browser to replace physical tactile clicks required fine-tuning the Web Audio API to avoid it sounding "cheap."
UX Balance: The continuous challenge of designing a UI that conveys the weight and quality of physical hardware while remaining accessible and functional on a 2D screen.
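One way to get a click that doesn't sound "cheap" purely in the browser is a short sine burst shaped by an exponential-decay gain envelope. This is a hedged sketch of that idea: the per-interaction frequencies and decay times are illustrative guesses, not the project's tuned values.

```typescript
// Per-interaction tick parameters (values are illustrative assumptions).
type Tick = { freq: number; decay: number; peak: number };

function tickFor(kind: "dial" | "key" | "toggle"): Tick {
  switch (kind) {
    case "dial":   return { freq: 2200, decay: 0.04, peak: 0.15 }; // bright detent click
    case "key":    return { freq: 1200, decay: 0.08, peak: 0.25 }; // solid key thunk
    case "toggle": return { freq: 800,  decay: 0.12, peak: 0.20 }; // softer switch
  }
}

// Browser-only wiring (ctx is a Web Audio AudioContext; typed `any` so the
// sketch also compiles outside the DOM). An oscillator runs through a gain
// node whose level ramps exponentially toward silence — abrupt linear cuts
// are a big part of what makes synthesized clicks sound cheap.
function playTick(ctx: any, kind: "dial" | "key" | "toggle"): void {
  const { freq, decay, peak } = tickFor(kind);
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  osc.frequency.value = freq;
  gain.gain.setValueAtTime(peak, ctx.currentTime);
  gain.gain.exponentialRampToValueAtTime(0.001, ctx.currentTime + decay);
  osc.connect(gain).connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + decay);
}
```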
Accomplishments that I'm proud of
Successfully achieving a premium "console-like" experience entirely within a web browser.
Validating that a conversational voice-to-image workflow feels infinitely more natural and creative than transactional typing.
Building a modular architecture of UI components (DialPad, KeyPad, StatusMonitor) that lays the groundwork for future hardware integration.
What I learned
The critical role of audio cues in creating immersive UIs; when you can't feel the click, hearing it is the next best thing.
That the speed of the AI model defines the "musical instrument" feel of the tool; if it's too slow, the magic breaks.
That the interface needs to forgive mistakes and encourage improvisation, in the spirit of maker culture.
What's next for Sapiens Manifest
The Hardware Bridge: Full integration with the actual Logitech MX Creative Console via the Logitech Plugin API, mapping the virtual dial to the physical hardware.
Mother Tongue Expansion: Expanding voice support beyond English and Portuguese. As English is my third language, I realize that "tuning" AI is a visceral experience that hits differently when done in one's native tongue. I want Sapiens Manifest to capture cultural nuance, not just literal translation, allowing creators to manifest in Portuguese, Spanish, and beyond.
Built With
- convex
- css-in-js
- elevenlabs-api
- gemini-3
- next.js-14
- react
- web-audio-api

