Inspiration
Why we built Vocode
The day before this hackathon, I cut my finger badly enough that I literally couldn’t type.
For most people, that’s just an inconvenience. But as a programmer, it completely blocked my ability to build. I found myself thinking: why is programming still so dependent on a keyboard?
That question became more meaningful when I thought about my uncle. He spent over 20 years at Microsoft as a programmer, but now lives with neuropathy. Typing for extended periods causes him real pain—making it difficult to continue doing something he loves.
That’s when the idea for Vocode clicked.
What if programming didn’t require typing at all—and even better, what if you could code faster than typing?
What it does
What is Vocode?
Vocode is a voice-driven, agentic code editor that lets you write, navigate, and edit code, ask questions, and manage the filesystem using natural speech, all at the speed of conversation.
Instead of manually typing syntax, you can say:
“Go to the enemy loop. Add a bounds check before accessing the array.”
Vocode understands your intent and applies precise edits directly to your codebase.
But what makes Vocode fundamentally different is that it’s not vibecoding.
Most AI coding tools generate large chunks of code for users who may not fully understand or control the result. That approach is powerful—but it takes the developer out of the driver’s seat.
Vocode takes the opposite approach:
- You stay in control of the architecture and decisions
- The AI executes small, scoped edits based on your intent
- Each interaction is precise, composable, and reversible
The workflow becomes:
- Resolve a scope (“go to the enemy loop”)
- Apply a change within that scope (“add a bounds check”)
Smaller, structured edits enable more precise expression than large code generation ever could.
The result is a new paradigm:
- Not typing code
- Not generating code blindly
- But directing code at the speed of speech
This is vocoding.
How we built it
Vocode is built as a modular, multi-process system designed for real-time voice interaction with code.
Core stack:
- Monorepo powered by pnpm and Turborepo, with Biome and GitHub CI
- Frontend built as a VS Code extension using TypeScript and React (with support for additional frontends in the future)
- Landing page built with React, Vite, and Tailwind, hosted on Vercel
Voice pipeline:
- A dedicated voice daemon written in Go handles microphone input
- Uses cgo with PortAudio for low-level audio capture
- Custom Voice Activity Detection (VAD) determines when the user is speaking
- Audio is streamed to ElevenLabs Scribe v2 for real-time speech-to-text
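The write-up doesn't detail the custom VAD, but a minimal energy-based sketch in Go shows the general shape; the threshold, hangover count, and frame size here are assumptions for illustration, not Vocode's actual values:

```go
package main

import (
	"fmt"
	"math"
)

// VAD is a hypothetical energy-based voice activity detector.
type VAD struct {
	Threshold float64 // RMS level above which a frame counts as speech
	Hangover  int     // silent frames tolerated before speech "ends"
	silentRun int
	speaking  bool
}

// rms computes the root-mean-square energy of one int16 audio frame.
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		f := float64(s) / 32768.0
		sum += f * f
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// Process consumes one frame and reports whether the user is speaking.
func (v *VAD) Process(frame []int16) bool {
	if rms(frame) >= v.Threshold {
		v.speaking = true
		v.silentRun = 0
	} else if v.speaking {
		v.silentRun++
		if v.silentRun > v.Hangover {
			v.speaking = false
		}
	}
	return v.speaking
}

func main() {
	v := &VAD{Threshold: 0.05, Hangover: 2}
	loud := make([]int16, 160) // 10 ms at 16 kHz
	for i := range loud {
		loud[i] = 8000
	}
	quiet := make([]int16, 160)
	fmt.Println(v.Process(loud))  // speech detected
	fmt.Println(v.Process(quiet)) // still "speaking" within the hangover
}
```

The hangover keeps short pauses between words from splitting one utterance into several transcription requests.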
AI + orchestration:
- A separate core daemon (Go) manages AI interaction
- Supports multiple providers (e.g. OpenAI, Anthropic)
- Communication between components happens over duplex JSON-RPC
Edit system (core innovation): Vocode uses a two-step, scope-based editing model:
- Scope resolution: the user specifies where to operate ("Find the main function"), and a project-wide search returns matches
- Scoped modification: the user specifies what to change ("Make it do X")
A selection window shows all matches, and users can navigate between them using voice before applying changes.
This separation of where and what enables precise, composable edits—making voice-driven programming both fast and controlled.
Challenges we ran into
Building a real-time, voice-driven coding system introduced challenges across multiple layers:
- Coordinating distributed components: managing communication between the VS Code extension, voice daemon, and core AI daemon, especially over JSON-RPC, required careful orchestration and debugging
- Designing a robust intent system: not all speech is equal. We had to distinguish between:
- Code edits
- Navigation commands
- File/text search
- General questions to the AI
- UI control actions
Handling these different intent flows reliably—while keeping the experience seamless—was a major challenge.
- Dealing with noisy and ambiguous input: speech transcripts are often imperfect. We had to handle irrelevant or partial input while still extracting meaningful intent
- Low-level audio processing: implementing microphone input via PortAudio (through cgo), building a custom VAD system, and streaming audio efficiently to ElevenLabs required working close to the hardware layer
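The intent categories listed above need a routing step before anything executes. The keyword rules below are a toy stand-in for illustration; real classification would presumably go through the AI provider:

```go
package main

import (
	"fmt"
	"strings"
)

// Intent mirrors the categories from the write-up.
type Intent int

const (
	CodeEdit Intent = iota
	Navigation
	Search
	Question
	UIControl
	Unknown
)

// Classify routes a transcript to an intent. These prefix rules are a
// hypothetical placeholder for a model-based classifier.
func Classify(transcript string) Intent {
	t := strings.ToLower(strings.TrimSpace(transcript))
	switch {
	case t == "":
		return Unknown // noisy/partial input is dropped, not acted on
	case strings.HasPrefix(t, "go to"), strings.HasPrefix(t, "open"):
		return Navigation
	case strings.HasPrefix(t, "find"), strings.HasPrefix(t, "search"):
		return Search
	case strings.HasPrefix(t, "what"), strings.HasPrefix(t, "why"), strings.HasPrefix(t, "how"):
		return Question
	case strings.HasPrefix(t, "close"), strings.HasPrefix(t, "show"):
		return UIControl
	default:
		return CodeEdit
	}
}

func main() {
	fmt.Println(Classify("Go to the enemy loop") == Navigation) // true
	fmt.Println(Classify("Add a bounds check") == CodeEdit)     // true
}
```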
Accomplishments that we're proud of
- Built a working speech → intent → structured code edit pipeline
- Demonstrated real-time coding at the speed of speech, not just transcription (API latency still exists)
- Created a system where developers remain fully in control, not replaced by AI
- Designed a scope-based editing model that enables precise, composable changes
- Introduced a new paradigm beyond "vibecoding"
What we learned
- Translating natural language into precise, structured code edits is far more complex than generating code
- Separating scope (where) from intent (what) dramatically improves reliability and usability
- Voice interfaces require fundamentally different UX patterns than traditional developer tools
- Accessibility-driven ideas can lead to fundamentally better tools for everyone
- Building at both the systems level (audio, daemons) and the AI level (intent interpretation) requires careful boundary design
What's next for Vocode
Vocode introduces a new paradigm: intent-driven programming.
We’re not building a better autocomplete—we’re redefining how developers interact with code.
Expanding the intent system
- More expressive and reliable scoped operations
- Deep codebase awareness
Improved conversation
- Seamless and fluid real-time interaction and conversation with your codebase
Advanced multi-step edits and refactoring
- Coordinated changes across files using structured intent
- More LSP-based tools such as "extract this into a reusable module"
Personalized coding agents
- Learning a developer’s style, patterns, and preferences
- Adapting to code style and project conventions
Multiple supported frontends
- Integrations for other IDEs
- Vocode IDE
- Vocode Web
- Connect to Vocode on your machine with Vocode Mobile
Long term, we believe Vocode defines a new category:
- Not vibecoding (large, opaque generation)
- Not traditional editing (manual typing + autocomplete)
- But intent-driven programming
A world where:
- You think in logic
- You speak your intent
- And your code evolves instantly
Code at the speed of speech—without giving up control.
Built With
- anthropic
- elevenlabs
- go
- openai
- portaudio
- react
- typescript
- vscode-extension