About Vibe Tracker
What Inspired Me
I've always been drawn to music creation, particularly the raw, chaotic energy of chiptune and primitive electronic sounds. A few years ago, I was experimenting with MilkyTracker and learning DAW basics, but I kept hitting the same wall: by the time I figured out how to translate a musical idea into actual sound, the inspiration had already faded away.
This frustration stayed with me as life got busier and music creation took a backseat. I wanted something more immediate - a way to capture musical ideas before they slipped away, without getting bogged down in technical complexity.
When powerful AI models like GPT-OSS became more accessible with generous free tiers, I saw an opportunity. I'd been experimenting with these models for small tasks and wondered: could AI handle the creative challenge of generating tracker-style music from simple text descriptions?
What I Learned
Building Vibe Tracker taught me several key lessons:
- Real-time audio programming is incredibly demanding - every millisecond matters when processing audio buffers
- AI prompt engineering for structured output requires a careful balance between creativity and consistency
- Vectorized audio processing can achieve 5-8x performance improvements over naive implementations
- Multiple AI provider integration provides crucial reliability through automatic fallbacks
- Terminal UIs can be surprisingly powerful for creative applications when designed thoughtfully
How I Built It
The project evolved through several key phases:
1. Core Audio Engine
- Built a real-time audio sequencer using Python's sounddevice library
- Implemented vectorized synthesis for sine, square, sawtooth, and triangle waveforms
- Created an ADSR envelope system for natural-sounding notes
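The ADSR idea can be sketched in a few lines of NumPy: four piecewise segments (attack, decay, sustain, release) concatenated into one amplitude array and multiplied against the raw waveform. Function name and default parameters here are illustrative, not the project's actual API:

```python
import numpy as np

def adsr_envelope(n_samples, sr, attack=0.01, decay=0.05, sustain=0.7, release=0.1):
    """Build an ADSR amplitude envelope as a single NumPy array."""
    a = int(attack * sr)
    d = int(decay * sr)
    r = int(release * sr)
    s = max(n_samples - a - d - r, 0)
    env = np.concatenate([
        np.linspace(0.0, 1.0, a, endpoint=False),      # attack ramp up
        np.linspace(1.0, sustain, d, endpoint=False),  # decay to sustain level
        np.full(s, sustain),                           # sustain plateau
        np.linspace(sustain, 0.0, r),                  # release ramp down
    ])
    return env[:n_samples]

# Shape a one-second 440 Hz sine into a natural-sounding note
sr = 44100
t = np.arange(sr) / sr
note = np.sin(2 * np.pi * 440.0 * t) * adsr_envelope(len(t), sr)
```

Because the envelope is built as one array, applying it stays a single vectorized multiply, which fits the real-time constraints described later.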
2. AI Integration
- Developed a multi-provider system supporting both Hugging Face GPT-OSS and Google Gemini
- Engineered prompts to generate structured JSON for instruments, patterns, and effects
- Implemented automatic provider fallback for reliability
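The fallback logic amounts to trying providers in order until one returns usable JSON. A minimal sketch, assuming a hypothetical provider interface of (name, callable) pairs; the schema check shown is illustrative:

```python
import json

class ProviderError(Exception):
    pass

def generate_pattern(prompt, providers):
    """Try each provider in order; return the first valid JSON response.

    `providers` is a list of (name, call_fn) pairs, where call_fn takes
    the prompt text and returns a raw string response.
    """
    errors = []
    for name, call in providers:
        try:
            raw = call(prompt)
            data = json.loads(raw)          # must be valid JSON
            if "patterns" not in data:      # minimal schema sanity check
                raise ValueError("missing 'patterns' key")
            return data
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))

# If the first provider returns garbage, the second one is tried automatically
providers = [
    ("gpt-oss", lambda p: "not json at all"),
    ("gemini", lambda p: '{"patterns": []}'),
]
result = generate_pattern("calm chiptune loop", providers)
```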
3. Effects System
- Built instrument-level reverb effects with configurable parameters
- Optimized processing to under 2ms per audio buffer for real-time performance
- Integrated effects generation into the AI workflow
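The writeup doesn't detail the reverb algorithm, but a single feedback comb filter (the classic building block of Schroeder-style reverbs) gives the flavor. Parameter names and defaults are illustrative, and a real-time implementation would replace the Python loop with a vectorized or C-level filter:

```python
import numpy as np

def comb_reverb(signal, sr, delay_ms=50.0, feedback=0.5, mix=0.3):
    """Apply a single feedback comb filter and blend it with the dry signal.

    Each output sample feeds a decayed copy of itself back `delay` samples
    later, producing a train of fading echoes.
    """
    delay = int(sr * delay_ms / 1000.0)
    wet = np.copy(signal).astype(float)
    for i in range(delay, len(wet)):
        wet[i] += feedback * wet[i - delay]
    return (1.0 - mix) * signal + mix * wet
```

A full reverb would sum several comb filters with mutually prime delays and pass the result through allpass stages, but the feedback/mix parameters above already map naturally onto AI-configurable knobs.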
4. User Interface
- Created a terminal-based interface using Textual framework
- Implemented real-time pattern visualization and instrument management
- Added keyboard shortcuts for common operations (play/pause, save, export)
Challenges I Faced
Performance Optimization
The biggest challenge was achieving real-time audio performance. Initial implementations had severe lag due to inefficient sample generation and debug logging in audio callbacks. I solved this through:
- Replacing per-sample loops with numpy vectorized operations
- Eliminating all logging from real-time audio paths
- Implementing efficient memory management with buffer reuse
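The loop-versus-vectorized difference looks like this in miniature (a toy comparison, not the project's actual synthesis code): both functions produce identical buffers, but the second does it in one NumPy call instead of one Python-level iteration per sample.

```python
import numpy as np

SR = 44100

def sine_slow(freq, n):
    # Naive: one interpreted Python iteration per sample.
    out = np.empty(n)
    for i in range(n):
        out[i] = np.sin(2 * np.pi * freq * i / SR)
    return out

def sine_fast(freq, n):
    # Vectorized: the whole buffer is computed in compiled C code.
    t = np.arange(n) / SR
    return np.sin(2 * np.pi * freq * t)
```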
AI Consistency
Getting AI models to generate valid, musically coherent JSON was tricky. Providers also had different failure modes: GPT-OSS, for example, produced bland, uncreative results when the system prompt was overloaded and the query too vague. I addressed this with:
- Carefully crafted system prompts with examples
- Robust JSON parsing with error recovery
- Provider-specific optimizations
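"Robust JSON parsing with error recovery" here means assuming the model may wrap its JSON in prose or markdown fences. A sketch of one recovery strategy (the function name and the exact heuristics are illustrative, not the project's actual parser):

```python
import json
import re

def parse_model_json(raw):
    """Recover a JSON object from model output that may include extra text.

    Strategy: try a direct parse, then strip markdown code fences,
    then fall back to the widest {...} span found in the text.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Strip ```json ... ``` fences if present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # Last resort: widest brace-delimited span
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        return json.loads(raw[start:end + 1])
    raise ValueError("no JSON object found in model output")
```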
Multi-track Synchronization
Handling multiple instruments playing simultaneously without timing issues required:
- Precise event scheduling using frame-accurate timing
- Careful management of note-on/off events to prevent accumulation
- Thread-safe communication between UI and audio threads
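Frame-accurate timing boils down to converting beat positions into sample-frame offsets relative to the current audio buffer, so every track lines up on the same sample grid. A simplified sketch, where the event format and names are assumptions rather than the project's actual data model:

```python
SR = 44100

def schedule_events(events, bpm, buffer_start_frame, buffer_len):
    """Map beat-positioned note events to frame offsets within one buffer.

    `events` is a list of (beat, note) pairs. Returns (offset, note) for
    events whose absolute frame lands inside this buffer; everything else
    is ignored until its buffer comes around.
    """
    frames_per_beat = SR * 60.0 / bpm
    scheduled = []
    for beat, note in events:
        frame = int(round(beat * frames_per_beat))  # frame-accurate position
        if buffer_start_frame <= frame < buffer_start_frame + buffer_len:
            scheduled.append((frame - buffer_start_frame, note))
    return scheduled
```

Because every instrument is scheduled against the same frame counter, multiple tracks stay sample-locked no matter how buffers are sized.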
Audio Quality
Achieving clean, professional-sounding output meant:
- Implementing anti-aliasing for waveform generation using PolyBLEP
- Adding proper filtering and effects processing
- Balancing multiple audio sources without clipping
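PolyBLEP works by adding a small polynomial correction around each phase wrap, cancelling most of the aliasing that a naive discontinuity produces. A scalar sketch of the standard formulation applied to a sawtooth; the project's version would be vectorized for real-time use:

```python
import numpy as np

def polyblep(t, dt):
    """Polynomial band-limited step correction near a discontinuity.

    t is the normalized phase in [0, 1); dt is the per-sample phase increment.
    Returns a nonzero correction only within one sample of the phase wrap.
    """
    if t < dt:                  # just after the wrap
        t /= dt
        return t + t - t * t - 1.0
    if t > 1.0 - dt:            # just before the wrap
        t = (t - 1.0) / dt
        return t * t + t + t + 1.0
    return 0.0

def saw_polyblep(freq, n, sr=44100):
    """Sawtooth with the PolyBLEP correction subtracted at each phase wrap."""
    dt = freq / sr
    out = np.empty(n)
    phase = 0.0
    for i in range(n):
        out[i] = 2.0 * phase - 1.0 - polyblep(phase, dt)
        phase += dt
        if phase >= 1.0:
            phase -= 1.0
    return out
```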
The result is a tool that solves my original problem: capturing musical inspiration immediately through natural language, without technical barriers getting in the way.