About Vibe Tracker

What Inspired Me

I've always been drawn to music creation, particularly the raw, chaotic energy of chiptune and primitive electronic sounds. A few years ago, I was experimenting with MilkyTracker and learning DAW basics, but I kept hitting the same wall: by the time I figured out how to translate a musical idea into actual sound, the inspiration had already faded away.

This frustration stayed with me as life got busier and music creation took a backseat. I wanted something more immediate - a way to capture musical ideas before they slipped away, without getting bogged down in technical complexity.

When powerful AI models like GPT-OSS became more accessible with generous free tiers, I saw an opportunity. I'd been experimenting with these models for small tasks and wondered: could AI handle the creative challenge of generating tracker-style music from simple text descriptions?

What I Learned

Building Vibe Tracker taught me several key lessons:

  • Real-time audio programming is incredibly demanding - every millisecond matters when processing audio buffers
  • AI prompt engineering for structured output requires a careful balance between creativity and consistency
  • Vectorized audio processing can achieve 5-8x performance improvements over naive implementations
  • Multiple AI provider integration provides crucial reliability through automatic fallbacks
  • Terminal UIs can be surprisingly powerful for creative applications when designed thoughtfully

How I Built It

The project evolved through several key phases:

1. Core Audio Engine

  • Built a real-time audio sequencer using Python's sounddevice library
  • Implemented vectorized synthesis for sine, square, sawtooth, and triangle waveforms
  • Created an ADSR envelope system for natural-sounding notes
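
As an illustration of the envelope idea, here is a minimal sketch (function name and defaults are my own, not the project's actual code) of a vectorized ADSR envelope applied to a sine note:

```python
import numpy as np

def adsr_envelope(num_samples, sample_rate=44100,
                  attack=0.01, decay=0.05, sustain=0.7, release=0.1):
    """Build an ADSR amplitude envelope as one numpy array.

    attack/decay/release are durations in seconds; sustain is a 0..1 level.
    """
    a = int(attack * sample_rate)
    d = int(decay * sample_rate)
    r = int(release * sample_rate)
    s = max(num_samples - a - d - r, 0)  # sustain fills the remainder
    env = np.concatenate([
        np.linspace(0.0, 1.0, a, endpoint=False),      # attack ramp up
        np.linspace(1.0, sustain, d, endpoint=False),  # decay to sustain level
        np.full(s, sustain),                           # hold
        np.linspace(sustain, 0.0, r),                  # release ramp down
    ])
    return env[:num_samples]

# Shape a one-second 440 Hz sine note in a single vectorized pass
sr = 44100
t = np.arange(sr) / sr
note = np.sin(2 * np.pi * 440.0 * t) * adsr_envelope(len(t), sr)
```

Because the envelope is a plain array, applying it is one multiply over the whole buffer rather than a per-sample loop.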

2. AI Integration

  • Developed a multi-provider system supporting both Hugging Face GPT-OSS and Google Gemini
  • Engineered prompts to generate structured JSON for instruments, patterns, and effects
  • Implemented automatic provider fallback for reliability
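
The fallback logic can be sketched roughly like this (the provider callables here are stubs standing in for the real Hugging Face and Gemini clients):

```python
import json

def generate_pattern(prompt, providers):
    """Try each provider in order; return the first valid JSON pattern.

    `providers` is a list of (name, call) pairs, where `call` takes the
    prompt string and returns raw model text. Names are hypothetical.
    """
    errors = {}
    for name, call in providers:
        try:
            raw = call(prompt)
            return json.loads(raw)   # must parse as JSON to count as success
        except Exception as exc:     # network error, rate limit, bad JSON...
            errors[name] = str(exc)  # record and fall through to the next one
    raise RuntimeError(f"all providers failed: {errors}")

# Stub providers standing in for the real API clients
def flaky_provider(prompt):
    raise TimeoutError("provider unavailable")

def working_provider(prompt):
    return '{"bpm": 120, "instruments": ["kick", "snare"]}'

pattern = generate_pattern("dark chiptune beat",
                           [("gpt-oss", flaky_provider),
                            ("gemini", working_provider)])
```

Treating "did not parse as JSON" the same as a network failure means one loop handles every way a provider can let you down.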

3. Effects System

  • Built instrument-level reverb effects with configurable parameters
  • Optimized processing to under 2ms per audio buffer for real-time performance
  • Integrated effects generation into the AI workflow
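
To stay within a tight per-buffer budget, the reverb has to avoid Python-level loops. A minimal sketch of one building block (a single feedback comb filter, which is an assumption about the effect's structure, not the project's exact design) using `scipy.signal.lfilter` so the whole buffer is processed in compiled code:

```python
import numpy as np
from scipy.signal import lfilter

def comb_reverb(buf, delay_samples=1500, feedback=0.4, wet=0.3):
    """One feedback comb filter: y[n] = x[n] + feedback * y[n - delay].

    Expressed as an IIR filter so lfilter runs it in C, not per-sample Python.
    """
    a = np.zeros(delay_samples + 1)
    a[0] = 1.0
    a[-1] = -feedback            # feedback tap at the delay length
    wet_signal = lfilter([1.0], a, buf)
    return (1.0 - wet) * buf + wet * wet_signal
```

A fuller reverb would sum several combs with co-prime delays plus allpass stages, but each stage keeps this same vectorized shape.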

4. User Interface

  • Created a terminal-based interface using Textual framework
  • Implemented real-time pattern visualization and instrument management
  • Added keyboard shortcuts for common operations (play/pause, save, export)

Challenges I Faced

Performance Optimization

The biggest challenge was achieving real-time audio performance. Initial implementations had severe lag due to inefficient sample generation and debug logging in audio callbacks. I solved this through:

  • Replacing per-sample loops with numpy vectorized operations
  • Eliminating all logging from real-time audio paths
  • Implementing efficient memory management with buffer reuse
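
The first point is easy to see side by side. A sketch (illustrative, not the project's code) of the same sine block generated both ways:

```python
import math
import numpy as np

SR = 44100

def sine_block_loop(freq, n, sr=SR):
    # Naive version: one Python-level sin() call per sample.
    out = [0.0] * n
    for i in range(n):
        out[i] = math.sin(2 * math.pi * freq * i / sr)
    return out

def sine_block_vectorized(freq, n, sr=SR):
    # Same math as a single numpy expression, executed in compiled code.
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * freq * t)
```

At a 512-sample buffer and 44.1 kHz, the callback has under ~12 ms per buffer; the loop version burns that budget on interpreter overhead alone, while the vectorized version returns in a fraction of it.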

AI Consistency

Getting AI models to generate valid, musically coherent JSON was tricky. Different providers had varying strengths - GPT-OSS in particular produced bland, uncreative output when the system prompt was overloaded or the query was vague. I addressed this with:

  • Carefully crafted system prompts with examples
  • Robust JSON parsing with error recovery
  • Provider-specific optimizations
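
"Robust JSON parsing" here mostly means tolerating the ways models decorate their output. A minimal sketch (my own helper, not the project's parser) that tries a direct parse first and then falls back to the outermost `{...}` span:

```python
import json

def parse_model_json(text):
    """Recover a JSON object from raw model output.

    Models often wrap JSON in commentary or markdown fences, so try a
    direct parse first, then fall back to the outermost {...} span.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return json.loads(text[start:end + 1])
    raise ValueError("no JSON object found in model output")
```

Combined with provider fallback, a malformed response just becomes one more recoverable error instead of a crash.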

Multi-track Synchronization

Handling multiple instruments playing simultaneously without timing issues required:

  • Precise event scheduling using frame-accurate timing
  • Careful management of note-on/off events to prevent accumulation
  • Thread-safe communication between UI and audio threads
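
Frame-accurate scheduling boils down to converting musical time into integer sample positions and comparing those against a running frame counter inside the callback. A sketch under that assumption (helper names are mine):

```python
SR = 44100

def schedule_events(pattern, bpm, sr=SR):
    """Convert (beat, note) events into absolute frame indices.

    Timing derives from integer sample counts, not wall-clock sleeps,
    so multiple tracks cannot drift relative to each other.
    """
    frames_per_beat = sr * 60.0 / bpm
    return sorted((int(round(beat * frames_per_beat)), note)
                  for beat, note in pattern)

def events_in_block(events, block_start, block_size):
    """Events whose frame lands in [block_start, block_start + block_size),
    returned with offsets relative to the start of this audio block."""
    return [(frame - block_start, note) for frame, note in events
            if block_start <= frame < block_start + block_size]
```

The callback advances `block_start` by `block_size` each invocation and triggers each returned event at its in-block offset, which is what keeps note-on/off pairs from accumulating or drifting.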

Audio Quality

Achieving clean, professional-sounding output meant:

  • Implementing anti-aliasing for waveform generation using PolyBLEP
  • Adding proper filtering and effects processing
  • Balancing multiple audio sources without clipping
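
PolyBLEP works by subtracting a small two-sample polynomial correction at each waveform discontinuity. A compact sketch for a sawtooth (standard form of the technique, not necessarily the project's exact implementation):

```python
import numpy as np

def poly_blep(t, dt):
    """Polynomial band-limited step correction.

    t is phase in [0, 1); dt is the phase increment per sample.
    Nonzero only in the two-sample window around each wrap.
    """
    out = np.zeros_like(t)
    mask = t < dt                    # sample just after a discontinuity
    x = t[mask] / dt
    out[mask] = 2 * x - x * x - 1.0
    mask = t > 1.0 - dt              # sample just before a discontinuity
    x = (t[mask] - 1.0) / dt
    out[mask] = x * x + 2 * x + 1.0
    return out

def saw_polyblep(freq, n, sr=44100):
    """Anti-aliased sawtooth: naive saw minus PolyBLEP at each phase wrap."""
    dt = freq / sr
    t = (np.arange(n) * dt) % 1.0
    naive = 2.0 * t - 1.0
    return naive - poly_blep(t, dt)
```

Compared with oversampling, this costs almost nothing per sample, which matters when several instruments share one real-time buffer.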

The result is a tool that solves my original problem: capturing musical inspiration immediately through natural language, without technical barriers getting in the way.

Built With

  • claude-4-sonnet
  • gemini-2.5-flash
  • git
  • google-gemini-ai
  • gpt-5
  • hugging-face-api
  • numpy
  • openai-sdk
  • openrouter
  • portaudio
  • python
  • python-dotenv
  • requests
  • scipy
  • sounddevice
  • textual
  • windsurf