About Our Project & Significance
Music has always been more than sound. It is emotion made tangible, rhythm felt in the body, beauty that moves people even when words cannot. Yet for millions of people (those who are deaf or hard of hearing, those without access to instruments or training, those who experience music differently), that beauty has always lived behind a wall.
Audiate tears that wall down.
We believe music should be seen, felt, and experienced by everyone. Audiate takes a hummed melody and transforms it into something you can watch and touch: a hand-drawn animated musician performing exactly what you sang, and a haptic pulse that lets you feel the rhythm through the device in your hand.
By making music a multimodal experience across sound, sight, and touch, Audiate opens the door for people who have never been able to fully access music before. It also reframes what music software can be: not a tool for experts, but a canvas for anyone with something to express.
Process
Pitch Detection
Pitch detection is handled by CREPE (Convolutional REpresentation for Pitch Estimation), a deep learning model trained on a large dataset of monophonic audio. CREPE processes the incoming audio and outputs a time-series of fundamental frequency (Hz) values at approximately 100 frames per second, along with a per-frame confidence score.
Raw pitch output is passed through Viterbi decoding, which smooths the frequency curve by resolving ambiguous frame-to-frame jumps into the globally most probable pitch sequence. Low-confidence frames (below a tunable threshold) are zeroed out to suppress breath noise and unvoiced segments. Each remaining Hz value is converted to the nearest MIDI semitone using the standard formula:
MIDI = 69 + 12 × log₂(f / 440)
This gives us a clean, discrete representation of which note is being hummed at any given moment.
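This post-processing step can be sketched in pure NumPy, assuming `frequency` and `confidence` arrays like those returned by `crepe.predict(audio, sr, viterbi=True)`; the 0.5 confidence threshold here is illustrative, not the pipeline's tuned value:

```python
import numpy as np

def hz_to_midi(frequency, confidence, threshold=0.5):
    """Zero out low-confidence frames, then snap each remaining Hz value
    to the nearest MIDI semitone via MIDI = 69 + 12 * log2(f / 440)."""
    f = np.where(confidence >= threshold, frequency, 0.0)
    midi = np.zeros_like(f)
    voiced = f > 0
    midi[voiced] = np.round(69 + 12 * np.log2(f[voiced] / 440.0))
    return midi

# A4 (440 Hz) at high confidence snaps to MIDI 69;
# the low-confidence frame is zeroed out as unvoiced.
print(hz_to_midi(np.array([440.0, 300.0]), np.array([0.9, 0.2])))
```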
Note Onset Detection
While CREPE tells us what pitch is present, it does not tell us when a new note begins. Onset detection is handled by librosa, using a spectral flux onset strength function that measures frame-to-frame changes in the audio spectrum. Peaks in this onset envelope correspond to note attacks — the moment a new note is struck or sung.
Onset times are backtracked to the nearest local energy minimum, aligning each detected boundary with the true start of the note rather than its peak energy. A configurable sensitivity threshold controls how aggressively onsets are detected, allowing tuning for different humming styles and recording conditions.
The CREPE pitch series is then sliced at each onset boundary. Within each segment, the median Hz value of all voiced frames is taken as the representative pitch, making the system robust to within-note vibrato or slight pitch drift. BPM is estimated from the average inter-onset interval. The final output is a structured list of events:
{
    "note_name": "A4",
    "midi": 69,
    "hz": 440.0,
    "start_time": 2.03,
    "duration": 0.52,
    "confidence": 0.96,
    "is_rest": false
}
This JSON is the bridge between the audio pipeline and the animation engine.
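The slicing-and-median step can be sketched as follows (helper and field names are illustrative, not the exact pipeline code):

```python
import numpy as np

def slice_notes(times, midi, onsets, end_time):
    """Slice a per-frame MIDI series at onset boundaries; each segment's
    representative pitch is the median of its voiced (non-zero) frames."""
    events = []
    bounds = list(onsets) + [end_time]
    for start, end in zip(bounds[:-1], bounds[1:]):
        in_seg = (times >= start) & (times < end)
        voiced = midi[in_seg & (midi > 0)]
        events.append({
            "midi": int(np.median(voiced)) if voiced.size else None,
            "start_time": round(start, 2),
            "duration": round(end - start, 2),
            "is_rest": voiced.size == 0,
        })
    return events

# One second of a steady A4 (MIDI 69) at 100 frames/s, split at 0.5 s
times = np.arange(0, 1.0, 0.01)
midi = np.full(times.shape, 69.0)
print(slice_notes(times, midi, [0.0, 0.5], 1.0))
```

Because each segment takes the median of its voiced frames, brief vibrato excursions or pitch drift within a note do not change the reported semitone.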
Haptics
Audiate includes a haptic feedback layer that translates the detected musical rhythm directly into physical vibration. The vibration rate is modulated by the tempo and note density of the detected melody: faster passages produce faster haptic pulses, sustained notes produce slower, longer vibrations, and rests produce silence.
This feature is fully implemented on macOS using Core Haptics-compatible hardware. iOS implementation is in progress, leveraging the Taptic Engine via the Core Haptics framework (CHHapticEngine), which supports fine-grained control over haptic intensity and sharpness per event. The haptic timeline is generated directly from the same note event JSON used by the animator, meaning haptics, visuals, and audio are always perfectly synchronised.
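The mapping itself is a plain data transformation over the note event JSON, so it can be sketched in Python (the field names follow the JSON above; the intensity/sharpness formulas are an illustrative sketch, not the shipped Core Haptics parameters):

```python
def note_events_to_haptics(events, base_intensity=0.8):
    """Map note-event JSON to a haptic timeline: rests are silent,
    confident notes pulse harder, and sustained notes feel softer/longer."""
    timeline = []
    for e in events:
        if e["is_rest"]:
            continue  # rests produce silence
        timeline.append({
            "time": e["start_time"],
            "duration": e["duration"],
            "intensity": base_intensity * e["confidence"],
            # shorter notes feel sharper; long sustained notes feel rounder
            "sharpness": max(0.1, 1.0 - e["duration"]),
        })
    return timeline

event = {"start_time": 2.03, "duration": 0.52, "confidence": 0.96, "is_rest": False}
print(note_events_to_haptics([event]))
```

On Apple platforms, each timeline entry would become one `CHHapticEvent` with its intensity and sharpness parameters.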
What Makes Us Unique
Audiate is built around a single, radical idea: music is not just something you hear. It is something you see and feel.
Most music tools optimize for audio output. Audiate treats audio as the input and makes sight and touch the outputs. This inversion is deliberate. It means someone who cannot hear can still experience a melody, watching it performed and feeling it pulse in their hand. It means a child who has never studied music can hum something and immediately see it brought to life. It means music becomes a shared, embodied experience rather than a purely acoustic one.

Music made visible. The animation is not a gimmick; it is the core product. Every note the user hums is rendered as a real violin performance, with correct string selection, finger position, and bow direction. The sketchy, organic visual style is intentional: it feels alive, warm, and human rather than mechanical, because music is a human thing.

Music made tangible. The haptic layer makes rhythm physical. Faster passages vibrate faster. Sustained notes pulse slowly. Rests fall silent. The device stops being a screen and starts being an instrument the body can feel. This is not an accessibility feature bolted on at the end; it is a core output, built in from the start and already running on Mac hardware.

One voice, everything else follows. Audiate's entire pipeline starts from a hum, the most natural, instrument-free act of musical expression there is. No keyboard. No notation. No training required. If you can hum it, Audiate can show it and let you feel it.
Two specific design decisions set Audiate apart:
Single source of truth. The note event JSON produced by the audio pipeline drives the animation, the haptics, and any future output (sheet music, MIDI export, audio synthesis) simultaneously. Adding a new output modality means consuming the same JSON, not rewriting the pipeline.
Haptics as a first-class output. Most music apps treat haptics as a notification afterthought. In Audiate, haptics are a musical output; the device becomes a physical instrument that pulses with the melody, opening the experience to users with hearing impairments and adding a tactile dimension for everyone else.
Built With Omnara
Omnara was integral to how we built Audiate. As a Claude-powered coding environment, it kept our work consistent across sessions, with full awareness of what had already been built and why.
Beyond continuity, Omnara gave us a bigger and cleaner workspace while maintaining direct access to the code at all times. Rather than jumping between a chat window and an editor, or working with an AI agent in a small terminal where it was hard to scroll back and keep track of our changes, everything lived in one place with Omnara. For a project with as many interlocking pieces as Audiate (a Python audio pipeline, a Flask backend, a Swift haptic binary, and a frontend), this is what made the build possible in the time we had.
Future Expansions & Applications
MIDI and sheet music export. The structured note event JSON is one mapping step away from standard MIDI. From MIDI, automatic sheet music generation via LilyPond or MuseScore is straightforward. This would allow Audiate to serve as a rapid melody sketchpad for composers.
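That mapping step is mostly tick arithmetic, sketched here (helper names are illustrative; the tuples could then feed a MIDI writer such as mido, using the BPM the pipeline estimates from inter-onset intervals):

```python
def events_to_midi_ticks(events, bpm, ppq=480):
    """Convert note-event JSON into (midi_note, start_tick, duration_tick)
    tuples at a given tempo and pulses-per-quarter-note resolution."""
    sec_per_tick = 60.0 / (bpm * ppq)
    notes = []
    for e in events:
        if e["is_rest"]:
            continue  # rests become gaps between note-on/note-off pairs
        start = round(e["start_time"] / sec_per_tick)
        dur = round(e["duration"] / sec_per_tick)
        notes.append((e["midi"], start, dur))
    return notes

# A half-second A4 at 120 BPM is exactly one beat: 480 ticks at 480 PPQ
print(events_to_midi_ticks(
    [{"midi": 69, "start_time": 0.5, "duration": 0.5, "is_rest": False}],
    bpm=120))
```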
Multi-instrument animation. The current violinist character is one rendering target. The same note event format can drive a guitarist, pianist, or drummer with instrument selection based on the pitch range or timbre of the input.
HumTrans fine-tuning. CREPE is pretrained on general monophonic audio. Fine-tuning it on the HumTrans dataset, a large collection of hummed melodies with ground-truth pitch annotations, would meaningfully improve accuracy on the specific input distribution Audiate targets.
Real-time mode. The current pipeline is offline (record or upload → process → animate). A streaming version using CREPE's frame-by-frame inference and a ring buffer for onset detection would enable live animation as the user hums, with sub-200ms latency.
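The ring-buffer idea can be sketched minimally (pure Python/NumPy, illustrative only; a real streaming version would feed windows of this buffer into CREPE and the onset detector):

```python
import numpy as np

class RingBuffer:
    """Fixed-size circular buffer: new samples overwrite the oldest,
    so the detector always sees the most recent window of audio."""
    def __init__(self, size):
        self.buf = np.zeros(size, dtype=np.float32)
        self.size = size
        self.pos = 0

    def push(self, samples):
        for s in samples:
            self.buf[self.pos] = s
            self.pos = (self.pos + 1) % self.size

    def latest(self):
        """Return the buffer contents ordered oldest-to-newest."""
        return np.concatenate([self.buf[self.pos:], self.buf[:self.pos]])

rb = RingBuffer(4)
rb.push([1, 2, 3, 4, 5])  # the oldest sample (1) is overwritten
print(rb.latest())
```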
Accessibility. The haptic output layer makes melody creation accessible to users with hearing impairments. Combined with visual animation, Audiate offers a multimodal musical experience that does not depend on any single sensory channel. We also hope to bring it to more devices, such as iPhone and Android phones.
