Inspiration

We had a working FPGA FFT core and wanted to build something real with it. The movie Whiplash gave us the concept — instead of a cheerful singing app, build one that gives brutally specific feedback in the voice of the most terrifying music teacher in cinema.

What it does

You sing into the FPGA's microphone. The FPGA computes a real-time FFT and streams the frequency data to Python, which compares your pitch note by note against a reference vocal track. Every 8 seconds, Gemini generates a Fletcher-style critique of your performance and ElevenLabs speaks it aloud.

How we built it

The Nexys A7 FPGA runs a 1024-point FFT on the onboard PDM mic and streams 512-byte magnitude frames over UART. Python reads the dominant frequency bin from each frame, segments frames into discrete notes, and compares them against a librosa-extracted reference timeline. Gemini generates the critique with a Fletcher persona. ElevenLabs voices it. A PyQt6 frontend shows live FFT and pitch charts alongside the final scored review.
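The step from a 512-byte magnitude frame to a pitch can be sketched roughly as below. This is a minimal illustration, not the project's actual code: it assumes each byte is the magnitude of one of the lower 512 bins of the 1024-point FFT, and `SAMPLE_RATE_HZ` is a placeholder value only, since the real rate has to come from the RTL. The function names are hypothetical.

```python
import math

FFT_SIZE = 1024             # FFT length, from the write-up
FRAME_BYTES = 512           # one magnitude byte per bin (assumed layout)
SAMPLE_RATE_HZ = 48_000.0   # placeholder; the real value must be derived from the RTL


def dominant_bin(frame: bytes) -> int:
    """Return the index of the strongest bin, ignoring bin 0 (DC)."""
    if len(frame) != FRAME_BYTES:
        raise ValueError(f"expected {FRAME_BYTES} bytes, got {len(frame)}")
    return max(range(1, len(frame)), key=frame.__getitem__)


def bin_to_hz(bin_index: int,
              sample_rate_hz: float = SAMPLE_RATE_HZ,
              fft_size: int = FFT_SIZE) -> float:
    """Centre frequency of an FFT bin: k * fs / N."""
    return bin_index * sample_rate_hz / fft_size


def hz_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number, with A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))
```

Note that the bin spacing is `fs / N`, so any error in the assumed sample rate scales every detected pitch by the same factor.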

Challenges we ran into

Getting the FPGA and Python timelines to align required reading the RTL source directly to derive the actual sample rate — our initial assumption was wrong by nearly 2x. We also over-engineered the UART protocol before realising the inter-burst gap makes framing unnecessary. Spectral centroid turned out to be useless for pitch tracking — switching to dominant bin extraction fixed it.
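The idea of using the inter-burst gap instead of explicit framing bytes might look something like the sketch below. It is written against timestamped chunks rather than a real serial port so it can run anywhere; the 50 ms `GAP_SECONDS` threshold and the function name are assumptions for illustration, not values from the project.

```python
from typing import Iterable, Iterator, Tuple

FRAME_BYTES = 512     # frame size from the write-up
GAP_SECONDS = 0.05    # hypothetical threshold; must be shorter than the real inter-burst gap


def frames_from_bursts(chunks: Iterable[Tuple[float, bytes]],
                       frame_bytes: int = FRAME_BYTES,
                       gap_seconds: float = GAP_SECONDS) -> Iterator[bytes]:
    """Yield complete frames from (timestamp, data) chunks.

    A silence longer than gap_seconds marks the start of a new burst,
    so any partial frame still in the buffer is discarded there.
    """
    buffer = bytearray()
    last_time = None
    for timestamp, data in chunks:
        if last_time is not None and timestamp - last_time > gap_seconds:
            buffer.clear()  # resynchronise on the gap
        buffer.extend(data)
        last_time = timestamp
        while len(buffer) >= frame_bytes:
            yield bytes(buffer[:frame_bytes])
            del buffer[:frame_bytes]
```

Because the receiver resynchronises on every gap, a dropped byte costs at most one frame, which is the property that makes header and checksum bytes unnecessary in this setting.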

Accomplishments that we're proud of

The full pipeline works on real hardware with no mocking — FPGA mic to FFT to UART to pitch comparison to spoken critique, end to end. We're also proud of the note segmentation approach: comparing discrete musical phrases rather than individual frames makes the feedback feel like a real teacher, not a spectrometer.
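One plausible way to turn per-frame pitches into discrete notes is to merge runs of identical pitch and drop runs that are too short to be a deliberate note. The sketch below assumes each frame has already been reduced to a MIDI note number (or `None` for silence); the `Note` type, the `segment_notes` name, and the `min_frames` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Note:
    midi: int          # MIDI note number
    start_frame: int   # index of the first frame in the run
    length: int        # number of consecutive frames


def segment_notes(midi_per_frame: Sequence[Optional[int]],
                  min_frames: int = 2) -> List[Note]:
    """Group consecutive frames with the same pitch into notes.

    Runs of silence (None) and runs shorter than min_frames are dropped.
    """
    notes: List[Note] = []
    run_start = 0
    for i in range(1, len(midi_per_frame) + 1):
        at_end = i == len(midi_per_frame)
        if at_end or midi_per_frame[i] != midi_per_frame[run_start]:
            pitch = midi_per_frame[run_start]
            length = i - run_start
            if pitch is not None and length >= min_frames:
                notes.append(Note(pitch, run_start, length))
            run_start = i
    return notes
```

Comparing the resulting note list against the reference timeline, rather than comparing frame by frame, is what lets the critique talk about specific notes being flat or sharp.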

What we learned

Read the hardware source code before assuming anything. The simplest protocol is often the most robust. Dominant bin beats spectral centroid for monophonic pitch. And a good persona makes a technically dense system immediately understandable to anyone who's seen the film.
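A small synthetic example shows why the centroid misleads for a sung note. The spectrum here is invented for illustration (a fundamental at bin 10, two weaker harmonics, and a flat noise floor), not captured from the hardware.

```python
from typing import Sequence


def spectral_centroid_bin(mags: Sequence[float]) -> float:
    """Magnitude-weighted mean bin index."""
    return sum(i * m for i, m in enumerate(mags)) / sum(mags)


def dominant_bin(mags: Sequence[float]) -> int:
    """Index of the single strongest bin."""
    return max(range(len(mags)), key=mags.__getitem__)


# Synthetic spectrum: flat noise floor plus a fundamental and two harmonics.
spectrum = [2] * 512
spectrum[10] = 200   # fundamental
spectrum[20] = 120   # second harmonic
spectrum[30] = 80    # third harmonic

peak = dominant_bin(spectrum)
centroid = spectral_centroid_bin(spectrum)
```

The dominant bin lands on the fundamental, while the centroid is dragged far above it by the harmonics and by the many low-level noise bins, so it tracks timbre and noise more than pitch.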

What's next for Whiplash

A higher UART baud rate and smaller FFT windows, to push the frame rate from 3 fps toward real time. A scrolling sheet-music display. An adaptive Fletcher: gentler early on, increasingly brutal as you improve.
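As a rough guide to how much headroom the serial link offers, the ceiling on frame rate for a given baud rate can be estimated as below. The write-up does not state the baud rate in use, so the figure in the example is only a common default, and the sketch assumes 8N1 framing (10 bits on the wire per byte) with no idle time between frames.

```python
def max_frames_per_second(baud: int,
                          frame_bytes: int = 512,
                          bits_per_byte: int = 10) -> float:
    """Upper bound on frames per second for a UART link.

    bits_per_byte = 10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    return baud / (bits_per_byte * frame_bytes)


# Example with a common default baud rate (an assumption, not the project's setting).
ceiling_at_115200 = max_frames_per_second(115_200)
```

The real frame rate will sit below this ceiling, since the FFT window length and the inter-burst gap also add time per frame.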
