Inspiration

As a DJ specialising in rap and Afrobeats with coursework in African Popular Music, I've always been curious about what makes genres musically distinct. I wanted to explore whether deep learning models can capture the subtle differences between hybrid genres like Afrobeats, which blends West African rhythms with hip-hop, and understand what audio features they actually learn.

What it does

The project implements and compares three deep learning architectures (CNNs on mel-spectrograms, RNNs on MFCCs, and hybrid models) to classify music into genres. Beyond accuracy metrics, it uses visualization techniques like Grad-CAM and t-SNE to reveal which musical features (frequency patterns, temporal structures) the model focuses on, and provides musicological analysis of why certain genres get confused.
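As a rough illustration of how confused genre pairs can be surfaced before any musicological reading, the sketch below builds a confusion matrix with NumPy and lists the largest off-diagonal cells. It is not the project's code: the function names, the toy labels, and the three genre names are placeholders chosen for the example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions: rows are true genres, columns are predicted genres."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def most_confused_pairs(cm, names, top_k=3):
    """Return up to top_k (true, predicted, count) triples with the most errors."""
    off = cm.copy()
    np.fill_diagonal(off, 0)  # ignore correct predictions
    order = np.argsort(off, axis=None)[::-1][:top_k]
    pairs = []
    for flat in order:
        i, j = np.unravel_index(flat, off.shape)
        if off[i, j] > 0:  # skip cells with no errors
            pairs.append((names[i], names[j], int(off[i, j])))
    return pairs

# Toy labels for illustration only (not results from this project).
names = ["hip-hop", "pop", "rock"]
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(most_confused_pairs(cm, names))
```

Keeping the matrix asymmetric (true versus predicted) matters here, since "hip-hop mistaken for pop" and "pop mistaken for hip-hop" can have different musical explanations.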

How we built it

Using PyTorch, I built models trained on the FMA-small dataset (8,000 tracks, 8 genres). I converted audio to mel-spectrograms using librosa, implemented ResNet/VGG-based CNNs and bidirectional LSTMs, and conducted extensive ablation studies on audio representations, data augmentation (time stretching, pitch shifting, SpecAugment), and training techniques. I tracked experiments with Weights & Biases and created visualizations to interpret model decisions.
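To make the SpecAugment step concrete, here is a minimal NumPy sketch of the masking idea: zero out one random band of mel bins and one random span of time frames. It is an assumption-laden illustration rather than the project's implementation; the function name `spec_augment`, the mask widths, and the choice of zero as the fill value are all hypothetical, and the time-warping part of SpecAugment is omitted.

```python
import numpy as np

def spec_augment(mel, max_freq_mask=8, max_time_mask=16, rng=None):
    """Return a masked copy of a mel-spectrogram shaped (n_mels, n_frames)."""
    rng = np.random.default_rng() if rng is None else rng
    out = mel.copy()  # leave the caller's array untouched
    n_mels, n_frames = out.shape

    # Frequency mask: zero a random band of at most max_freq_mask mel bins.
    f = int(rng.integers(0, min(max_freq_mask, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: zero a random span of at most max_time_mask frames.
    t = int(rng.integers(0, min(max_time_mask, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out

# Example with a dummy spectrogram (128 mel bins, 256 frames).
mel = np.ones((128, 256), dtype=np.float32)
augmented = spec_augment(mel, rng=np.random.default_rng(0))
print(augmented.shape, float(augmented.mean()))
```

Because the masks are drawn afresh on every call, applying the function inside the data loader gives each epoch a slightly different view of the same track, which is the point of using it on a small dataset.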

Challenges we ran into

Audio processing is computationally expensive: spectrograms consumed significant GPU memory, which limited batch sizes. Genre labels are inherently subjective, especially for hybrid genres. The dataset's small size led to overfitting despite aggressive regularization. Balancing technical depth with musical insight required careful failure case analysis. Managing training time across multiple architectures required prioritizing experiments strategically.

Accomplishments that we're proud of

Successfully bridging machine learning and musicology by providing domain-informed analysis of model behavior. Implementing comprehensive ablation studies that reveal which design choices matter most. Creating interpretable visualizations that explain classification decisions from a musical perspective. Building a reproducible codebase with clean documentation that others can extend.

What we learned

Deep learning on audio requires different considerations than vision tasks. Data augmentation is crucial for small audio datasets. Traditional metrics don't capture musically meaningful performance. Genre classification reveals as much about the subjectivity of genre labels as it does about model capabilities. Domain expertise significantly enriches technical analysis.

What's next for Music Genre Classification

Expand to larger datasets with more diverse genres, particularly underrepresented styles like Afrobeats, Amapiano, and regional hip-hop subgenres. Implement transformer-based architectures and self-supervised learning approaches. Build a real-time genre classifier for DJ applications. Explore multi-label classification since many songs span multiple genres. Investigate bias in genre labeling and develop more culturally-aware classification approaches.
