Inspiration

We were frustrated with the limitations of current music systems: generic EQ presets that don't adapt to specific songs, cloud-dependent analysis that compromises privacy, and genre classification systems biased toward popular music. We envisioned an intelligent system that could understand music like a professional sound engineer and enhance it in real time, all while running locally on efficient hardware.

What it does

SoundSage is a dual-AI system that instantly identifies music genres and automatically optimizes your audio experience. The first model analyzes audio features to detect the genre, while the second model intelligently adjusts EQ settings tailored to that specific genre. Everything runs locally on MemryX NPU hardware, delivering studio-quality audio enhancement without cloud dependency or privacy concerns.

How we built it

Data Pipeline:

  • Processed 22,000+ audio clips from the FMA dataset
  • Extracted 80+ audio features (MFCC, spectral, tempo, chroma)
  • Implemented strategic class balancing (8 genres × 1,000 samples each)
  • Used ANOVA F-test to select the 60 most relevant features

Model Development:

  • Built dual neural networks in PyTorch
  • Genre model: 60 → 256 → 128 → 64 → 8 architecture
  • EQ model: Genre-aware parameter optimization
  • Trained on Modal with GPU acceleration
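The genre classifier maps directly onto the 60 → 256 → 128 → 64 → 8 layout above. A minimal PyTorch sketch (dropout placement and rates are illustrative, not our exact training config):

```python
import torch
import torch.nn as nn

class GenreNet(nn.Module):
    """60 input features -> 8 genre logits, per the 60-256-128-64-8 layout."""
    def __init__(self, n_features: int = 60, n_genres: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_genres),  # raw logits; softmax applied at inference
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = GenreNet()
logits = model(torch.randn(4, 60))  # batch of 4 feature vectors
print(logits.shape)                 # torch.Size([4, 8])
```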

Deployment:

  • Converted models to ONNX format
  • Optimized for MemryX NPU using DFP file format
  • Implemented real-time audio processing pipeline

Challenges we ran into

Data Imbalance: Original genre distribution was highly skewed (some genres had 7,000+ samples, others only 100). Our initial models were biased toward majority classes.
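The fix can be sketched as per-class undersampling to a fixed quota (in our pipeline, 1,000 per genre across 8 genres; the toy 3-class distribution below is just for illustration):

```python
import numpy as np

def balance_indices(labels: np.ndarray, per_class: int, seed: int = 0) -> np.ndarray:
    """Return indices keeping at most `per_class` samples of each label."""
    rng = np.random.default_rng(seed)
    keep = []
    for genre in np.unique(labels):
        idx = np.flatnonzero(labels == genre)
        rng.shuffle(idx)  # random subset, not just the first N
        keep.append(idx[:per_class])
    return np.concatenate(keep)

# Skewed toy distribution: class 0 dominates, class 2 is rare.
labels = np.array([0] * 7000 + [1] * 1500 + [2] * 100)
idx = balance_indices(labels, per_class=100)
counts = np.bincount(labels[idx])
print(counts)  # [100 100 100]
```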

Feature Selection: Determining which of the 80+ audio features were most relevant for genre classification required extensive testing and ANOVA analysis.
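The ANOVA F-test selection maps directly onto scikit-learn's SelectKBest with f_classif. Synthetic data stands in for the real feature matrix here:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 80))    # 400 clips x 80 raw features
y = rng.integers(0, 8, size=400)  # 8 genre labels

selector = SelectKBest(score_func=f_classif, k=60)
X_sel = selector.fit_transform(X, y)
print(X_sel.shape)  # (400, 60)
# selector.get_support() reveals which 60 of the 80 features survived
```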

Hardware Optimization: Converting our models to run efficiently on MemryX NPU with DFP format presented unexpected compatibility challenges.

Real-time Performance: Achieving <5ms inference time while maintaining accuracy required significant architectural optimization and pruning.
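Latency can be benchmarked with a warm-up phase and a median over repeated runs. Note this host-side sketch runs on CPU, so its numbers will differ from the <5ms NPU figure:

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(  # stand-in for the pruned genre model
    nn.Linear(60, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 8),
)
model.eval()
x = torch.randn(1, 60)

with torch.no_grad():
    for _ in range(10):  # warm-up iterations, excluded from timing
        model(x)
    times = []
    for _ in range(100):
        t0 = time.perf_counter()
        model(x)
        times.append((time.perf_counter() - t0) * 1000)  # ms

median_ms = sorted(times)[len(times) // 2]
print(f"median latency: {median_ms:.3f} ms")
```

Using the median rather than the mean keeps one-off scheduler hiccups from distorting the measurement.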

Accomplishments that we're proud of

  • First of its kind: To our knowledge, the first real-time dual-model music intelligence system running on NPU hardware
  • Perfect Balance: Successfully balanced 8 genres with 1,000 samples each, eliminating classification bias
  • Hardware Integration: Achieved seamless deployment on MemryX NPU with DFP format
  • Performance: Reached >85% genre accuracy with <5ms latency
  • Privacy-First: Built a fully local system that processes audio without cloud dependency

What we learned

  • Balanced data beats complex models: Simple architectures with balanced data outperform complex models with imbalanced data
  • Feature selection is crucial: 60 well-chosen features outperformed 80+ generic features
  • Hardware-aware design: Model architecture must consider target deployment hardware from day one
  • Regularization diversity: Combining dropout, weight decay, and label smoothing provides robust generalization
  • Edge optimization: ONNX to DFP conversion requires careful attention to operator compatibility
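The regularization trio above combines in a few lines of PyTorch: dropout in the model, weight decay via AdamW, and label smoothing in the loss. Hyperparameter values here are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(60, 256), nn.ReLU(), nn.Dropout(0.3),  # dropout
    nn.Linear(256, 8),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              weight_decay=1e-4)      # weight decay
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing

# One illustrative training step on random data.
x = torch.randn(32, 60)
y = torch.randint(0, 8, (32,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```

Each mechanism attacks overfitting from a different angle, which is why combining them generalized better than leaning on any one alone.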

What's next for SoundSage

Short-term (3-6 months):

  • Expand to 16+ genre classifications
  • Develop mobile SDK for Android and iOS
  • Integrate with popular music streaming apps

Medium-term (6-12 months):

  • Personalization engine that learns individual listening preferences
  • Multi-modal analysis combining audio with metadata and lyrics
  • Real-time audio effect chain beyond basic EQ

Long-term (12+ months):

  • AI-powered music composition assistance
  • Cross-platform plugin ecosystem
  • Enterprise solutions for broadcast and live sound
  • Open-source community edition for developers

We're just getting started on our mission to make intelligent audio enhancement accessible to everyone, everywhere.
