Inspiration
Emotion recognition from speech is crucial for mental health monitoring, accessibility, and human-computer interaction. Traditional neural networks use fixed architectures, whereas biological neurons adapt through their dendrites. I was inspired by PerforatedAI's approach of adding artificial dendrites to neural networks, which mimics how the brain learns.
What it does
SpeakEmotion classifies 8 emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprised) from speech audio using the RAVDESS dataset:
- Converts audio to Mel spectrograms
- Processes through a CNN with PerforatedAI dendrites
- Dynamically grows new dendrites during training
- Achieves a 22.2% remaining error reduction (RER) over the baseline CNN
How I built it
- PyTorch CNN for spectrogram classification
- PerforatedAI for dendritic optimization with add_validation_score()
- Weights & Biases for experiment tracking with Arch/Final logging
- RAVDESS dataset (1,440 audio files from 24 actors)
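As a rough illustration of the first item, here is a minimal sketch of a spectrogram classifier in PyTorch. It is not the actual SpeakEmotion model: the class name `SpectrogramCNN`, the layer sizes, and the 64x128 input shape are all assumptions made for this example, and the PerforatedAI conversion step is deliberately left out.

```python
import torch
from torch import nn

NUM_EMOTIONS = 8  # neutral, calm, happy, sad, angry, fearful, disgust, surprised


class SpectrogramCNN(nn.Module):
    """Hypothetical baseline CNN; layer sizes are illustrative assumptions."""

    def __init__(self, num_classes: int = NUM_EMOTIONS) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            # Global average pooling keeps the classifier independent of
            # the exact spectrogram height and width.
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, n_frames) -> logits: (batch, num_classes)
        x = self.features(x)
        return self.classifier(x.flatten(1))


model = SpectrogramCNN().eval()
batch = torch.randn(2, 1, 64, 128)  # assumed shape: 64 Mel bands, 128 frames
with torch.no_grad():
    logits = model(batch)
print(logits.shape)
```

The BatchNorm layers are included because they are the kind of layer that needed special handling during module conversion, as noted under Challenges.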
Results
| Model | Accuracy |
|---|---|
| Traditional CNN | 53.45% |
| +1 Dendrite | 59.48% |
| +2 Dendrites | 63.79% |
$$\text{RER} = \frac{63.79 - 53.45}{100 - 53.45} \times 100 \approx \textbf{22.2\%}$$
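The same calculation can be checked in a few lines of Python. The function name `remaining_error_reduction` is made up for this sketch; the accuracies are the ones from the table above.

```python
def remaining_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Share of the baseline's error (in %) removed by the improved model.

    Both accuracies are percentages in the range 0 to 100.
    """
    baseline_error = 100.0 - baseline_acc
    if baseline_error <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return (improved_acc - baseline_acc) / baseline_error * 100.0


BASELINE = 53.45  # Traditional CNN
for name, acc in [("+1 Dendrite", 59.48), ("+2 Dendrites", 63.79)]:
    rer = remaining_error_reduction(BASELINE, acc)
    print(f"{name}: RER = {rer:.1f}%")
```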
Challenges
- Handling BatchNorm layers with PerforatedAI module conversion
- Ensuring consistent spectrogram dimensions across variable-length audio
- Implementing proper Arch/Final W&B logging, following the official example
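For the spectrogram-dimension problem, one common approach is to pad short clips and center-crop long ones along the time axis. The sketch below uses NumPy and is not necessarily what SpeakEmotion does: the helper name `fix_length`, the target of 128 frames, and the zero pad value are assumptions.

```python
import numpy as np


def fix_length(spec: np.ndarray, target_frames: int, pad_value: float = 0.0) -> np.ndarray:
    """Return a (n_mels, target_frames) copy of a (n_mels, n_frames) spectrogram."""
    n_frames = spec.shape[1]
    if n_frames >= target_frames:
        # Too long: keep the central window.
        start = (n_frames - target_frames) // 2
        return spec[:, start:start + target_frames]
    # Too short: pad on the right along the time axis only.
    pad = target_frames - n_frames
    return np.pad(spec, ((0, 0), (0, pad)), mode="constant", constant_values=pad_value)


short_clip = np.ones((64, 50))   # assumed: 64 Mel bands, 50 frames
long_clip = np.ones((64, 300))
print(fix_length(short_clip, 128).shape, fix_length(long_clip, 128).shape)
```

For log-scaled (dB) spectrograms, zero is not silence, so a pad value near the spectrogram's minimum is usually a better choice than the default here.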
What I learned
- How dendritic optimization mimics biological neural plasticity
- The power of dynamic architecture growth vs. fixed networks
- Proper PerforatedAI integration patterns
What's next
- Real-time emotion detection in video calls
- Multi-modal emotion recognition (voice + facial expressions)
- Application to other audio tasks (speech recognition, music classification)