SpeakEmotion: AI Speech Emotion Recognition

Inspiration

Emotion recognition from speech is crucial for mental health monitoring, accessibility, and human-computer interaction. Traditional neural networks use fixed architectures, whereas biological neurons adapt through their dendrites. I was inspired by PerforatedAI's approach of adding artificial dendrites to neural networks, mimicking how the brain learns.

What it does

SpeakEmotion classifies 8 emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprised) from speech audio using the RAVDESS dataset:

Converts audio to Mel spectrograms
Processes through a CNN with PerforatedAI dendrites
Dynamically grows new dendrites during training
Achieves a 22.2% remaining error reduction (RER) over the baseline CNN
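
The eight labels listed above are encoded in RAVDESS file names, where the third hyphen-separated field is the emotion code and the last field is the actor number. The sketch below shows one way to recover them; the helper name `parse_ravdess_name` and the example path are hypothetical and not taken from the project code.

```python
from pathlib import Path

# Emotion codes from the RAVDESS file-naming convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def parse_ravdess_name(path):
    """Return (emotion label, actor id) parsed from a RAVDESS file name.

    Hypothetical helper for illustration; the project may parse labels differently.
    """
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"unexpected RAVDESS file name: {path}")
    emotion = RAVDESS_EMOTIONS[fields[2]]  # third field: emotion code
    actor = int(fields[6])                 # seventh field: actor number
    return emotion, actor


label, actor = parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav")
print(label, actor)  # fearful 12
```

Keeping the actor id alongside the label makes it possible to split train and validation sets by speaker, which may or may not match the split used here.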

How I built it

PyTorch CNN for spectrogram classification
PerforatedAI for dendritic optimization with add_validation_score()
Weights & Biases for experiment tracking with Arch/Final logging
RAVDESS dataset (1,440 audio files from 24 actors)
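
As a rough illustration of the first item, here is a minimal PyTorch CNN for single-channel Mel spectrograms. The layer sizes, the class name `SpectrogramCNN`, and the 64 x 128 input shape are assumptions made for this sketch, not the architecture actually trained. The PerforatedAI conversion and the W&B logging are deliberately left out, since the exact calls are not shown in this write-up.

```python
import torch
from torch import nn


class SpectrogramCNN(nn.Module):
    """Small CNN for (batch, 1, n_mels, n_frames) inputs; sizes are illustrative."""

    def __init__(self, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            # Global pooling makes the classifier independent of the input size.
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


model = SpectrogramCNN()
model.eval()
with torch.no_grad():
    # Assumed input: batch of 4 spectrograms with 64 Mel bands and 128 frames.
    logits = model(torch.randn(4, 1, 64, 128))
print(logits.shape)  # torch.Size([4, 8])
```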

Results

Model	Accuracy
Traditional CNN	53.45%
+1 Dendrite	59.48%
+2 Dendrites	63.79%

$$\mathrm{RER} = \frac{63.79 - 53.45}{100 - 53.45} \times 100 \approx 22.2\%$$
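
The same calculation can be reproduced in a few lines of Python. The function name `remaining_error_reduction` is chosen for this sketch; the accuracies are the ones from the table above.

```python
def remaining_error_reduction(baseline_acc, improved_acc):
    """Percentage of the baseline's remaining error removed by the improved model.

    Both accuracies are given in percent (0 to 100).
    """
    baseline_error = 100.0 - baseline_acc
    if baseline_error <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return (improved_acc - baseline_acc) / baseline_error * 100.0


baseline = 53.45
for name, acc in [("+1 Dendrite", 59.48), ("+2 Dendrites", 63.79)]:
    print(f"{name}: {remaining_error_reduction(baseline, acc):.1f}%")
# +1 Dendrite: 13.0%
# +2 Dendrites: 22.2%
```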

Challenges

Handling BatchNorm layers with PerforatedAI module conversion
Ensuring consistent spectrogram dimensions across variable-length audio
Implementing proper Arch/Final W&B logging, following the official example
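
For the spectrogram-dimension challenge, one common approach is to pad or crop every spectrogram along the time axis to a fixed number of frames. The sketch below assumes arrays shaped `(n_mels, n_frames)`; the function name `fix_num_frames`, the target of 128 frames, and zero padding are assumptions, not necessarily what this project does.

```python
import numpy as np


def fix_num_frames(mel, target_frames, pad_value=0.0):
    """Pad or crop a (n_mels, n_frames) spectrogram to exactly target_frames."""
    n_frames = mel.shape[1]
    if n_frames >= target_frames:
        # Long clip: keep the first target_frames frames.
        return mel[:, :target_frames]
    # Short clip: pad on the right of the time axis only.
    pad = target_frames - n_frames
    return np.pad(mel, ((0, 0), (0, pad)), mode="constant", constant_values=pad_value)


short = np.ones((64, 90), dtype=np.float32)   # stand-in for a short clip
long_ = np.ones((64, 200), dtype=np.float32)  # stand-in for a long clip
batch = np.stack([fix_num_frames(short, 128), fix_num_frames(long_, 128)])
print(batch.shape)  # (2, 64, 128)
```

For dB-scaled spectrograms, passing the spectrogram's minimum value as `pad_value` may be a better choice than zero, so that padding reads as silence.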

What I learned

How dendritic optimization mimics biological neural plasticity
The power of dynamic architecture growth vs. fixed networks
Proper PerforatedAI integration patterns

What's next

Real-time emotion detection in video calls
Multi-modal emotion recognition (voice + facial expressions)
Application to other audio tasks (speech recognition, music classification)

Built With

librosa
matplotlib
numpy
perforatedai
python
pytorch
ravdess
torchaudio
weights-&-biases

Updates

KAMALESH M CSE started this project — Jan 04, 2026 03:16 AM EST
