Speech Emotion Recognition

Code Work

The recognition of the emotional state of the speaker is a research area that has received great interest in the last years. The main goal is to improve voiced-based human-machine interactions. Most of the recent research on this domain has focused the studies in the prosodic features and the speech signal spectrum characteristics. However, there are many other characteristics and techniques which have not been explored in emotion recognition systems. In this work, a study of the performance of Gaussian mixtures models and hidden Markov models is presented. For the hidden Markov models, several configurations have been used, including an analysis of the optimal number of states. Results show the influence of number of Gaussian components and states. The performance of the classifiers has been evaluated with 3 to 7 emotions in spontaneous emotional speech and with speaker independence. In the analysis of three emotions: neutral, sadness and anger, the recognition rate by the Gaussian mixture classifiers was 93% and with hidden Markov models it was 97%. In the recognition of seven emotions, the accuracy was 67% with the Gaussian mixtures models and 76% in the evaluation of hidden Markov models.

Detecting emotions is increasingly important in marketing, because it lets a product or service be personalized to an individual's interests. For this reason, we built a project that detects a person’s emotion from their voice alone, which can support many AI-related applications. For example, a call centre could play music when a caller sounds angry, and a smart car could slow down when the driver sounds angry or fearful. Applications like these could benefit companies and improve safety for consumers.

Problem:

The goal is to use a model-based approach for speech emotion recognition in a real-time environment. The work focuses on emotion detection in acted scenarios, although people naturally express their emotions by voice. The system is designed to run on a pipelined architecture with parallel processing.

Objectives:

Speech emotion recognition aims to improve the human-machine interface and to track a person’s psychological state. The objectives are to recognize a person's emotion precisely under noisy conditions, to estimate the speech features of noise-corrupted emotional speech reliably, and to support rational and intelligent decision-making.

Use Cases:

Assess an individual's state by comparing past and current stages. It can be used for psychiatric diagnosis, automated call centers, and assessing a driver’s mental state.

Challenges:

Datasets are recorded in quiet labs, but voice recorded in real-time use tends to be noisy. Accuracy is generally affected by how well features are extracted and selected. Getting an accurate emotion from a voice was a challenging task.

Uniqueness of the Solution:

Emotions are extremely important in human mental health. They are a means of communicating one's point of view or mental state to others. Speech Emotion Recognition (SER) is the extraction of the speaker's emotional state from their speech signal. Emotion-sensing technologies can assist employees in making better decisions, improving their focus and performance at work, managing stress, and adopting healthier and more productive working styles.

Explanation of above code:

First, the data is loaded from a folder using the Python library glob, and each file's base name is obtained with the os library. The RAVDESS dataset names its files so that the emotion code is the third hyphen-separated field of the base name (index 2), so we declare X for the features and y for the emotions. X is obtained from “extract_feature”, and y is obtained by splitting the base name and looking up the resulting code in int2emotion above. The next step was to build the emotion classifier, i.e. the model, for which we used a multilayer perceptron classifier (MLP classifier).
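
The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The int2emotion mapping here uses the standard RAVDESS emotion codes and may differ from the one defined earlier in the project. The helper names emotion_from_path and load_data are hypothetical. Because the definition of extract_feature is not shown, it is passed in as a callable and assumed to map a file path to a feature vector.

```python
import glob
import os

# Standard RAVDESS emotion codes (assumption: the project's own
# int2emotion may use different label names or a subset of these).
int2emotion = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_path(path):
    """Return the emotion label encoded in a RAVDESS-style file name."""
    basename = os.path.basename(path)   # e.g. "03-01-05-01-02-01-12.wav"
    code = basename.split("-")[2]       # third field holds the emotion code
    return int2emotion[code]


def load_data(pattern, extract_feature):
    """Build X (features) and y (labels) for every file matching `pattern`.

    `extract_feature` is a caller-supplied callable (hypothetical signature:
    path -> feature vector), since its definition is not shown here.
    """
    X, y = [], []
    for path in sorted(glob.glob(pattern)):
        X.append(extract_feature(path))
        y.append(emotion_from_path(path))
    return X, y
```

With X and y built this way, the two lists can be split into training and test sets and passed to the MLP classifier, for example through the fit(X, y) method of scikit-learn's MLPClassifier, if that is the implementation the project uses.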