The aim of this project is to detect the emotion conveyed by a speaker's voice. For example, speech produced in a state of fear, anger, or joy tends to be loud and fast, with a higher and wider pitch range, whereas emotions such as sadness or tiredness produce slow, low-pitched speech. Detecting human emotions through voice- and speech-pattern analysis has many applications, such as improving human-machine interaction. In particular, we present classification models for emotion in speech based on a convolutional neural network (CNN), a support vector machine (SVM), and a multilayer perceptron (MLP), all trained on acoustic features such as Mel-Frequency Cepstral Coefficients (MFCCs). The models are trained to classify eight emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). Our evaluation shows that the proposed approach yields accuracies of 86%, 84%, and 82% with the CNN, MLP, and SVM classifiers, respectively, on these eight emotions using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Toronto Emotional Speech Set (TESS).
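As a rough illustration of the kind of pipeline described above, the following Python sketch extracts time-averaged MFCC features and trains an MLP classifier. It is not the project's actual code: the library choices (librosa, scikit-learn), the function names, the 40-coefficient MFCC setting, and the network size are all assumptions made for the example.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# The eight target emotion classes from the abstract.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprise"]

def extract_mfcc(path, n_mfcc=40):
    """Return the time-averaged MFCC vector of one speech clip (assumed setup)."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)  # shape: (n_mfcc,)

def train_mlp(dataset):
    """dataset: list of (wav_path, emotion_label) pairs, e.g. drawn from RAVDESS/TESS."""
    X = np.array([extract_mfcc(path) for path, _ in dataset])
    y = np.array([label for _, label in dataset])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(300,), max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    return clf, accuracy_score(y_test, clf.predict(X_test))
```

The CNN and SVM variants follow the same pattern, differing only in the classifier applied to the extracted MFCC features.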