Speech Emotion Recognition Using a Convolutional Neural Network with Recurrent Neural Network Architecture, End-to-End Architecture, Real-Time Recognition, and a Noise-Robust Approach for Recognition
Abstract
This article reviews several approaches to emotion recognition from speech. We first discuss the main challenges in emotion recognition: choosing an appropriate speech database, identifying suitable features, and selecting an appropriate classification model. The first approach reviewed uses 13 MFCCs and acceleration coefficients as features, with a CNN and a long short-term memory (LSTM) network for classification. The second approach is based on an adaptively trained very deep convolutional residual network and highlights cluster adaptive training and factor-aware training. The third approach uses end-to-end recognition to avoid the information loss, and the resulting drop in accuracy, that occurs when features are extracted first and the emotion is classified afterwards. Finally, the article reviews real-time recognition, investigating high-level descriptors and i-vectors.
Keywords: Neural networks, Speech Emotion Recognition, MFCC, CNN, LSTM, residual learning, high-level descriptors, i-vectors.