Deep Learning-Based Speech Emotion Recognition

International Journal of Multidisciplinary and Scientific Emerging Research 10 (2):715-718 (2022)

Abstract

Speech emotion recognition (SER) is an essential component of human-computer interaction, enabling systems to recognize and respond to human emotions. Traditional methods often rely on handcrafted features, which can fail to capture the full complexity of emotional cues. In contrast, deep learning approaches, particularly convolutional neural networks (CNNs) and recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks, offer more robust solutions by automatically learning hierarchical features from raw audio data. This paper reviews recent advances in deep learning-based SER, discusses the architectures used, and examines the challenges of real-world deployment. We focus on applying deep learning models to improve the accuracy and robustness of SER, particularly in noisy environments. We also discuss future research directions, including multimodal emotion recognition and transfer learning, which may help address small datasets and cross-domain applications.
