[1] Geng, L., Fu, H.L., Tao, H.W., et al. (2023) Speech Emotion Recognition Based on Dynamic Convolutional Recurrent Neural Network. Computer Engineering, 49(4), 125-130. (In Chinese)
[2] Tang, H., Zhang, X., Cheng, N., Xiao, J., Wang, J. (2024) ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, 14-19 April 2024, 12146-12150.
[3] Zou, H., Si, Y., Chen, C., et al. (2022) Speech Emotion Recognition with Co-Attention Based Multi-Level Acoustic Information. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23-27 May 2022, 7367-7371.
[4] Kim, Y. (2014) Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, 25-29 October 2014, 1746-1751.
[5] Badshah, A.M., Rahim, N., Ullah, N., Ahmad, J., Muhammad, K., Lee, M.Y., Kwon, S. and Baik, S.W. (2017) Deep Features-Based Speech Emotion Recognition for Smart Affective Services. Multimedia Tools and Applications, 78, 5571-5589.
[6] Sak, H., Senior, A., Rao, K., et al. (2015) Learning Acoustic Frame Labeling for Speech Recognition with Recurrent Neural Networks. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, 19-24 April 2015, 4280-4284.
[7] Tao, F. and Liu, G. (2018) Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, 15-20 April 2018, 2906-2910.
[8] Moritz, N., Hori, T. and Le Roux, J. (2019) Triggered Attention for End-to-End Speech Recognition. Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12-17 May 2019, 5666-5670.
[9] Chiu, C.C., Sainath, T.N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R.J., Rao, K., Gonina, E., et al. (2018) State-of-the-Art Speech Recognition with Sequence-to-Sequence Models. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, 15-20 April 2018, 4774-4778.
[10] Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. and Polosukhin, I. (2017) Attention Is All You Need. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, 4-9 December 2017, 1-11.
[11] Zhao, J., Mao, X. and Chen, L. (2019) Speech Emotion Recognition Using Deep 1D & 2D CNN LSTM Networks. Biomedical Signal Processing and Control, 47, 312-323.
[12] Sainath, T.N., Vinyals, O., Senior, A. and Sak, H. (2015) Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, 19-24 April 2015, 4580-4584.
[13] Chen, M. and Zhao, X. (2020) A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition. Proceedings of the Interspeech 2020, Shanghai, 25-29 October 2020, 374-378.
[14] Yu, W., Xu, H., Meng, F., et al. (2020) CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-Grained Annotation of Modality. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020, 3718-3727.
[15] Zadeh, A., Zellers, R., Pincus, E., et al. (2016) MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos. arXiv:1606.06259.
[16] Busso, C., Bulut, M., Lee, C.C., et al. (2008) IEMOCAP: Interactive Emotional Dyadic Motion Capture Database. Language Resources and Evaluation, 42, 335-359.
[17] Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W. and Taylor, J.G. (2001) Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine, 18, 32-80.
[18] Latif, S., Rana, R., Khalifa, S., Jurdak, R. and Schuller, B. (2022) Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition. IEEE Transactions on Affective Computing, 14, 1912-1926.
[19] Mustaqeem, Sajjad, M. and Kwon, S. (2020) Clustering-Based Speech Emotion Recognition by Incorporating Learned Features and Deep BiLSTM. IEEE Access, 8, 79861-79875.