|
[1]
|
Eronen, A.J., Peltonen, V.T., Tuomi, J.T., et al. (2006) Audio-Based Context Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 14, 321-329.
|
|
[2]
|
Ma, L., Milner, B. and Smith, D. (2006) Acoustic Environment Classification. ACM Transactions on Speech and Language Processing, 3, 1-22.
|
|
[3]
|
Jiang, H., Bai, J., Zhang, S. and Xu, B. (2005) SVM-Based Audio Scene Classification. 2005 International Conference on Natural Language Processing and Knowledge Engineering, Wuhan, 30 October-1 November 2005, 131-136.
|
|
[4]
|
Li, J., Dai, W., Metze, F., Qu, S. and Das, S. (2017) A Comparison of Deep Learning Methods for Environmental Sound Detection. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, 5-9 March 2017, 126-130.
|
|
[5]
|
Paseddula, C. and Gangashetty, S.V. (2021) Late Fusion Framework for Acoustic Scene Classification Using LPCC, SCMC, and Log-Mel Band Energies with Deep Neural Networks. Applied Acoustics, 172, Article ID: 107568.
|
|
[6]
|
Zhang, Z., Liu, D., Han, J., Qian, K. and Schuller, B.W. (2021) Learning Audio Sequence Representations for Acoustic Event Classification. Expert Systems with Applications, 178, Article ID: 115007.
|
|
[7]
|
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M. and Luo, P. (2021) SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P.S. and Wortman Vaughan, J., Eds., Advances in Neural Information Processing Systems, Vol. 34, NeurIPS, New Orleans, 12077-12090.
|
|
[8]
|
Peng, J., Liu, Y., Tang, S., et al. (2022) PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model. ArXiv: 2204.02681.
|
|
[9]
|
Hershey, S., Chaudhuri, S., Ellis, D.P.W., et al. (2017) CNN Architectures for Large-Scale Audio Classification. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, 5-9 March 2017, 131-135.
|
|
[10]
|
Wang, M. and Zhang, P. (2022) A Short-Duration Audio Scene Recognition Method Fusing Multi-Scale Features. Acta Acustica, 47, 717-726. (In Chinese)
|
|
[11]
|
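Fei, H., Wu, W., Li, P. and Cao, Y. (2022) An Acoustic Scene Classification Method Based on Mel-Spectrogram Separation and LSCNet. Journal of Harbin Institute of Technology, 54, 124-130+123. (In Chinese)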
|
|
[12]
|
Sifre, L. and Mallat, S. (2014) Rigid-Motion Scattering for Texture Classification. ArXiv: 1403.1687.
|
|
[13]
|
Chollet, F. (2017) Xception: Deep Learning with Depthwise Separable Convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 21-26 July 2017, 1800-1807.
|
|
[14]
|
Howard, A.G., Zhu, M., Chen, B., et al. (2017) MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. ArXiv: 1704.04861.
|
|
[15]
|
Ren, Z., Kong, Q., Qian, K., Plumbley, M.D. and Schuller, B.W. (2018) Attention-Based Convolutional Neural Networks for Acoustic Scene Classification. 3rd Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2018 Workshop), Surrey, 19-20 November 2018, 1-5.
|
|
[16]
|
Hu, J., Shen, L. and Sun, G. (2018) Squeeze-and-Excitation Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 7132-7141.
|
|
[17]
|
He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 770-778.
|
|
[18]
|
Mesaros, A., Heittola, T. and Virtanen, T. (2018) A Multi-Device Dataset for Urban Acoustic Scene Classification. ArXiv: 1807.09840.
|