MAC_BiLSTM Text Classification Model Based on Self-Attention and Normalization
Abstract: Text classification faces two problems. Key features are unevenly distributed, and the bidirectional long short-term memory network (BiLSTM) extracts local features poorly. To address both, this paper proposes MAC_BiLSTM, a multi-channel text classification model based on the self-attention mechanism (Self_Attention) and normalization. In the BiLSTM channel, a self-attention layer followed by layer normalization is added after the BiLSTM layer. The output of the BiLSTM channel is then fused with the original word vectors and fed into the convolutional (CNN) channel. There, self-attention reassigns word weights to the features recomputed by word-level convolution, and batch normalization is applied. This step is repeated twice before max pooling. Finally, the pooled features of the CNN channel are fused with the BiLSTM channel features and passed to a Softmax classifier, which produces the classification result. The model also uses the smoother Mish activation function in place of ReLU. The proposed model was compared against other deep learning models on multiple datasets. It achieves better classification accuracy and overall classification performance than the compared models.
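The abstract names four building blocks: self-attention, layer normalization, batch normalization, and the Mish activation. Below is a minimal, dependency-free Python sketch of these operations as they are commonly defined. It is not the authors' implementation. It makes several assumptions. Attention is single-head scaled dot-product attention with Q = K = V, so there are no learned projection matrices. Layer and batch normalization have no learnable scale and shift, and batch statistics are computed in training mode. Toy hidden states stand in for real BiLSTM outputs. The layer normalization is applied directly to the attention output, with no residual connection. Where Mish sits in the network is also an assumption. The helper names (`mish`, `softmax`, `self_attention`, `layer_norm`, `batch_norm`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    # For x > 20, softplus(x) differs from x by less than 1e-8; using x avoids exp overflow.
    softplus = x if x > 20.0 else math.log1p(math.exp(x))
    return x * math.tanh(softplus)


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h: Matrix) -> Matrix:
    """Scaled dot-product self-attention with Q = K = V = h (assumption: no learned projections)."""
    d = len(h[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in h:
        # Attention weights of this time step over all time steps.
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in h])
        # Weighted sum of the value vectors (here the hidden states themselves).
        out.append([sum(w * v[j] for w, v in zip(weights, h)) for j in range(d)])
    return out


def layer_norm(h: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each time step over its feature dimension (gain 1, bias 0)."""
    out = []
    for row in h:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out


def batch_norm(batch: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each feature over the batch dimension (training-mode statistics, gain 1, bias 0)."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [
        [(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, means, variances)]
        for row in batch
    ]


if __name__ == "__main__":
    # Hypothetical BiLSTM hidden states: 3 time steps x 4 features.
    hidden = [[0.1, -0.2, 0.3, 0.5], [0.4, 0.0, -0.1, 0.2], [-0.3, 0.6, 0.2, -0.4]]
    bilstm_channel = layer_norm(self_attention(hidden))

    # Hypothetical convolution outputs for a batch of 3 samples: batch norm, then Mish.
    conv_features = [[1.0, -2.0], [0.5, 0.0], [-1.5, 2.0]]
    cnn_channel = [[mish(v) for v in row] for row in batch_norm(conv_features)]

    print(bilstm_channel)
    print(cnn_channel)
```

In a real model these would be framework layers with learned parameters. The plain-list version only makes the arithmetic visible. For example, Mish is smooth and non-monotonic and returns small negative values for moderately negative inputs (about -0.30 at x = -1), whereas ReLU is exactly zero there.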
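Citation: Yuan, M.J., Jiang, K.Z., Yang, Y. and Hui, L.X. (2022) MAC_BiLSTM Text Classification Model Based on Self-Attention and Normalization. Advances in Applied Mathematics, 11(10), 7012-7025. https://doi.org/10.12677/AAM.2022.1110744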
