Online Learning Text Feature Extraction Method Based on LSTM/GCN
DOI: 10.12677/CSA.2021.113079. Supported by the National Natural Science Foundation of China.
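Authors: Chuangfei Wen (温创斐), An Zeng (曾安): School of Computers, Guangdong University of Technology, Guangzhou, Guangdong; Dan Pan* (潘丹): School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, Guangdong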
Keywords: Feature Extraction, Learning Analytics, Graph Neural Network
Abstract: Traditional feature selection for online learning texts is often time-consuming, labor-intensive, and transfers poorly to new settings. Online learning texts tend to be short, dense in specialized vocabulary, and rich in structural relations between texts. Based on these characteristics, this paper proposes an end-to-end text embedding method that uses LSTM/GCN to reinforce the text-text relationships in text vectors obtained from a Doc2Vec model, addressing the loss of structural information that occurs when traditional methods project text into the embedding space. A metric, MeanRank, is also proposed to quantify how much structural information the text vectors retain. Experimental results on the Yale Open Course dataset show that the method outperforms traditional methods in both MeanRank and text classification accuracy. Visualization of the text vectors with t-distributed stochastic neighbor embedding (t-SNE) shows that adding structural vectors makes the text vectors more coherent within a course and more discriminative between courses.
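The idea of reinforcing text-text relationships in precomputed text vectors can be illustrated with a single graph-convolution step. The sketch below is not the authors' architecture. It assumes the standard GCN propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} X W). The toy vectors stand in for Doc2Vec output, the adjacency matrix is a hypothetical graph linking texts from the same course, and the identity weight matrix is a placeholder for trained parameters.

```python
import numpy as np

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1 here
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(features, adj, weight):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy data: texts 0-1 belong to one course, texts 2-3 to another.
doc_vectors = np.array([
    [1.0, 0.0, 0.5],
    [0.8, 0.2, 0.1],
    [0.0, 1.0, 0.3],
    [0.1, 0.9, 0.7],
])  # stand-in for Doc2Vec text vectors
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)  # assumed text-text graph: edges only within a course
weight = np.eye(3)  # identity placeholder; a real model would learn this

structural = gcn_layer(doc_vectors, adj, weight)

print("within-course cosine before:", cosine(doc_vectors[0], doc_vectors[1]))
print("within-course cosine after: ", cosine(structural[0], structural[1]))
```

In this toy graph each text has exactly one neighbor, so the propagation averages each pair of linked texts and their vectors coincide. With a sparser or weighted graph and trained weights the effect would be a partial pull toward neighbors rather than a full collapse.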
Cite this article: Wen, C.F., Zeng, A. and Pan, D. (2021) Online Learning Text Feature Extraction Method Based on LSTM/GCN. Computer Science and Application, 11(3), 770-781. https://doi.org/10.12677/CSA.2021.113079
