Article Citation Details

Blei, D.M., Ng, A.Y. and Jordan, M.I. (2003) Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.

Cited by the following article:

  • Title: Deep Learning on Improved Word Embedding Model for Topic Classification

    Authors: Yingying Zhou (周盈盈), Lei Fan (范磊)

    Keywords: Topic Classification, Deep Learning, Convolutional Neural Network, Word Embedding

    Journal: Computer Science and Application, Vol. 6 No. 11, 2016-11-09

    Abstract: Topic classification is widely used in content retrieval and information filtering. Its core problem has two parts: text representation and the classification model. In recent years, methods that represent text with distributed word embeddings and use a convolutional neural network (CNN) as the classifier have achieved good classification results. This paper studies how different word embeddings affect CNN classification performance and proposes topic2vec, a word embedding model for Chinese corpora. Using this model, the authors run experiments and analysis on Zhihu, a representative internet community for user-generated content. The results show that a CNN with topic2vec word embeddings achieves accuracies of 98.06% on long content texts and 93.27% on short title texts, a significant improvement over known word embedding models.
