标题:
基于改进词向量模型的深度学习文本主题分类Deep Learning on Improved Word Embedding Model for Topic Classification
作者:
周盈盈, 范磊
关键字:
主题分类, 深度学习, 卷积神经网络, 词向量Topic Classification, Deep Learning, Convolutional Neural Network, Word Embedding
期刊名称:
《Computer Science and Application》, Vol.6 No.11, 2016-11-09
摘要:
主题分类在内容检索和信息筛选中应用广泛,其核心问题可分为两部分: 文本表示和分类模型。近年来,基于分布式词向量对文本进行表示,使用卷积神经网络作为分类器的文本主题分类方法取得了较好的分类效果。本文研究了不同词向量对卷积神经网络分类效果的影响,提出针对中文语料的topic2vec词向量模型。本文利用该模型,对具有代表性的互联网内容生成社区“知乎”进行了实验与分析。实验结果表明,利用topic2vec词向量的卷积神经网络,在长内容文本和短标题文本的分类问题中分别取得了98.06%,93.27%的准确率,较已知词向量模型均有显著提高。
Topic classification has wide applications in content searching and information filtering. It can be divided into two core parts: text embedding and classification modeling. In recent years, methods have brought out significant results using distributed word embedding as input and convolutional neural network (CNN) as classifiers. This paper discusses the impact of different word embedding for CNN classifiers, proposes topic2vec, a new word embedding specifically suitable for Chinese corpora, and conducts an experiment on Zhihu, a representative content-oriented internet com-munity. The experiment turns out that CNN with topic2vec gains an accuracy of 98.06% for long content texts, 93.27% for short title texts and an improvement comparing with other word em-bedding models.