Article Citation Information

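刘斌 (Liu Bin), 黄铁军 (Huang Tiejun), 程军 (Cheng Jun), et al. A New Statistics-Based Automatic Text Classification Method [J]. Journal of Chinese Information Processing (中文信息学报), 2002, 16(6): 18-24.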

Cited by the following article:

  • Title: Research of Chinese News Classification Based on Titles

    Authors: 王海涛 (Wang Haitao), 赵艳琼 (Zhao Yanqiong), 岳磅 (Yue Bang)

    Keywords: Text Classification; Title Classification; News Classification; Semantic Similarity

    Journal: Hans Journal of Data Mining, Vol.3 No.3, 2013-07-02

    Abstract: Retrieving Internet information quickly, accurately, and comprehensively is a central problem of the Internet era. Compared with traditional print media, online news is faster, richer in content, and more flexible and vivid in form, and it is gradually replacing traditional news media as the main way many people obtain news. However, the volume of frequently updated news has grown so large that traditional manual classification can no longer meet users' needs. Because the main content of news is usually presented as text, applying automatic text classification to online news is an effective way to replace manual classification, and many IT companies are developing automatic news classification systems. Online news takes many forms, however: some items consist entirely of images or videos and contain no body text, so classic content-based text classification cannot handle them. This paper proposes classifying online news by its title. Compared with content-based classification, this approach is faster and more widely applicable, since it can also classify news without body text (such as picture news and title-only news). The paper builds a title-based text classification model, evaluates it on a news corpus collected from the Web, and compares it with content-based text classification to assess its strengths and weaknesses. It also constructs a two-step title-based classification system and proposes category-unique features, which achieve high classification accuracy on separable samples.
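The abstract does not spell out how the two-step system or its category-unique features work. Below is a minimal Python sketch of one plausible reading, not the authors' method. It assumes titles are already word-segmented (Chinese titles would first need a segmenter), defines a category-unique feature as a token that appears in training titles of exactly one category, and uses multinomial naive Bayes as a swapped-in second step for titles that step one cannot decide. The function names `train` and `classify`, the `min_count` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples, min_count=1):
    """Build a two-step title classifier from (title_tokens, label) pairs.

    Hypothetical sketch: titles are assumed to be already word-segmented.
    """
    token_labels = defaultdict(set)      # token -> labels whose titles contain it
    token_counts = Counter()             # token -> total occurrences in training titles
    label_docs = Counter()               # label -> number of training titles
    label_tokens = defaultdict(Counter)  # label -> token frequencies within that label
    for tokens, label in samples:
        label_docs[label] += 1
        for t in tokens:
            token_labels[t].add(label)
            token_counts[t] += 1
            label_tokens[label][t] += 1
    # Assumed definition: a "category-unique feature" is a token seen under exactly one label.
    unique = {
        t: next(iter(labels))
        for t, labels in token_labels.items()
        if len(labels) == 1 and token_counts[t] >= min_count
    }
    return {
        "unique": unique,
        "label_docs": label_docs,
        "label_tokens": label_tokens,
        "vocab": set(token_counts),
    }


def classify(model, tokens):
    """Return (label, step) for a segmented title."""
    # Step 1: if the title's unique features all point to one category, use it directly.
    hits = {model["unique"][t] for t in tokens if t in model["unique"]}
    if len(hits) == 1:
        return hits.pop(), "unique-feature"
    # Step 2 (stand-in): multinomial naive Bayes with add-one smoothing,
    # ignoring tokens never seen in training.
    total_docs = sum(model["label_docs"].values())
    vocab_size = len(model["vocab"])
    best, best_score = None, -math.inf
    for label, n_docs in model["label_docs"].items():
        counts = model["label_tokens"][label]
        denom = sum(counts.values()) + vocab_size
        score = math.log(n_docs / total_docs)
        for t in tokens:
            if t in model["vocab"]:
                score += math.log((counts[t] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best, "naive-bayes"


if __name__ == "__main__":
    # Toy, made-up training titles (already tokenized).
    samples = [
        (["stock", "market", "rises"], "finance"),
        (["bank", "cuts", "rates"], "finance"),
        (["team", "wins", "final"], "sports"),
        (["coach", "praises", "team"], "sports"),
        (["market", "for", "team", "shirts"], "sports"),
    ]
    model = train(samples)
    print(classify(model, ["team", "loses"]))  # ('sports', 'unique-feature')
    print(classify(model, ["market"]))         # ('sports', 'naive-bayes')
```

Titles whose unique features point to a single category correspond loosely to the abstract's "separable samples"; every other title (no unique features, or conflicting ones) falls through to the statistical step.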
