K-Means Clustering Analysis Based on Movie Content
Abstract: As living standards rise, people's cultural lives are becoming richer, and film, as part of that cultural life, has become a focus of attention. In a fast-paced society, viewers benefit from being able to find a movie they like in a short time. One way to improve how people search for and select movies is to group existing movies by theme. Text can be grouped by topic with either supervised or unsupervised learning. Supervised learning requires manual labeling, which is time-consuming and labor-intensive. Unsupervised learning partitions movies into categories directly from their content, which saves time and reduces the cost of manual labeling. This paper therefore starts from movie content and proposes using the K-Means clustering method to group movies without supervision. Finally, the clustering results are visualized; the movies within each cluster share a common theme.
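To make the idea of unsupervised grouping by content concrete, the following is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the four toy synopses, the whitespace tokenizer, the raw word-count features, and the farthest-first seeding are all hypothetical choices made here so that the example is small and deterministic. Real movie text in Chinese would first need word segmentation and would typically use a weighted representation rather than raw counts.

```python
from collections import Counter

# Hypothetical toy data: whitespace-separated English synopses stand in
# for segmented movie descriptions.
synopses = [
    "space crew alien planet",
    "alien invasion space war",
    "love wedding romance city",
    "romance love heartbreak",
]

def bag_of_words(text, vocabulary):
    """Turn a synopsis into a vector of raw word counts over the vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from the seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids

def k_means(points, k, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    return labels, centroids

vocabulary = sorted({word for s in synopses for word in s.split()})
points = [bag_of_words(s, vocabulary) for s in synopses]
labels, centroids = k_means(points, k=2)
print(labels)
```

On this toy input the two science-fiction synopses end up in one cluster and the two romance synopses in the other, which mirrors the claim that movies within a cluster share a theme. The number of clusters k is supplied by the caller; choosing it for a real collection is a separate question that this sketch does not address.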
Cite this article: Yuan Lijuan. K-Means Clustering Analysis Based on Movie Content [J]. Statistics and Application, 2020, 9(2): 265-276. https://doi.org/10.12677/SA.2020.92029
