K-Means Clustering Analysis Based on Movie Content
Abstract:
As living standards rise, people's cultural lives grow richer, and film, as one part of that cultural life, has drawn wide attention. In a fast-paced society, viewers benefit from being able to find a movie they like quickly. One way to improve how people search for and select movies is to group existing movies by theme. Text can be grouped by topic with either supervised or unsupervised learning. Supervised learning requires manual labeling, which is time-consuming and labor-intensive. Unsupervised learning groups movies automatically from their content, which saves time and avoids the cost of manual labeling. This paper therefore takes movie content as its starting point and proposes using K-Means clustering to categorize movies without supervision. Finally, the clustering results are visualized; the movies within each cluster share a common theme.
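The idea of clustering movies by content can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's pipeline: the four plot summaries are invented, TF-IDF is assumed as the text representation (the abstract does not say how content is vectorized), and the helper names `tfidf_vectors` and `kmeans` are hypothetical. The deterministic farthest-point initialization is a simplification chosen so the example is reproducible; it is not claimed to be the initialization used in the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn whitespace-tokenized documents into L2-normalized TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps every weight strictly positive.
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization (assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(farthest)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented plot summaries, for illustration only.
plots = [
    "astronaut crew spaceship alien planet",
    "alien invasion spaceship planet battle",
    "wedding love romance couple paris",
    "couple love romance wedding heartbreak",
]
labels = kmeans(tfidf_vectors(plots), k=2)
print(labels)  # [0, 0, 1, 1]
```

In this toy run the two science-fiction summaries land in one cluster and the two romance summaries in the other, which is the kind of shared theme per cluster that the abstract describes. A real corpus would need proper tokenization, stop-word removal, and a principled choice of the number of clusters.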