Factor and Cluster Analysis of Provincial Population under Multiple Development Indicators
Abstract:
As environmental issues receive growing attention, more thought is being given to changes in the population size and structure of each region. This paper uses factor analysis and cluster analysis to produce a comprehensive ranking and classification of the population of China's provinces and municipalities directly under the Central Government. Factor analysis is used to build a comprehensive scoring model in four steps: data standardization, factor extraction, factor naming, and construction of the comprehensive model. Data for 31 provinces, autonomous regions, and municipalities directly under the Central Government are analyzed, and five factors are retained based on the factor extraction results and the cumulative variance contribution rate. The factor score formulas (each extracted factor F expressed in terms of the indicators X) and the comprehensive score formula (the comprehensive score Y expressed in terms of the indicators X) are then derived. Substituting the data into this model gives a comprehensive score for each province-level region, and the regions are ranked by that score. Finally, one-dimensional K-means clustering is applied to the comprehensive scores, dividing the 31 provinces, autonomous regions, and municipalities into six categories.
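The scoring and clustering pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal sketch under assumptions the abstract does not state: factors are extracted as unrotated principal components of the correlation matrix; the number kept is the smallest whose cumulative variance contribution rate reaches a hypothetical 85% cut-off; factor scores use the coefficients F = Z B with B = V Λ^(-1/2); and the comprehensive score Y weights each factor by its share of the retained variance. The input is synthetic random data standing in for the 31-region indicator table, so its output says nothing about the actual provinces. The function names `standardize`, `extract_factors`, `comprehensive_scores` and `kmeans_1d` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Z-score each indicator column (sample std, ddof=1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def extract_factors(Z, threshold=0.85):
    """Principal-component factor extraction from the correlation matrix.

    Keeps the smallest number of factors whose cumulative variance
    contribution rate reaches `threshold` (an ASSUMED cut-off).
    """
    R = np.corrcoef(Z, rowvar=False)              # indicator correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvector signs are arbitrary; flip so each column sums to >= 0.
    eigvecs = eigvecs * np.where(eigvecs.sum(axis=0) >= 0, 1.0, -1.0)
    contrib = eigvals / eigvals.sum()             # variance contribution rate per factor
    m = min(int(np.searchsorted(np.cumsum(contrib), threshold)) + 1, len(eigvals))
    return eigvals[:m], eigvecs[:, :m], contrib[:m]

def comprehensive_scores(Z, eigvals, eigvecs):
    """Factor scores F = Z @ B and comprehensive score Y = F @ w."""
    B = eigvecs / np.sqrt(eigvals)                # factor score coefficients (unit-variance F)
    F = Z @ B                                     # one column per retained factor
    w = eigvals / eigvals.sum()                   # weights: share of retained variance
    coef = B @ w                                  # Y written directly in terms of Z
    return F, F @ w, coef

def kmeans_1d(y, k, iters=100):
    """Lloyd's K-means on a 1-D array; centres start at evenly spaced quantiles."""
    centres = np.quantile(y, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(y[:, None] - centres[None, :]), axis=1)
        new = np.array([y[labels == j].mean() if np.any(labels == j) else centres[j]
                        for j in range(k)])      # empty cluster keeps its old centre
        if np.allclose(new, centres):
            break
        centres = new
    # Final assignment so labels always match the returned centres.
    labels = np.argmin(np.abs(y[:, None] - centres[None, :]), axis=1)
    return labels, centres

# SYNTHETIC placeholder data: 31 regions x 8 indicators driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(31, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.3 * rng.normal(size=(31, 8))

Z = standardize(X)
eigvals, eigvecs, contrib = extract_factors(Z, threshold=0.85)
F, Y, coef = comprehensive_scores(Z, eigvals, eigvecs)
ranking = np.argsort(-Y)                          # region indices, highest score first
labels, centres = kmeans_1d(Y, k=6)

print(f"{len(eigvals)} factors kept; cumulative contribution {contrib.sum():.3f}")
print("top 5 regions by comprehensive score:", ranking[:5])
print("cluster label per region:", labels)
```

Because Y is a linear combination of the standardized indicators, `coef` plays the role of the comprehensive score formula (Y in terms of X). The deterministic quantile initialisation keeps the one-dimensional clustering reproducible; a library implementation such as scikit-learn's `KMeans` with several restarts (`n_init`) is a common alternative.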