
Clustering of Power Consumption Patterns Based on PK-Means Algorithm

Abstract: Clustering of power consumption patterns underpins power grid demand-side management, load forecasting, and power system planning, and is therefore of great significance to the analysis, operation, and planning of power systems. The traditional K-Means algorithm does not make effective use of time series features when clustering electricity consumption patterns. To address this problem, this paper proposes PK-Means, an improved time series clustering algorithm based on K-Means that introduces the Pearson correlation coefficient. It also proposes cumulative similarity (CS), an evaluation index for time series clustering algorithms derived by modifying the SSE evaluation index. In the electricity consumption pattern clustering scenario, PK-Means achieves better clustering results than the traditional K-Means.

1. Introduction

2. PK-Means Algorithm

2.1. Principle of the K-Means Algorithm

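K-Means is one of the most widely used unsupervised clustering algorithms in data mining. Given a sample set, it partitions the samples into k classes according to the distances between them, so that points within a class lie as close together as possible while the classes themselves lie as far apart as possible. Let the k classes be $\left({C}_{1},{C}_{2},\cdots ,{C}_{k}\right)$ ; the optimization objective is to minimize the sum of squared errors (SSE, Sum of Squares due to Error), expressed mathematically as: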

$\text{SSE}=\sum_{i=1}^{k}\sum_{x\in {C}_{i}}{‖x-{\mu }_{i}‖}_{2}^{2}$ (1)

where ${\mu }_{i}$ is the centroid (mean vector) of class ${C}_{i}$ .

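When used for cluster analysis, K-Means generally proceeds as follows:

1) Set the number of clusters to k and the maximum number of iterations to N.

2) Randomly select k samples from the given dataset as the initial k centroid vectors: $\left\{{\mu }_{1},{\mu }_{2},\cdots ,{\mu }_{k}\right\}$

3) Compute the squared Euclidean distance from each sample to every centroid: ${d}_{ij}={‖{x}_{i}-{\mu }_{j}‖}_{2}^{2}$ , and assign the sample to the class with the smallest distance.

4) Update the centroids:

${\mu }_{j}=\frac{1}{|{C}_{j}|}\sum_{x\in {C}_{j}}x$ (2)

5) Repeat steps 3) and 4) until the centroids no longer change or the maximum number of iterations is reached.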
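The five steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names (`k_means`, `sse`, `mean_vector`, `squared_distance`), the fixed random seed, the toy samples, and the rule that an empty class keeps its previous centroid are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors (Equation 2)."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def k_means(samples, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a list of equal-length vectors."""
    rng = random.Random(seed)
    # Step 2: pick k distinct samples as the initial centroids.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Step 3: assign each sample to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in samples
        ]
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            # Assumption: an empty class keeps its previous centroid.
            new_centroids.append(mean_vector(members) if members else centroids[j])
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def sse(samples, centroids, labels):
    """Sum of squared errors of a clustering (Equation 1)."""
    return sum(squared_distance(x, centroids[lab]) for x, lab in zip(samples, labels))


# Toy data: two well-separated groups of two points each.
samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = k_means(samples, k=2)
print(sorted(centroids), sse(samples, centroids, labels))
```

On this toy input the two centroids settle at (0, 0.5) and (10, 10.5), giving an SSE of 1.0. Because the result of K-Means depends on the random initial centroids, practical use usually repeats the run with several seeds and keeps the clustering with the lowest SSE.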

2.2. Pearson Correlation Coefficient

$\begin{array}{rl}{\rho }_{X,Y}& =\frac{\text{cov}\left(X,Y\right)}{{\sigma }_{X}{\sigma }_{Y}}\\ & =\frac{E\left(XY\right)-E\left(X\right)E\left(Y\right)}{\sqrt{E\left({X}^{2}\right)-{\left(E\left(X\right)\right)}^{2}}\sqrt{E\left({Y}^{2}\right)-{\left(E\left(Y\right)\right)}^{2}}}\\ & =\frac{n\sum_{i=1}^{n}{x}_{i}{y}_{i}-\sum_{i=1}^{n}{x}_{i}\sum_{i=1}^{n}{y}_{i}}{\sqrt{n\sum_{i=1}^{n}{x}_{i}^{2}-{\left(\sum_{i=1}^{n}{x}_{i}\right)}^{2}}\sqrt{n\sum_{i=1}^{n}{y}_{i}^{2}-{\left(\sum_{i=1}^{n}{y}_{i}\right)}^{2}}}\end{array}$ (3)
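To make Equation (3) concrete, the sketch below evaluates its last (sum) form directly in plain Python. The function name `pearson` and the sample series are illustrative assumptions and do not come from the paper. Returning 0.0 for a constant series is also an assumption, since the coefficient is undefined when either standard deviation is zero.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, sum form of Equation (3)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    var_x = n * sum_x2 - sum_x ** 2
    var_y = n * sum_y2 - sum_y ** 2
    if var_x <= 0 or var_y <= 0:
        # Assumption: a constant series has no defined correlation; report 0.0.
        return 0.0
    return numerator / (math.sqrt(var_x) * math.sqrt(var_y))


low = [1.0, 2.0, 3.0, 4.0]
high = [10.0, 20.0, 30.0, 40.0]  # same shape as `low`, ten times the magnitude
mirrored = [4.0, 3.0, 2.0, 1.0]  # opposite trend

print(pearson(low, high), pearson(low, mirrored))
```

The coefficient is unchanged when a series is rescaled or shifted, so `low` and `high` correlate at approximately 1 even though they are far apart in Euclidean distance, while `low` and `mirrored` correlate at approximately -1. This shape sensitivity is what makes the coefficient attractive for comparing consumption curves.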

2.3. PK-Means Algorithm

Figure 1. PK-Means algorithm flow chart

3. Experimental Analysis

3.1. Dataset

$\begin{array}{ccccc}{x}_{1}^{1}& {x}_{2}^{1}& \cdots & {x}_{11}^{1}& {x}_{12}^{1}\\ {x}_{1}^{2}& {x}_{2}^{2}& \cdots & {x}_{11}^{2}& {x}_{12}^{2}\\ {x}_{1}^{3}& {x}_{2}^{3}& \cdots & {x}_{11}^{3}& {x}_{12}^{3}\end{array}$ (4)

$\begin{array}{c}{y}_{i}=\frac{{x}_{i}-\mathrm{min}\left(X\right)}{\mathrm{max}\left(X\right)-\mathrm{min}\left(X\right)}\end{array}$ (5)
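Equation (5) is min-max normalization, which can be sketched as below. The sketch assumes that X is a single consumption record (one row of the matrix in Equation (4)); the paper's text does not confirm here whether scaling is applied per record or over the whole dataset. The function name `min_max_normalize` and the 12 sample values are hypothetical.

```python
def min_max_normalize(series):
    """Scale a sequence to the range [0, 1] following Equation (5)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Assumption: a constant series maps to all zeros to avoid dividing by zero.
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


# Hypothetical 12-point record, matching the 12 columns of Equation (4).
record = [120, 135, 150, 180, 210, 260, 300, 290, 230, 170, 140, 125]
normalized = min_max_normalize(record)
print(min(normalized), max(normalized))
```

After scaling, the smallest value of the record becomes 0 and the largest becomes 1, so records with very different consumption levels become comparable in magnitude.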

3.2. Evaluation Criteria

$\begin{array}{c}\text{CS}=\underset{i=1}{\overset{k}{\sum }}\underset{x\in {C}_{i}}{\sum }{\rho }_{x,{\mu }_{i}}\end{array}$ (6)
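Equation (6) sums, over every class, the Pearson correlation between each member and its class centroid. A minimal sketch follows, assuming the clustering is given as a list of samples, a parallel list of class labels, and a list of centroids. The names `cumulative_similarity` and `pearson` and the toy data are illustrative, not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum_x * sum_y
    var_x = n * sum(a * a for a in x) - sum_x ** 2
    var_y = n * sum(b * b for b in y) - sum_y ** 2
    if var_x <= 0 or var_y <= 0:
        # Assumption: treat the undefined case (constant series) as 0.0.
        return 0.0
    return numerator / (math.sqrt(var_x) * math.sqrt(var_y))


def cumulative_similarity(samples, labels, centroids):
    """Sum of correlations between each sample and its class centroid (Equation 6)."""
    return sum(pearson(x, centroids[lab]) for x, lab in zip(samples, labels))


# Two rising curves and two falling curves, with one centroid per shape.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0], [6.0, 4.0, 2.0]]
centroids = [[1.5, 3.0, 4.5], [4.5, 3.0, 1.5]]

by_shape = cumulative_similarity(samples, [0, 0, 1, 1], centroids)
mixed = cumulative_similarity(samples, [0, 1, 0, 1], centroids)
print(by_shape, mixed)
```

Since each correlation is at most 1, CS is bounded above by the number of samples. In this toy case, grouping the curves by shape gives a CS of approximately 4, while the mixed assignment gives approximately 0, so a larger CS indicates classes whose members follow their centroid's shape more closely.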

3.3. Analysis of Experimental Results

Table 1. CS table under different K values

Figure 2. CS curve

Table 2. CS table under different iterations

Figure 3. Visualization of clustering results

4. Conclusion
