An Ensemble Imbalanced Data Classification Algorithm Based on Random k-Rank Nearest Neighbor Rules
DOI: 10.12677/AAM.2020.95074
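Authors: Yixin Shen, Shuangge Ma*: College of Mathematics, Taiyuan University of Technology, Jinzhong, Shanxi; Subhash C. Bagui: Department of Mathematics and Statistics, University of West Florida, Pensacola, Florida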
Keywords: Imbalanced Data Classification, k-Rank Nearest Neighbor Rule, Ensemble Learning, Bagging, Resampling Techniques, Random Subspace Method
Abstract: To address the low classification accuracy of minority-class samples in binary imbalanced classification tasks, this article proposes a random ensemble k-rank nearest neighbor (k-RNN) algorithm called REKRNN. The algorithm incorporates the k-rank nearest neighbor classifier into the Bagging ensemble learning framework. At the same time, hybrid resampling and the random subspace method are applied to balance the training sets and to increase the diversity of the base learners. Simulation experiments show that the proposed method performs well on different imbalanced datasets, which suggests that the random ensemble k-RNN algorithm is a promising tool for imbalanced classification.
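The pipeline named in the abstract (Bagging, hybrid resampling, random subspaces, a rank-based neighbor rule as base learner) can be illustrated with a minimal sketch. It is not the authors' REKRNN implementation, and several parts are assumptions: label 1 is taken to be the minority class, the hybrid resampling is approximated by random oversampling of the minority class plus random undersampling of the majority class, and the base learner is a simplified per-feature univariate rank-neighbor vote. All function names (`rank_nn_vote`, `hybrid_resample`, `fit_ensemble`, `predict`) are hypothetical.

```python
import numpy as np

def rank_nn_vote(x_train, y_train, z, k):
    """Simplified univariate rank-neighbor rule (an assumption, not the paper's rule):
    order the training values, locate z in that ordering, and return the share of
    label-1 points among up to k rank neighbors on each side of z."""
    order = np.argsort(x_train, kind="stable")
    xs, ys = x_train[order], y_train[order]
    pos = np.searchsorted(xs, z)                 # rank position of z among training values
    neighbors = np.concatenate([ys[max(0, pos - k):pos], ys[pos:pos + k]])
    return neighbors.mean()

def hybrid_resample(X, y, rng):
    """Random oversampling of the minority class (label 1, assumed) and random
    undersampling of the majority class (label 0) to a common size."""
    idx_min = np.flatnonzero(y == 1)
    idx_maj = np.flatnonzero(y == 0)
    m = (len(idx_min) + len(idx_maj)) // 2       # target size per class
    take_min = rng.choice(idx_min, size=m, replace=True)
    take_maj = rng.choice(idx_maj, size=m, replace=False)
    idx = np.concatenate([take_min, take_maj])
    return X[idx], y[idx]

def fit_ensemble(X, y, n_estimators=25, n_features=None, seed=0):
    """Each member stores a balanced resample restricted to a random feature subset."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    n_features = n_features or max(1, int(np.sqrt(d)))
    members = []
    for _ in range(n_estimators):
        Xb, yb = hybrid_resample(X, y, rng)
        feats = rng.choice(d, size=n_features, replace=False)
        members.append((Xb[:, feats], yb, feats))
    return members

def predict(members, X_new, k=3):
    """Majority vote over members; each member averages its per-feature rank votes."""
    votes = np.zeros(len(X_new))
    for Xb, yb, feats in members:
        for i, z in enumerate(X_new[:, feats]):
            share = np.mean([rank_nn_vote(Xb[:, j], yb, z[j], k)
                             for j in range(len(feats))])
            votes[i] += float(share > 0.5)
    return (votes / len(members) > 0.5).astype(int)
```

Calling `fit_ensemble(X, y)` on a feature matrix with 0/1 labels and then `predict(members, X_new)` returns one 0/1 label per row of `X_new`. Storing resampled data per member, rather than a fitted model, reflects that neighbor rules are lazy learners; ties in a vote fall to the majority class in this sketch.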
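Cite this article: Yixin Shen, Subhash C. Bagui, Shuangge Ma. An Ensemble Imbalanced Data Classification Algorithm Based on Random k-Rank Nearest Neighbor Rules [J]. Advances in Applied Mathematics, 2020, 9(5): 622-629. https://doi.org/10.12677/AAM.2020.95074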
