Research on a Preprocessing Method of Support Vector Machine
Abstract: When a Support Vector Machine (SVM) is trained on a large-scale data set, its training time increases significantly as the sample dimension and the number of samples grow. To address this problem, this paper proposes an SVM preprocessing method based on Principal Component Analysis (PCA) and the K Nearest Bound Neighbor (KNBN) method. First, PCA reduces the dimension of the training data to eliminate redundant information; then KNBN preselects candidate support vectors from the training data, reducing the amount of data the SVM must be trained on. Numerical experiments show that, compared with PCA-SVM, KNBN-SVM and SVM without data preprocessing, the proposed method maintains good classification accuracy while substantially reducing training time.
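The two-stage preprocessing described in the abstract (PCA first, then support-vector preselection on the reduced data) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the helper names `pca_reduce`, `boundary_neighbors` and `preprocess`, the 95% explained-variance threshold and the neighbour count `k=3` are hypothetical, and the boundary filter is a simplified stand-in for KNBN (every sample nominates its `k` nearest samples of the opposite class, and the union of nominees is kept) rather than the exact rule used in the paper.

```python
import numpy as np


def pca_reduce(X, var_ratio=0.95):
    """Centre X and project it onto the leading principal components that
    together explain at least `var_ratio` of the total variance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, var_ratio)) + 1, len(eigvals))
    W = eigvecs[:, :k]                            # d x k projection matrix
    return Xc @ W, mean, W


def boundary_neighbors(X, y, k=3):
    """Hypothetical KNBN-style filter (NOT the exact rule of the paper):
    every sample nominates its k nearest samples of the opposite class,
    and the union of all nominees is kept as candidate support vectors."""
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("binary labels expected")
    keep = set()
    for c in classes:
        own = np.flatnonzero(y == c)
        other = np.flatnonzero(y != c)
        # squared Euclidean distances, shape (len(own), len(other))
        d = ((X[own][:, None, :] - X[other][None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :min(k, len(other))]
        keep.update(other[nearest.ravel()].tolist())
    return np.array(sorted(keep))


def preprocess(X, y, var_ratio=0.95, k=3):
    """PCA first, then boundary-neighbour preselection on the reduced data."""
    Z, mean, W = pca_reduce(X, var_ratio)
    idx = boundary_neighbors(Z, y, k)
    return Z[idx], y[idx], mean, W


# Synthetic demo: two Gaussian classes plus a redundant third feature.
rng = np.random.default_rng(0)
A = rng.normal(-3.0, 1.0, size=(100, 2))
B = rng.normal(3.0, 1.0, size=(100, 2))
X2 = np.vstack([A, B])
X = np.column_stack([X2, X2[:, 0] + 0.01 * rng.normal(size=200)])
y = np.array([0] * 100 + [1] * 100)

Z_sv, y_sv, mean, W = preprocess(X, y)
print(X.shape, "->", Z_sv.shape)
```

The reduced pair `(Z_sv, y_sv)` would then be handed to an ordinary SVM solver (for example scikit-learn's `sklearn.svm.SVC`), and new samples must be mapped with `(x - mean) @ W` before prediction so that training and test data live in the same PCA space. Running PCA before the neighbour search also makes the O(n²) distance computation cheaper, since distances are taken in the lower-dimensional space.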
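Cite this article: Han, C.Z., Li, M.T., Zheng, E.T. and Ma, G.C. (2020) Research on a Preprocessing Method of Support Vector Machine. Advances in Applied Mathematics, 9(10), 1757-1765. https://doi.org/10.12677/AAM.2020.910203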
