A Data Fusion Model for Breast Cancer Classification
DOI: 10.12677/CSA.2019.912255
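Authors: Jing Liu, Shiya Liu, Jun Zhang, Zhifei Zhang*: School of Automation, Foshan University, Foshan, Guangdong; Xu Chen: Guangdong Lisheng Integrated Energy Service Co., Ltd., Foshan, Guangdong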
Keywords: Insufficient Stability, Poor Adaptability, Weight, Weak Classifier
Abstract: Classifiers used for the intelligent diagnosis of breast cancer often suffer from insufficient stability and poor adaptability to the sample distribution. To address these problems, this paper proposes a classifier construction algorithm that uses AdaBoost to integrate three learners: a BP neural network, an RBF network, and Naïve Bayes. First, the three classification algorithms are used to train different weak classifiers. Then, a weight redistribution strategy increases the weight of diseased samples that are misclassified as healthy and decreases the weight of healthy samples that are misclassified as diseased. Finally, the weak classifiers are recombined using the adjusted weights to form a strong classifier. Comparative experiments on the Wisconsin breast cancer data from the UCI (University of California, Irvine) database show that the proposed classification model outperforms the single algorithms.
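The three steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact update rule, so the multipliers `fn_cost` and `fp_cost`, the function names, and the use of precomputed predictions in place of trained BP, RBF, and Naïve Bayes models are all assumptions made for illustration. The sketch applies the standard AdaBoost classifier weight and scales the sample-weight update asymmetrically, so that a diseased sample predicted as healthy gains more weight than in standard AdaBoost and a healthy sample predicted as diseased gains less.

```python
import math


def asymmetric_adaboost(predictions, labels, fn_cost=2.0, fp_cost=0.5):
    """Weight pre-trained weak classifiers with an asymmetric AdaBoost-style loop.

    predictions: one list of predictions per weak classifier, values in {+1, -1}
                 (+1 = diseased, -1 = healthy).
    labels:      true labels in {+1, -1}.
    fn_cost, fp_cost: hypothetical multipliers (assumptions, not from the paper)
                 applied to the weights of false negatives and false positives.
    Returns (alphas, sample_weights).
    """
    n = len(labels)
    w = [1.0 / n] * n
    alphas = []
    for preds in predictions:
        # Weighted error of this weak classifier, clipped away from 0 and 1.
        err = sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        alphas.append(alpha)

        new_w = []
        for wi, p, y in zip(w, preds, labels):
            if p == y:
                factor = math.exp(-alpha)           # correctly classified
            elif y == 1:
                factor = fn_cost * math.exp(alpha)  # diseased predicted healthy
            else:
                factor = fp_cost * math.exp(alpha)  # healthy predicted diseased
            new_w.append(wi * factor)
        total = sum(new_w)
        w = [x / total for x in new_w]
    return alphas, w


def strong_classify(alphas, votes):
    """Weighted vote of the weak classifiers for a single sample."""
    score = sum(a * v for a, v in zip(alphas, votes))
    return 1 if score >= 0 else -1


# Toy data: two diseased (+1) and two healthy (-1) samples.
labels = [1, 1, -1, -1]
toy_predictions = [
    [1, -1, -1, -1],  # stand-in for a BP network: one false negative
    [1, 1, 1, -1],    # stand-in for an RBF network: one false positive
    [1, 1, -1, -1],   # stand-in for Naive Bayes: no errors on this toy set
]
alphas, weights = asymmetric_adaboost(toy_predictions, labels)
print(alphas, weights)
```

In a full pipeline each weak classifier would be trained on the reweighted samples of the current round rather than evaluated from fixed predictions, and the cost multipliers would need to be tuned on validation data.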
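Cite this article: Liu, J., Chen, X., Liu, S.Y., Zhang, J. and Zhang, Z.F. (2019) A Data Fusion Model for Breast Cancer Classification. Computer Science and Application, 9(12), 2293-2302. https://doi.org/10.12677/CSA.2019.912255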
