Characteristic Analysis and Weathering Prediction Model of Glass Cultural Relics: Based on Machine Learning and Statistical Analysis
Abstract:
This paper is based on Problem C of the 2022 Higher Education Cup. The data set is first one-hot encoded, missing values in the chemical composition data are filled in, and the correlations in the data are analyzed with the Pearson correlation coefficient. The dimensionality of the data is then reduced with a PCA algorithm based on singular value decomposition. Random forest, support vector machine, XGBoost, and logistic regression are used to classify the reduced data and obtain decision boundaries, and soft classifiers are used to predict the degree of weathering of each relic. Classification accuracy on the validation set reaches 86.7% for the relic information data and 94.1% for the chemical composition data. The models are then evaluated and selected using recall, F1 score, precision, and the ROC curve. The results are the correlations among the features, a weathering probability for each relic, and the finding that XGBoost gives the best predictions on this data set.
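The evaluation step described in the abstract can be illustrated with a minimal sketch: a soft classifier outputs a weathering probability for each relic, the probabilities are thresholded into hard labels, and accuracy, precision, recall, and F1 are computed from the resulting confusion counts. The function name `classification_metrics`, the 0.5 threshold, and the sample labels and probabilities are all assumptions made for illustration; they are not the paper's data or its implementation.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off; the paper does not state one
) -> Dict[str, float]:
    """Threshold soft predictions, then compute accuracy, precision, recall, F1."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")

    # Turn probabilities into hard labels (1 = weathered, 0 = not weathered).
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard each ratio against an empty denominator.
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical validation labels and predicted weathering probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
metrics = classification_metrics(y_true, y_prob)
print(metrics)
```

Keeping the probabilities rather than only the hard labels is what makes an ROC curve possible, since the curve is traced by sweeping `threshold` across its range and recomputing the counts at each value.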