Semiconductor Quality Inspection Method Based on Unsupervised Feature Selection
Abstract: Semiconductor quality inspection data are high-dimensional and redundant, class-imbalanced, and expensive to label, which limits the industrial applicability of traditional supervised detection methods. Conventional unsupervised feature selection methods, in turn, tend either to ignore global structure or to lack a mechanism for quantifying redundancy, so they are poorly suited to high-dimensional data with strong nonlinear coupling. To address this, this paper proposes FSSC-DCOR (Feature Selection by Spectral Clustering and Distance CORrelation coefficient), an unsupervised feature selection method for high-dimensional semiconductor manufacturing data. The method combines spectral clustering, the distance correlation coefficient, and a greedy strategy. Treating features as the objects to be clustered, it uses spectral clustering to uncover the intrinsic correlation structure among features and to select high-information candidate features, quantifies nonlinear redundancy with a distance correlation coefficient matrix, and finally applies the greedy strategy to retain a core feature subset with low redundancy and high discriminability. Because it requires no labels, the method can reduce the dimensionality of high-dimensional data effectively, which fits the label scarcity typical of semiconductor settings. Experiments on the SECOM semiconductor dataset show that the proposed method outperforms traditional feature selection methods on all reported performance metrics.
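The three stages named in the abstract (spectral clustering over features, a distance correlation matrix, and greedy redundancy filtering) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not specify the affinity matrix, the candidate scoring rule, or the stopping criterion, so the choices below are assumptions made only for illustration. In particular, the function names, the use of the distance correlation matrix itself as the clustering affinity, the within-cluster mean distance correlation as the candidate score, and the parameters `k`, `per_cluster`, and `max_redundancy` are all hypothetical. The sketch also assumes a complete numeric matrix with no missing values.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (Szekely et al., 2007) of two 1-D arrays."""
    def centered(v):
        d = np.abs(v[:, None] - v[None, :])  # pairwise distances
        # Double centering: remove row means and column means, add grand mean.
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a = centered(np.asarray(x, dtype=float))
    b = centered(np.asarray(y, dtype=float))
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def dcor_matrix(X):
    """Symmetric feature-by-feature distance correlation matrix."""
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = distance_correlation(X[:, i], X[:, j])
    return R


def spectral_cluster_features(W, k, seed=0, n_iter=50):
    """Normalized spectral clustering on a feature affinity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, :k]                      # k smallest eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Plain Lloyd k-means on the rows of the spectral embedding.
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels


def select_features(X, k=3, per_cluster=2, max_redundancy=0.8, seed=0):
    """Cluster features, pick candidates per cluster, then filter greedily."""
    R = dcor_matrix(X)
    labels = spectral_cluster_features(R, k, seed=seed)  # assumed affinity: R
    candidates = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # Assumed score: mean distance correlation to the feature's own cluster.
        score = R[np.ix_(members, members)].mean(axis=1)
        for i in np.argsort(-score)[:per_cluster]:
            candidates.append((float(score[i]), int(members[i])))
    selected = []
    for _, j in sorted(candidates, reverse=True):
        # Keep a candidate only if it is not too redundant with kept features.
        if all(R[j, s] < max_redundancy for s in selected):
            selected.append(j)
    return selected


# Synthetic demo: three latent signals, each observed twice through a
# monotone nonlinear transform, so each pair is highly redundant.
demo_rng = np.random.default_rng(42)
Z = demo_rng.normal(size=(150, 3))
X_demo = np.column_stack([
    Z[:, 0], np.exp(Z[:, 0]),
    Z[:, 1], Z[:, 1] ** 3,
    Z[:, 2], np.tanh(Z[:, 2]),
])
print("selected feature indices:", select_features(X_demo))
```

The pairwise loop in `dcor_matrix` costs O(p² n²) time, which is acceptable for a handful of features but would need a faster estimator or pre-filtering for data with hundreds of columns. The redundancy threshold is a free parameter here; how the paper sets the corresponding quantity is not stated in the abstract.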
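Cite this article: 余青青 (Yu Qingqing). Semiconductor Quality Inspection Method Based on Unsupervised Feature Selection [J]. Statistics and Application (统计学与应用), 2026, 15(2): 31-48. https://doi.org/10.12677/sa.2026.152032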
