A Bearing Fault Diagnosis Model Based on Latent Space Folding and Criss-Cross Attention: Research on Small-Sample and Heavy-Noise Environments
Abstract: Building on the pre-training paradigm of FaultFormer, this paper proposes Hybrid CCNet, an improved fault diagnosis framework that integrates structured feature reshaping with deep feature interaction. The model retains the Transformer's strong sequence-modeling capability while making targeted architectural improvements to feature extraction: (1) Period-inspired latent space folding: drawing on ideas from computer vision and exploiting the quasi-periodicity of rotating-machinery signals, a mapping function reshapes the one-dimensional time-domain signal into a two-dimensional latent feature grid, converting periodic impacts along the time axis into spatial texture features. (2) Strongly downsampling convolutional tokenizer: a three-stage cascaded convolutional front end replaces the simple linear projection; large kernels perform preliminary denoising and downsampling, and batch normalization combined with max pooling extracts local features. (3) Criss-cross attention feature enhancement: a sparse attention mechanism is inserted before the Transformer encoder; by computing row and column affinity matrices, it aggregates cross-period fault features across the two-dimensional space. Experiments show that Hybrid CCNet achieves excellent robustness and performance: in an extreme small-sample scenario the model reaches 99% accuracy, outperforming CNN and FaultFormer baselines, and under heavy noise with a signal-to-noise ratio as low as −4 dB it maintains diagnostic accuracy above 95%.
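The two core mechanisms named in (1) and (3) can be illustrated with a minimal numeric sketch. This is not the paper's implementation: it assumes scalar values per grid cell and a known period, whereas the actual model operates on learned feature maps inside a Transformer pipeline. It shows only how folding a quasi-periodic 1D signal at its period aligns repeated impacts into a column texture, and how a toy criss-cross step lets each cell attend sparsely to its own row and column.

```python
import numpy as np

def fold_signal(x, period):
    """Reshape a 1D quasi-periodic signal into a 2D latent grid.

    Each row holds one (approximate) period, so impacts that repeat
    every `period` samples line up as a vertical texture.
    """
    n_rows = len(x) // period
    return x[: n_rows * period].reshape(n_rows, period)

def criss_cross_attention(grid):
    """Toy single-head criss-cross aggregation.

    Every cell attends only to cells in its own row and column
    (a sparse H+W-sized neighbourhood instead of the full H*W grid).
    """
    H, W = grid.shape
    out = np.zeros_like(grid, dtype=float)
    for i in range(H):
        for j in range(W):
            # Affinities between the cell and its row/column neighbours
            # (the centre cell appears in both paths, as in CCNet).
            neigh = np.concatenate([grid[i, :], grid[:, j]])
            logits = grid[i, j] * neigh          # dot-product score on scalars
            w = np.exp(logits - logits.max())    # numerically stable softmax
            w /= w.sum()
            out[i, j] = (w * neigh).sum()        # weighted aggregation
    return out

# Example: an impulse train with period 8 folds into a grid whose
# first column carries every impact.
signal = np.zeros(64)
signal[::8] = 1.0                    # one impact per period
grid = fold_signal(signal, period=8)
print(grid.shape)                    # (8, 8)
print(grid[:, 0])                    # all ones: impacts aligned in one column
enhanced = criss_cross_attention(grid)
```

In the folded grid, a fault-induced impact train becomes a near-vertical stripe, which is exactly the kind of structure a row/column-restricted attention pattern can aggregate cheaply.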
Citation: Qi, J., Wu, Z., Zhu, C., Cao, Z. and Liu, X. (2026) A Bearing Fault Diagnosis Model Based on Latent Space Folding and Criss-Cross Attention: Research on Small-Sample and Heavy-Noise Environments. Computer Science and Application, 16(4), 101-113. https://doi.org/10.12677/csa.2026.164113

References

[1] Zhang, W., Li, C., Peng, G., Chen, Y. and Zhang, Z. (2018) A Deep Convolutional Neural Network with New Training Methods for Bearing Fault Diagnosis under Noisy Environment and Different Working Load. Mechanical Systems and Signal Processing, 100, 439-453.
[2] Jia, F., Lei, Y., Lu, N. and Xing, S. (2018) Deep Normalized Convolutional Neural Network for Imbalanced Fault Classification of Machinery and Its Understanding via Visualization. Mechanical Systems and Signal Processing, 110, 349-367.
[3] Tang, J., Zheng, G., Wei, C., Huang, W. and Ding, X. (2022) Signal-Transformer: A Robust and Interpretable Method for Rotating Machinery Intelligent Fault Diagnosis under Variable Operating Conditions. IEEE Transactions on Instrumentation and Measurement, 71, 1-11.
[4] Zerveas, G., Jayaraman, S., Patel, D., Bhamidipaty, A. and Eickhoff, C. (2021) A Transformer-Based Framework for Multivariate Time Series Representation Learning. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2114-2124.
[5] Zhou, A.Y. and Barati Farimani, A. (2024) FaultFormer: Pretraining Transformers for Adaptable Bearing Fault Classification. IEEE Access, 12, 70719-70728.
[6] He, K., Chen, X., Xie, S., Li, Y., Dollár, P. and Girshick, R. (2022) Masked Autoencoders Are Scalable Vision Learners. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 16000-16009.
[7] Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y. and Liu, W. (2019) CCNet: Criss-Cross Attention for Semantic Segmentation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 603-612.