Industrial Process Fault Diagnosis Based on Two Stream Mobile Vit and Channel Pruning
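DOI: 10.12677/mos.2025.142166. Supported by the National Natural Science Foundation of China.
Authors: Wangbing Ma, Ying Tian: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai
Keywords: Video Monitoring, Model Pruning, Two-Stream Model, Mobile Vit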
Abstract: Video surveillance data from industrial processes are continuous in both time and space, but existing video-based fault diagnosis models are too large to deploy in real industrial settings. To address this, this study proposes a lightweight video classification model based on a two-stream Mobile Vit for fault diagnosis and applies weight pruning to reduce the model size. First, video frames and dense optical flow are extracted from industrial videos as the spatial and temporal features of the process, respectively. Two lightweight Mobile Vit backbones then extract deep spatial and temporal features from these inputs. Finally, a convolutional attention fusion mechanism at the tail of the two-stream model fuses the spatial and temporal features for the final diagnosis. To make the model lighter still, weight pruning is used to reduce the number of parameters of the two-stream Mobile Vit. Experiments show that, compared with other fault diagnosis models, the proposed model with the applied pruning method achieves high diagnostic accuracy at a much smaller model size.
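The abstract does not state which pruning criterion is used, and the title refers to channel pruning while the abstract refers to weight pruning. As a rough illustration only, the sketch below shows both ideas in NumPy under assumed criteria: weights are ranked by absolute value and output channels by L1 norm. The function names, the tensor shape, and the 50% ratios are hypothetical and are not taken from the paper.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Assumed criterion (not from the paper): rank weights by absolute value.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return weights.copy()
    flat = np.abs(weights).ravel()
    # Indices of the k smallest magnitudes (order among them is irrelevant).
    drop = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(weights.size, dtype=bool)
    mask[drop] = False
    return weights * mask.reshape(weights.shape)


def channel_prune(conv_weight: np.ndarray, keep_ratio: float):
    """Keep the output channels with the largest L1 norm.

    conv_weight has shape (out_channels, in_channels, kH, kW).
    Returns the smaller weight tensor and the sorted indices of kept channels.
    Assumed criterion (not from the paper): per-channel L1 norm.
    """
    out_channels = conv_weight.shape[0]
    n_keep = max(1, int(round(keep_ratio * out_channels)))
    l1 = np.abs(conv_weight).reshape(out_channels, -1).sum(axis=1)
    keep = np.sort(np.argsort(l1)[-n_keep:])
    return conv_weight[keep], keep


# Hypothetical convolution kernel: 8 output channels, 4 input channels, 3x3.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))

sparse_w = magnitude_prune(w, sparsity=0.5)       # same shape, half the entries zeroed
small_w, kept = channel_prune(w, keep_ratio=0.5)  # physically smaller tensor

print("fraction of zeroed weights:", np.mean(sparse_w == 0.0))
print("pruned kernel shape:", small_w.shape, "kept channels:", kept)
```

Weight pruning leaves the tensor shape unchanged and only introduces zeros, so it reduces stored size only with a sparse format. Channel pruning removes whole filters and shrinks the dense tensor directly, but the next layer's input channels then have to be sliced with the same indices.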
Citation: Ma, W.B. and Tian, Y. (2025) Industrial Process Fault Diagnosis Based on Two Stream Mobile Vit and Channel Pruning. Modeling and Simulation, 14(2), 449-459. https://doi.org/10.12677/mos.2025.142166
