Multi-Frame Fusion Video Dehazing Model Based on Efficient Spatiotemporal Modeling
DOI: 10.12677/csa.2026.165192. Supported by the National Natural Science Foundation of China.
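Authors: Yujiao Shao (邵玉娇), Weibo Wei* (魏伟波), Zhenkuan Pan (潘振宽): College of Computer Science and Technology, Qingdao University, Qingdao, Shandong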
Keywords: Video Dehazing; Spatiotemporal Modeling; Deep Learning; 3D Convolution; Inter-Frame Fusion; Neural Network
Abstract: To address the insufficient use of spatiotemporal information in video dehazing, which often leaves dehazed videos with poor temporal coherence, this paper proposes UnDehazeNet, a video dehazing model built on a U-Net-like encoder-decoder architecture. Drawing on recent spatiotemporal modeling ideas, the model builds a feature learning mechanism in which local and global features cooperate, so that it captures inter-frame motion patterns and haze distribution characteristics without explicit optical flow computation, which keeps the computational cost under control. In addition, deformable 3D convolutions are integrated into the encoder-decoder to strengthen the extraction and fusion of multi-scale features, drawing on the strengths of U-Net-like architectures in image restoration. As a result, the model achieves high-quality video dehazing while preserving temporal coherence. Experiments on the REVIDE and HazeWorld datasets show that UnDehazeNet outperforms the compared methods on the two core quantitative metrics, peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), as well as in qualitative visual comparisons, indicating a marked improvement in overall performance.
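For readers unfamiliar with the PSNR metric named in the abstract, the following is a minimal sketch of how it is commonly computed for a video clip. It is not the authors' evaluation code: the function names `psnr` and `video_psnr`, the flat-list frame representation, the 8-bit peak value of 255, and the choice to average per-frame scores are all assumptions made for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """PSNR in dB between two frames given as flat pixel sequences.

    Assumes both frames have the same number of pixels and that
    max_value is the peak pixel value (255 for 8-bit images).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def video_psnr(reference_frames: Sequence[Sequence[float]],
               restored_frames: Sequence[Sequence[float]],
               max_value: float = 255.0) -> float:
    """Mean of per-frame PSNR over a clip (one common convention, assumed here)."""
    if len(reference_frames) != len(restored_frames) or len(reference_frames) == 0:
        raise ValueError("clips must be non-empty and of equal length")
    scores = [psnr(r, d, max_value)
              for r, d in zip(reference_frames, restored_frames)]
    return sum(scores) / len(scores)
```

Higher values mean the dehazed frame is closer to the haze-free reference. PSNR measures only pixel-wise error, which is why it is usually reported together with SSIM, as in the abstract.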
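Cite this article: Shao, Y.J., Wei, W.B. and Pan, Z.K. (2026) Multi-Frame Fusion Video Dehazing Model Based on Efficient Spatiotemporal Modeling. Computer Science and Application, 16(5), 392-405. https://doi.org/10.12677/csa.2026.165192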
