CF-NeRV: Video Implicit Neural Representation Based on Feature Calibration and Frequency Decoupling
Abstract: Video implicit neural representation (INR) has emerged as a new paradigm for video compression. However, existing NeRV-based methods remain limited by isotropic redundancy in their feature representations and by the loss of high-frequency texture caused by the "spectral bias" inherent in deep networks. To address these issues, we propose CF-NeRV, a video implicit neural representation network based on feature calibration and frequency decoupling. First, we redesign the basic decoding unit as a Content-Adaptive Feature Calibration module (CFC-Block), which applies concurrent spatio-temporal attention before convolution to pre-calibrate salient features and suppress background redundancy. Second, we design a Frequency-Aware Refinement Module (FARM), which uses explicit frequency-domain decoupling and an adaptive gating strategy to force the model to compensate for high-frequency residual signals. Finally, we introduce an Error-Aware Decoupled Reconstruction head (ERRH), in which a main branch and an auxiliary branch take on specialized roles to map features to pixel space dynamically and with high precision. Experiments on 7 benchmarks, including the Bunny and UVG datasets, show that CF-NeRV achieves higher reconstruction quality than mainstream models such as HNeRV and E-NeRV, with an average PSNR improvement of 1.29 dB on the UVG dataset, which supports the effectiveness and efficiency of the proposed mechanisms in complex video reconstruction tasks.
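The abstract describes frequency decoupling with adaptive gating only at a high level, so the following is a minimal illustrative sketch and not the authors' FARM. It assumes a 1D signal standing in for one row of a frame, a box filter as the low-pass step, and a fixed scalar gate in place of a learned one. All function names (`box_blur`, `decouple`, `gated_merge`, `psnr`) are hypothetical. The sketch shows how restoring more of the high-frequency residual raises PSNR, the metric in which the reported 1.29 dB gain is measured.

```python
import math

def box_blur(signal, radius=1):
    """Low-pass filter: mean over a sliding window, clamping indices at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def decouple(signal, radius=1):
    """Split a signal into a low-frequency band and a high-frequency residual."""
    low = box_blur(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high

def gated_merge(low, high, gate):
    """Recombine the bands; gate in [0, 1] sets how much residual is restored.
    (Assumption: a fixed scalar here, whereas a network would predict the gate.)"""
    return [l + gate * h for l, h in zip(low, high)]

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for values in the range [0, peak]."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# A high-frequency test row: alternating dark and bright pixels.
frame_row = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
low, high = decouple(frame_row)

blurred_only = gated_merge(low, high, gate=0.0)    # high frequencies lost
half_restored = gated_merge(low, high, gate=0.5)   # residual partly compensated
fully_restored = gated_merge(low, high, gate=1.0)  # residual fully compensated

gain_db = psnr(frame_row, half_restored) - psnr(frame_row, blurred_only)
print(f"PSNR gain from restoring half the residual: {gain_db:.2f} dB")
```

In this toy setting the remaining error is the residual scaled by one minus the gate, so halving the error improves PSNR by 20·log10(2) dB, roughly 6 dB. A learned gate would instead vary per location and per channel.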
References
[1] Chen, H., He, B., Wang, H., et al. (2021) NeRV: Neural Representations for Videos. Advances in Neural Information Processing Systems, 34, 21557-21568.
[2] Chen, H., Gwilliam, M., Lim, S.-N. and Shrivastava, A. (2023) HNeRV: A Hybrid Neural Representation for Videos. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 10270-10279.
[3] Li, Z.Z., Wang, M.M., Pi, H.J., Xu, K.C., Mei, J.B. and Liu, Y. (2022) E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context. ECCV, 2.
[4] Wiegand, T., Sullivan, G.J., Bjontegaard, G. and Luthra, A. (2003) Overview of the H.264/AVC Video Coding Standard. IEEE Transactions on Circuits and Systems for Video Technology, 13, 560-576.
[5] Sullivan, G.J., Ohm, J.-R., Han, W.-J. and Wiegand, T. (2012) Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Transactions on Circuits and Systems for Video Technology, 22, 1649-1668.