NexusNet: A Dual-Path Hierarchical Fusion Hybrid Network for Facial Color Recognition
Abstract: Convolutional neural networks (CNNs) are limited in modeling long-range dependencies, while Vision Transformers carry large parameter counts because of their self-attention mechanism. To address both problems, this paper proposes NexusNet, a dual-path hybrid model. NexusNet deeply fuses a CNN branch for local representation with a Transformer branch for global modeling, so that local detail features and global semantic information are encoded jointly; this improves feature representation while keeping the parameter count compact. In the CNN branch, we introduce a new module that combines dynamic weight allocation with a context enhancement mechanism to better capture discriminative local structures. In the Transformer branch, hierarchical modeling and a linear-complexity design substantially reduce the resource cost of modeling long-range dependencies. In addition, we design an adaptive multi-level feature fusion module that integrates multi-scale features under the guidance of channel and spatial attention, achieving efficient aggregation of information across the two architectures. Experiments on two facial color recognition datasets show that NexusNet, while remaining lightweight, reaches classification accuracies of 88.99% and 79.25%, respectively, and outperforms existing mainstream methods on multiple evaluation metrics, which supports its effectiveness and generalization ability in local-global feature fusion and lightweight model design.
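To make the idea of attention-guided fusion concrete, the following is a minimal sketch of how two same-shaped feature maps, one from a local (CNN) branch and one from a global (Transformer) branch, might be combined with a channel gate and a spatial gate. It is not the NexusNet fusion module: the function name `fuse_local_global`, the parameter-free gates, and the choice of which branch drives which gate are all assumptions made for illustration, and the learned layers a real module would contain are omitted.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def fuse_local_global(local_feat, global_feat):
    """Fuse two (C, H, W) feature maps with channel and spatial gating.

    Hypothetical, parameter-free illustration; a trained module would
    replace the plain averages below with learned projections.
    """
    if local_feat.shape != global_feat.shape:
        raise ValueError("both branches must produce the same (C, H, W) shape")

    # Channel gate: average the global branch over space, giving one
    # weight per channel with shape (C, 1, 1).
    channel_gate = sigmoid(global_feat.mean(axis=(1, 2), keepdims=True))

    # Spatial gate: average the local branch over channels, giving one
    # weight per position with shape (1, H, W).
    spatial_gate = sigmoid(local_feat.mean(axis=0, keepdims=True))

    # Gated sum: global context reweights local channels, and local
    # saliency reweights global positions.
    return channel_gate * local_feat + spatial_gate * global_feat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local_feat = rng.standard_normal((16, 7, 7))
    global_feat = rng.standard_normal((16, 7, 7))
    fused = fuse_local_global(local_feat, global_feat)
    print(fused.shape)
```

Because both gates broadcast against the full (C, H, W) tensor, the fused output keeps the input shape, so a block like this could in principle be applied at each stage of a hierarchical network.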
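Citation: Sun, Q., Feng, Y., Lin, Z., Liang, J., Zhao, X. and Liu, Z. NexusNet: A Dual-Path Hierarchical Fusion Hybrid Network for Facial Color Recognition. Computer Science and Application, 2026, 16(1): 154-168. https://doi.org/10.12677/csa.2026.161013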
