Dual-Stream Enhanced Point Cloud Classification Network Based on Mamba and Cyclic Maximum Pooling
DOI: 10.12677/mos.2025.145410
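Author: Guoqiang Chai (柴国强): School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai
Keywords: Point Cloud Classification, Deep Learning, 3D Vision, Feature Extraction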
Abstract: Point cloud classification, a core task in 3D vision, faces two challenges: limited feature expressiveness and inadequate handling of permutation invariance. Traditional multilayer perceptron (MLP)-based networks struggle to capture global features and to aggregate local information dynamically. To address this, the paper proposes a dual-stream enhanced point cloud classification network based on Mamba and cyclic max pooling. First, a Mamba module models the raw point cloud as a sequence and uses its ability to capture long-range dependencies to extract highly discriminative global features. Second, a Recurrent Maximum Pooling (RMP) module explicitly extracts permutation-invariant features of the point cloud through multi-level iterative pooling, and its recurrent mechanism dynamically strengthens local features and fuses them with context. In the dual-stream architecture, the global and local features are adaptively weighted and fed into an MLP classification head for high-level semantic inference. Experiments on the ModelNet40 and ScanObjectNN benchmark datasets show that the proposed method reaches classification accuracies of 93.9% and 86.8%, respectively, both higher than those of state-of-the-art classification methods. Ablation experiments further confirm the global modeling capability of Mamba and the robustness that the RMP module adds for unordered point clouds.
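The abstract's two local-stream ideas, multi-level iterative pooling that is invariant to point order and adaptive weighting of the two streams, can be illustrated with a small sketch. This is not the paper's implementation. The function names `max_pool`, `iterative_max_pool`, and `fuse` are hypothetical. The rule for choosing which points enter the next pooling level is an assumption, as are the scalar scores used for weighting. The Mamba stream is not reproduced, so a placeholder vector stands in for the global feature.

```python
import math
import random


def max_pool(points):
    """Channel-wise maximum over a list of per-point feature vectors."""
    return [max(channel) for channel in zip(*points)]


def iterative_max_pool(points, levels=2):
    """Pool, set aside the points that supplied a maximum, then pool the rest.

    Assumption: a point counts as "used" at a level if it attains the
    maximum in at least one channel. The result depends only on the set
    of feature values, never on the order of the points.
    """
    remaining = list(points)
    pooled = []
    for _ in range(levels):
        if not remaining:
            break
        level_max = max_pool(remaining)
        pooled.append(level_max)
        # Keep only points that are strictly below the maximum in every channel.
        remaining = [p for p in remaining
                     if all(v < m for v, m in zip(p, level_max))]
    return pooled


def fuse(global_feat, local_feat, score_g, score_l):
    """Softmax-weighted sum of two feature vectors.

    Assumption: in a trained network the two scores would be learned;
    here they are plain numbers supplied by the caller.
    """
    e_g, e_l = math.exp(score_g), math.exp(score_l)
    w_g = e_g / (e_g + e_l)
    w_l = 1.0 - w_g
    return [w_g * g + w_l * l for g, l in zip(global_feat, local_feat)]


# Four points with two feature channels each (toy values).
pts = [[1.0, 5.0], [4.0, 2.0], [3.0, 3.0], [0.0, 1.0]]
levels = iterative_max_pool(pts)   # [[4.0, 5.0], [3.0, 3.0]]

# Shuffling the points leaves every pooled level unchanged.
shuffled = pts[:]
random.shuffle(shuffled)
assert iterative_max_pool(shuffled) == levels

# Placeholder global feature fused with the second-level local feature.
fused = fuse([4.0, 5.0], levels[1], score_g=0.0, score_l=0.0)   # [3.5, 4.0]
```

Because each level reduces a set of values with `max`, reordering the input cannot change the output, which is the permutation-invariance property the abstract attributes to the pooling stream.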
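Cite this article: Chai, G. Dual-Stream Enhanced Point Cloud Classification Network Based on Mamba and Cyclic Maximum Pooling [J]. Modeling and Simulation (建模与仿真), 2025, 14(5): 503-515. https://doi.org/10.12677/mos.2025.145410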
