Research on Lightweight Object Detection and LoRA-Based Parameter-Efficient Fine-Tuning for PP-PicoDet
Abstract: Large-scale elements in mobile document layout analysis, such as complex tables and full-page text blocks, are hard to detect, and fully fine-tuning a lightweight detector on a small sample tends to overfit. To address both problems, this paper takes PP-PicoDet as the baseline and proposes a parameter-efficient fine-tuning scheme based on asymmetric hierarchical LoRA (Low-Rank Adaptation). The method places high-rank adapters (r = 128) in the deep layers of the backbone and in the classification head, and applies only lightweight adaptation to the feature-fusion neck, so as to reshape the model's global structural representation of large-scale layout elements. On a self-built small-sample dataset of 1000 images covering 8 categories of large-scale layout elements, the method trains only 6.5% of the parameters yet reaches an mAP@0.5:0.95 of 0.581 and an mAP@0.5 of 0.847. Inference speed is unchanged at 18.6 FPS, and detection performance exceeds 94% of that of full fine-tuning. A dedicated evaluation on objects with area = large shows that the strategy mitigates the sharp generalization degradation that full fine-tuning suffers under limited data, balancing accuracy, stability, and training cost. Ablation experiments further confirm that the asymmetric, layer-specific rank configuration is key to capturing long-range dependencies in document layouts, which offers an efficient path to rapid, customized deployment of lightweight layout analysis models on edge devices.
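The adapter idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the class `LoRALinear`, the helper `make_adapters`, the stage names, and the neck rank of 8 are all hypothetical, and a plain matrix stands in for the convolution kernels that a detector such as PP-PicoDet actually uses. Only the rank r = 128 for the deep backbone and classification head comes from the text.

```python
import random


class LoRALinear:
    """A frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=None, seed=0):
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        self.out_dim, self.in_dim = len(weight), len(weight[0])
        self.rank = rank
        self.scale = (rank if alpha is None else alpha) / rank
        rng = random.Random(seed)
        # A gets a small random init and B starts at zero, so the adapter
        # contributes nothing until training updates B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.out_dim)]

    def trainable_params(self):
        # Only A (rank x in_dim) and B (out_dim x rank) are trained.
        return self.rank * (self.in_dim + self.out_dim)

    def forward(self, x):
        base = [sum(w * v for w, v in zip(row, x)) for row in self.weight]
        down = [sum(a * v for a, v in zip(row, x)) for row in self.A]   # A x
        up = [sum(b * d for b, d in zip(row, down)) for row in self.B]  # B (A x)
        return [y + self.scale * u for y, u in zip(base, up)]


# Hypothetical asymmetric configuration: r = 128 where the abstract specifies
# it, and an ASSUMED small rank for the neck (the abstract gives no value).
RANKS = {"backbone_deep": 128, "neck": 8, "cls_head": 128}


def make_adapters(frozen_weights, ranks=RANKS):
    """Wrap each named frozen weight matrix with an adapter of its stage's rank."""
    return {name: LoRALinear(w, ranks[name]) for name, w in frozen_weights.items()}
```

Because `B` is zero-initialized, a freshly wrapped layer reproduces the frozen layer's output exactly, so fine-tuning starts from the pretrained behavior. After training, the product of `B` and `A` can be folded into the frozen weight, which is consistent with the abstract's report that inference speed is unchanged.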
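Cite this article: Wang Letian (王乐天), Sun Rencheng (孙仁诚). Research on Lightweight Object Detection and LoRA-Based Parameter-Efficient Fine-Tuning for PP-PicoDet [J]. Software Engineering and Applications (软件工程与应用), 2026, 15(2): 340-350. https://doi.org/10.12677/sea.2026.152032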
