Research on a Layout Analysis Method for Accident Investigation Reports Based on YOLOv10
DOI: 10.12677/csa.2026.161025. Supported by national science and technology funding.
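Authors: Pan Lingyu (潘令宇), Zhou Zixiang (周子翔): School of Computer Science and Engineering, North China Institute of Science and Technology, Beijing; Zhang Yunlei (张云雷)*: Hebei IoT Monitoring Technology Innovation Center, Beijing; Wu Wenxing (武文星): School of Computer Science, Qinghai Normal University, Xining, Qinghai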
Keywords: Accident Investigation Report, Layout Analysis, YOLO, Document Intelligence
Abstract: In recent years, with the frequent occurrence of natural and man-made disasters, accident investigation reports have become increasingly important. These reports have complex layouts, many element types, and large scale variations, so existing models produce overlapping detection boxes and struggle to recognize small-scale elements during key information extraction. To address these problems, we propose a layout analysis method for accident investigation reports based on an improved YOLOv10. First, by collecting documents published by provincial emergency management departments, we construct a specialized accident investigation report dataset of 2500 images and define 23 fine-grained layout element labels to enhance semantic discrimination. Second, we introduce a GL-CRM module into the YOLOv10 backbone network. By dynamically allocating computational resources, this module strengthens feature extraction for targets of different scales, thereby addressing the loss of features in fine-grained elements. In addition, we use YOLOv10's NMS-free (non-maximum-suppression-free) strategy, with a dual-head architecture comprising one-to-many and one-to-one detection heads, to reduce detection box overlap and improve inference speed. Experimental results show that the improved model achieves an F1 score of 87.72% and an mAP of 88.5% on the self-constructed dataset, improvements of approximately 4% and 7%, respectively, over the baseline model. These results validate the effectiveness and superiority of the proposed method for the intelligent processing of accident investigation report documents.
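For readers unfamiliar with the post-processing step that an NMS-free detector avoids, the following is a minimal sketch of intersection-over-union (IoU) and classic greedy non-maximum suppression in plain Python. It is illustrative only and is not the authors' implementation; the function names, the box format `(x1, y1, x2, y2)`, and the 0.5 threshold are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2), with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box by more
    than iou_threshold. The threshold value is a hypothetical default.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two heavily overlapping boxes and one separate box (made-up values).
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In a detector trained with a one-to-one head, each object is intended to receive a single prediction, so a filtering pass of this kind is not needed at inference time.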
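Citation: Pan Lingyu, Zhou Zixiang, Zhang Yunlei, Wu Wenxing. Research on a Layout Analysis Method for Accident Investigation Reports Based on YOLOv10 [J]. Computer Science and Application, 2026, 16(1): 305-316. https://doi.org/10.12677/csa.2026.161025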
