Research on a Student Classroom Behavior Detection Method Based on YOLOv11 with Spatial Feature Enhancement
DOI: 10.12677/csa.2026.162037. Supported by research project funding.
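Authors: Wenting Kang (康雯婷), Lin Lin (林琳): School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, Jilin; Yuan Ping (平源)*: School of Information Engineering, Xuchang University, Xuchang, Henan; Henan Province Engineering Technology Research Center of Big Data Security and Applications, Xuchang, Henan; Lejian Li (李乐俭): School of Information Engineering, Xuchang University, Xuchang, Henan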
Keywords: Classroom Behavior, Object Detection, YOLOv11, Attention Mechanism, Feature Enhancement
Abstract: Student behavior targets in classroom scenes are small in scale, vary only slightly in posture, and look highly similar across different behavior classes. To address these problems, this paper proposes a new model, YOLOv11-RMS. Built on YOLOv11, the model first introduces a Restormer-based feature extraction module into the backbone, which strengthens global feature modeling by capturing long-range dependencies. Second, a multi-level channel attention mechanism (MLCA) is integrated at the end of the backbone to reinforce key semantic features and suppress redundant information. Finally, an SAFMP module based on spatially adaptive feature modulation is introduced at the detection head stage to reconstruct and enhance the upsampled shallow features, improving detection stability in complex classroom environments. Experimental results show that YOLOv11-RMS reaches an mAP50 of 72.6% on the SCB dataset and 84.9% on the self-built student classroom behavior dataset CLASS, a clear improvement over mainstream models.
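The mAP50 figures above are the mean, over behavior classes, of the average precision obtained when a prediction counts as correct only if its IoU with a ground-truth box is at least 0.5. The following is a minimal sketch of that metric for a single class, not the authors' evaluation code. It assumes axis-aligned boxes given as (x1, y1, x2, y2), greedy matching of each prediction to the best still-unmatched ground-truth box, and all-point interpolation of the precision-recall curve. The function names `iou` and `average_precision` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: Sequence[Tuple[float, Box]],  # (confidence, box) pairs
    gts: Sequence[Box],
    iou_thr: float = 0.5,
) -> float:
    """AP for one class at a fixed IoU threshold (simplified sketch)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits: List[bool] = []
    # Visit predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if not matched[j]:
                v = iou(box, gt)
                if v >= best_iou:
                    best_j, best_iou = j, v
        if best_j >= 0:
            matched[best_j] = True  # each ground truth is used at most once
        hits.append(best_j >= 0)

    # Cumulative precision and recall after each prediction.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += int(hit)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # Make precision non-increasing in recall, then integrate.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy data: two ground-truth boxes, three predictions (one is a false positive).
example_gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
example_preds = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (50, 50, 60, 60)),
    (0.7, (20, 20, 30, 32)),
]
example_ap = average_precision(example_preds, example_gts)
```

Averaging `average_precision` over all behavior classes gives mAP50. Production evaluators differ in details such as matching rules and interpolation, so this sketch is not expected to reproduce the reported numbers exactly.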
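Citation: Wenting Kang, Yuan Ping, Lin Lin, Lejian Li. Research on a Student Classroom Behavior Detection Method Based on YOLOv11 with Spatial Feature Enhancement. Computer Science and Application, 2026, 16(2): 40-49. https://doi.org/10.12677/csa.2026.162037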
