Research on Students’ Classroom Behavior Recognition Based on Human-Object Interaction

Abstract: Analyzing students’ classroom behavior is an effective way to assess the quality of classroom teaching. Existing research on student behavior recognition considers only the students themselves and pays little attention to their interactions with surrounding objects. This paper therefore proposes a method for recognizing students’ classroom behavior based on human-object interaction: classroom surveillance video is analyzed to detect students and objects, and behaviors are recognized from the interaction relations between them. First, because small objects such as pens and mobile phones occupy few pixels and yield few effective features, an improved YOLOv5s detection method is proposed to address the missed detections caused by small-object feature information gradually vanishing as network layers are stacked. Second, to cope with the large number of targets in the classroom and the mutual occlusion between them, which make feature extraction difficult, a Triplet attention mechanism is introduced to strengthen the network’s feature-extraction ability. Next, the VSGNet network is used to recognize human-object interaction relations and determine the behavior category. Finally, multiple comparative experiments on a self-built classroom dataset and a public dataset show that, compared with the original YOLOv5s network, the improved network raises mAP by 3.06% and 3.2% and recall by 3.1% and 4.2% on the self-built and public datasets respectively, verifying the effectiveness of the proposed improvements.
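The improvements above are reported in terms of mAP and recall. As a minimal illustration of how such detection metrics are obtained, the sketch below matches predicted boxes to ground-truth boxes by intersection-over-union (IoU) and computes recall; the boxes and the 0.5 IoU threshold are illustrative assumptions, not values from the paper’s datasets (mAP additionally averages precision over recall levels and classes, which is omitted here).

```python
# Hypothetical detection-evaluation sketch: greedy IoU matching and recall.
# Boxes are (x1, y1, x2, y2) tuples; all values below are made up for
# illustration and do not come from the paper's experiments.

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def recall(detections, ground_truths, iou_thresh=0.5):
    """Fraction of ground-truth boxes matched by some detection."""
    matched = 0
    used = set()
    for gt in ground_truths:
        best_i, best_iou = None, iou_thresh
        for i, det in enumerate(detections):
            if i in used:
                continue
            score = iou(det, gt)
            if score >= best_iou:
                best_i, best_iou = i, score
        if best_i is not None:
            used.add(best_i)
            matched += 1
    return matched / len(ground_truths) if ground_truths else 0.0

gts = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 55, 55)]
dets = [(1, 1, 10, 10), (21, 19, 31, 30)]  # the small third target is missed
print(round(recall(dets, gts), 3))  # → 0.667
```

A missed small object (the third ground-truth box) directly lowers recall, which is exactly the failure mode the improved YOLOv5s targets.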
Citation: Zhou, Z.-Y. and Qin, X. (2022) Research on Students’ Classroom Behavior Recognition Based on Human-Object Interaction. Software Engineering and Applications, 11, 1191-1203. https://doi.org/10.12677/SEA.2022.116121
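The interaction-recognition stage described in the abstract pairs each detected student with each detected object before classifying the interaction. The sketch below shows this candidate-pair construction with a simple spatial descriptor (object-center offset normalized by the person box); note that VSGNet’s actual spatial branch rasterizes the two boxes into a two-channel binary map, so the normalized offsets here are a deliberate simplification, and all box coordinates are hypothetical.

```python
# Hypothetical sketch of building human-object candidate pairs for an
# interaction classifier. The normalized-offset descriptor is a
# simplification of VSGNet's spatial branch; boxes are made-up examples.

def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def spatial_feature(person, obj):
    """Object-center offset from the person center, normalized by person size."""
    pw = person[2] - person[0]
    ph = person[3] - person[1]
    (px, py), (ox, oy) = center(person), center(obj)
    return ((ox - px) / pw, (oy - py) / ph)

def candidate_pairs(persons, objects):
    """All (person, object, spatial feature) triples fed to the classifier."""
    return [(p, o, spatial_feature(p, o)) for p in persons for o in objects]

persons = [(100, 50, 160, 200)]           # one detected student
objects = [(130, 120, 150, 140),          # e.g. a phone near the torso
           (90, 190, 170, 210)]           # e.g. a book on the desk
for p, o, feat in candidate_pairs(persons, objects):
    print(o, tuple(round(v, 2) for v in feat))
```

Exhaustive pairing keeps the pipeline simple but grows with the product of person and object counts, which is why crowded classroom scenes put pressure on both the detector and the interaction network.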

References

[1] Gao, Y.-F. (2021) Research on Student Classroom Behavior Recognition Based on Human Skeleton and Deep Learning. Master’s Thesis, Central China Normal University, Wuhan. (In Chinese)
[2] Wu, B., Wang, C.-M., Huang, W., et al. (2021) Recognition of Student Classroom Behaviors Based on Moving Target Detection. Traitement du Signal, 38, 215-220.
[3] Ge, C., Ji, J.-Q. and Huang, C.-F. (2022) Student Classroom Behavior Recognition Based on OpenPose and Deep Learning. International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, 15-17 April 2022, 576-579.
[4] Wei, Y.-T., Qin, D.-Y., Hu, J.-M., et al. (2019) Student Classroom Behavior Recognition Based on Deep Learning. Modern Educational Technology, 29, 87-91. (In Chinese)
[5] Zhou, J., Feng, R., Guang, L., et al. (2022) Classroom Learning Status Assessment Based on Deep Learning. Mathematical Problems in Engineering, 2022, Article ID: 7049458.
[6] Xiao, X.-S. and Tian, X.X. (2021) Research on Reference Target Detection of Deep Learning Framework Faster-RCNN. International Conference on Data Science and Business Analytics (ICDSBA), Changsha, 24-26 September 2021, 41-44.
[7] Chen, L. and Wang, S.-G. (2021) Identification and Detection of Picking Targets of Orah Mandarin Orange in Natural Environment Based on SSD Model. Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin, 29-31 October 2021, 439-442.
[8] Ye, K.-Q., Fang, Z.-B., Huang, X.-J., et al. (2020) Research on Small Target Detection Algorithm Based on Improved YOLOv3. International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, 25-27 December 2020, 1467-1470.
[9] Wang, C., Liu, Y.-S. and Liu, S.-J. (2022) Small Target Pedestrian Detection Algorithm Based on Improved YOLOv4. Computer Engineering, 1-9. (In Chinese)
[10] Yan, M.Y. and Sun, J.B. (2022) A Dim-Small Target Real-Time Detection Method Based on Enhanced YOLO. International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), Changchun, 25-27 February 2022, 567-571.
[11] Redmon, J. and Farhadi, A. (2018) YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767.
[12] Yang, J.Y. and Jiang, J. (2021) Dilated-CBAM: An Efficient Attention Network with Dilated Convolution. International Conference on Unmanned Systems (ICUS), Beijing, 15-17 October 2021, 11-15.
[13] Wen, X., Pan, Z.-X., Hu, H.-Y., et al. (2022) An Effective Network Integrating Residual Learning and Channel Attention Mechanism for Thin Cloud Removal. IEEE Geoscience and Remote Sensing Letters, 19, 1-5.
[14] Bochkovskiy, A., Wang, C.-Y. and Liao, H.-Y.M. (2020) YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934.
[15] Xu, B.J., Li, J.N., Wong, Y.K., et al. (2020) Interact as You Intend: Intention Driven Human Object Interaction Detection. IEEE Transactions on Multimedia, 22, 1423-1432.
[16] Siadari, T.S., Han, M. and Yoon, H. (2020) Three-Stream Network with Context Convolution Module for Human-Object Interaction Detection. ETRI Journal, 42, 230-238.
[17] Gao, C., Zou, Y.L. and Huang, J.-B. (2018) iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection. British Machine Vision Conference (BMVC), Newcastle upon Tyne, 3-6 September 2018.
[18] Gupta, S. and Malik, J. (2015) Visual Semantic Role Labeling. arXiv preprint arXiv:1505.04474.
[19] Kolesnikov, A., Kuznetsova, A., Lampert, C., et al. (2019) Detecting Visual Relationships Using Box Attention. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, 27-28 October 2019, 1749-1753.
[20] Shao, Z.P., Hu, Z.Y., Yang, J.Y., et al. (2022) Multi-Stream Feature Refinement Network for Human Object Interaction Detection. Journal of Visual Communication and Image Representation, 86, Article ID: 103529.
[21] Ulutan, O., Iftekhar, A. and Manjunath, B.S. (2020) VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 13614-13623.
[22] Misra, D., Nalamada, T., Arasanipalai, A.U. and Hou, Q.B. (2020) Rotate to Attend: Convolutional Triplet Attention Module. IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, 1-5 March 2020, 3138-3147.
[23] Woo, S., Park, J., Lee, J.-Y. and Kweon, I. (2018) Cbam: Convolutional Block Attention Module. 15th European Conference on Computer Vision (ECCV), Munich, 8-14 September 2018, 3-19. [Google Scholar] [CrossRef