Studies on Swallowing Action Recognition Using Short-Time Dynamic Convolution
DOI: 10.12677/csa.2026.164142
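Authors: Zhou Zisen (周子森): School of Electronics and Information Engineering, Wuyi University, Jiangmen, Guangdong; Yang Jian* (杨建): School of Mechanical and Automation Engineering, Wuyi University, Jiangmen, Guangdong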
Keywords: Human Action Recognition; Flexible Piezoelectric Sensor; Time Series Classification; Swallowing Action; Dynamic Convolution
Abstract: Dysphagia, or swallowing dysfunction, seriously affects human health, particularly among the elderly, yet existing swallowing recognition models suffer from insufficient accuracy and redundant parameters. To address these problems, this study uses a self-developed flexible piezoelectric polylactic acid sensor as the data acquisition terminal and proposes a Short-Time Dynamic Convolution (STDyConv) method for analyzing the physiological signals collected while subjects swallow. The method consists of a short-time frequency-adaptive convolution module and a globally propagating convolution kernel. The parameters of the short-time frequency-adaptive module are stored in two tensors that represent the real and imaginary parts of the frequency features. These parameters are fine-tuned and optimized according to the input physiological signal, so that the convolution kernel responds accurately to the signal's multidimensional frequency features. To further improve adaptability across scenarios, a frequency weighting factor is introduced into each convolution kernel so that different frequency-specific features receive different degrees of attention. The globally propagating kernel stores only one initial convolution kernel, from which all other kernels are generated; this effectively suppresses the redundant parameter growth of dynamic convolution models. Experimental results show that STDyConv is compatible with conventional convolutional neural network (CNN) architectures and that, compared with traditional methods, it improves the average accuracy of swallowing action recognition by 20%.
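The structure described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the pooling rule used to derive input-dependent weights, the kernel generation rule, and all sizes are assumptions chosen only to show the idea. A single base kernel is stored as two arrays (real and imaginary spectral parts), each derived kernel applies its own frequency weighting factor to that base, and the kernels are recomputed for every short frame of the signal.

```python
import numpy as np

def frequency_attention(frame, n_bins):
    """Input-dependent weights: pooled, normalised magnitude spectrum of one frame."""
    mag = np.abs(np.fft.rfft(frame))
    pooled = np.array([band.mean() for band in np.array_split(mag, n_bins)])
    return pooled / (pooled.sum() + 1e-8)

def frame_kernels(frame, base_real, base_imag, freq_weights, kernel_size):
    """Derive every kernel for one frame from a single stored base spectrum.

    Hypothetical rule: kernel c = irfft(base * freq_weights[c] * attention(frame)).
    """
    n_bins = kernel_size // 2 + 1
    attn = frequency_attention(frame, n_bins)                # (n_bins,)
    base = base_real + 1j * base_imag                        # (n_bins,)
    spectra = base[None, :] * freq_weights * attn[None, :]   # (n_kernels, n_bins)
    return np.fft.irfft(spectra, n=kernel_size, axis=-1)     # (n_kernels, kernel_size)

def short_time_dynamic_conv(x, base_real, base_imag, freq_weights,
                            kernel_size, frame_len):
    """Split x into frames, rebuild the kernels per frame, and convolve."""
    if len(x) % frame_len != 0 or frame_len < kernel_size:
        raise ValueError("need len(x) divisible by frame_len and frame_len >= kernel_size")
    out = np.zeros((freq_weights.shape[0], len(x)))
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        kernels = frame_kernels(frame, base_real, base_imag, freq_weights, kernel_size)
        for c, kernel in enumerate(kernels):
            out[c, start:start + frame_len] = np.convolve(frame, kernel, mode="same")
    return out

# Illustrative sizes and random parameters (assumptions, not values from the paper).
rng = np.random.default_rng(0)
kernel_size, frame_len, n_kernels = 9, 16, 4
n_bins = kernel_size // 2 + 1
base_real = rng.normal(size=n_bins)   # real part of the stored base kernel spectrum
base_imag = rng.normal(size=n_bins)   # imaginary part of the stored base kernel spectrum
freq_weights = rng.uniform(0.5, 1.5, size=(n_kernels, n_bins))  # per-kernel weighting

x = np.sin(2 * np.pi * np.arange(64) / 8.0)   # stand-in for a sensor signal
features = short_time_dynamic_conv(x, base_real, base_imag, freq_weights,
                                   kernel_size, frame_len)
print(features.shape)  # (4, 64)
```

In this sketch only `base_real`, `base_imag`, and the per-kernel `freq_weights` are stored; the time-domain kernels themselves are generated on the fly, which is one way to read the abstract's point about limiting parameter growth.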
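Cite this article: Zhou, Z. and Yang, J. (2026) Studies on Swallowing Action Recognition Using Short-Time Dynamic Convolution. Computer Science and Application, 16(4), 428-438. https://doi.org/10.12677/csa.2026.164142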
