Text Mining Analysis of Trends in the Application of Statistical Methods in China’s Medical Field
DOI: 10.12677/aam.2025.1411471
Authors: Zhang Wen (张文), Zou Chenchen (邹晨晨)*: School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong
Keywords: Text Mining, Statistics, Medicine, LDA Topic Modeling, Prediction
Abstract: Drawing on medical doctoral dissertations indexed in CNKI, this study analyzes trends in the use of common statistical methods in China's medical field and combines LDA and ARIMA models for topic extraction and forecasting. The results show rapid growth in ROC curve analysis, logistic models, and machine learning, while t-tests, ANOVA, and similar methods show a fluctuating decline. LDA identifies nine topics: model evaluation metrics, descriptive statistics, systematic analysis, experimental design, statistical tests, survival analysis, regression analysis, multiple-group comparisons, and machine learning. Forecasts indicate that systematic analysis, statistical tests, regression analysis, and machine learning will remain in heavy use, whereas descriptive statistics and multiple-group comparisons are likely to decline. Overall, the field shows "a solid foundation, with accelerating development of intelligent and multifactor analysis," offering a reference for the selection and standardized application of statistical methods in medical research.
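Cite this article: Zhang Wen, Zou Chenchen. Text Mining Analysis of Trends in the Application of Statistical Methods in China's Medical Field[J]. Advances in Applied Mathematics, 2025, 14(11): 146-154. https://doi.org/10.12677/aam.2025.1411471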
