Spatiotemporal Evolution and Topic Structure of Weibo Public Opinion on the Henan Rainstorm Disaster Based on BERTopic

DOI: 10.12677/hjdm.2026.162002, supported by research project funding
Authors: Wang Xurui, Ye Yanjun*, Wang Yuhan: School of Earth Science and Engineering, Hebei University of Engineering, Handan, Hebei
Keywords: Rainstorm Disaster; Weibo Public Opinion; Topic Modeling; BERTopic; Spatiotemporal Evolution; Risk Communication; Emergency Governance

Abstract: To characterize how public information needs and emotional expression evolve over space and time during a major rainstorm disaster, this study examines Weibo posts related to the Henan rainstorm disaster and builds an integrated analytical framework linking text topics, temporal evolution, and spatial distribution. A web crawler collected 13,509 Weibo posts, of which the 5,654 that could be located within Henan Province were geographically standardized. BERTopic was then used for semantic embedding, dimensionality reduction, clustering, and topic representation learning, in order to extract disaster-related topics and describe how their intensity changes across time stages and differs across space. The results show that: 1) disaster public opinion has a pronounced core-periphery topic structure, with “emergency rescue and disaster progress” as the core topic, surrounded in concentric layers by peripheral topics such as “donations and mutual aid, public services, secondary hazards and warnings, and responsibility discussion and reflection”; 2) topic intensity fluctuates in stages as the disaster unfolds, following an overall lifecycle of “sudden outbreak, sustained fermentation, decline and settling”, and the dominant topics differ between the outbreak and recovery periods; 3) spatially, public opinion intensity within the province is concentrated around Zhengzhou and spreads outward, weakening as distance increases. The findings can provide data support and decision-making reference for risk communication strategies, the pacing of information release, and cross-regional collaborative governance in rainstorm disaster scenarios.
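The topic representation step mentioned in the abstract can be illustrated with a minimal sketch of class-based TF-IDF (c-TF-IDF), the term-weighting scheme described in the BERTopic paper: each term's L1-normalized frequency within a topic is multiplied by log(1 + A / f_t), where A is the average number of tokens per topic and f_t is the term's frequency across all topics. This is not the authors' code. The function names `c_tf_idf` and `top_terms` are hypothetical, and the tokenized posts are invented toy data used only to show the mechanics; a real study would normally rely on the BERTopic library after embedding, dimensionality reduction, and clustering.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Score terms per topic with class-based TF-IDF.

    docs_by_topic maps a topic label to a list of tokenized posts.
    Returns {topic: {term: score}}.
    """
    # Merge all posts of a topic into one "class document" and count terms.
    class_counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }
    # f_t: frequency of each term summed over all topics.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    # A: average number of tokens per topic.
    avg_tokens = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for topic, counts in class_counts.items():
        n_tokens = sum(counts.values())
        scores[topic] = {
            term: (tf / n_tokens) * math.log(1 + avg_tokens / total_counts[term])
            for term, tf in counts.items()
        }
    return scores


def top_terms(scores, topic, k=3):
    """Return the k highest-scoring terms of a topic (ties broken by name)."""
    ranked = sorted(scores[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy, invented tokens standing in for clustered posts (assumption).
toy_posts = {
    "rescue": [["rescue", "team", "subway"], ["rescue", "flood", "update"]],
    "donation": [["donate", "supplies", "flood"], ["donate", "volunteer"]],
}

scores = c_tf_idf(toy_posts)
for topic in toy_posts:
    print(topic, top_terms(scores, topic))
```

Terms shared by several topics, such as "flood" in the toy data, receive a lower weight than terms concentrated in one topic, which is why the scheme yields distinctive keywords for each topic.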

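Cite this article: Wang Xurui, Ye Yanjun, Wang Yuhan. Spatiotemporal Evolution and Topic Structure of Weibo Public Opinion on the Henan Rainstorm Disaster Based on BERTopic [J]. Hans Journal of Data Mining, 2026, 16(2): 11-21. https://doi.org/10.12677/hjdm.2026.162002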

