Topic Identification and Analysis of Healthy Building Technology Based on Patent Text Mining
DOI: 10.12677/sa.2026.154068
Author: Li Hao (李浩), School of Economics and Management, Tongji University, Shanghai
Keywords: Healthy Building, Patent Analysis, Text Mining, LDA
Abstract: Objective/Significance: As the “Healthy China” strategy and the requirements for high-quality development of the building sector continue to advance, healthy buildings have become an important research direction in the building field. Text mining of patent abstracts makes it possible to identify the technical topics of the healthy building field systematically, providing data support and a methodological reference for the technology layout and industrial development of healthy buildings in China. Method/Process: First, a patent search scheme for healthy buildings was constructed on the basis of the “Standard for Evaluation of Healthy Buildings” (T/ASC 02-2021), and patent data were retrieved from the Patent Search and Analysis System of the China National Intellectual Property Administration. The abstracts of the screened patents were then preprocessed with Jieba word segmentation, a stopword list, and dictionary optimization. Second, a multi-class classification model was built on a manually labeled training set to identify the healthy building technology category of each patent automatically. Finally, the LDA topic model was applied to the patents category by category to identify technical topics, revealing how technical topics are distributed across the healthy building field in China. Result/Conclusion: The results show that: (1) technologies related to air and the thermal environment dominate healthy building patents, which indicates that indoor air quality and thermal comfort are the core concerns regarding healthy buildings; (2) monitoring technologies appear in every category, which indicates that continuous monitoring of the building environment is an important technical path toward healthy buildings.
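The category-by-category topic identification described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not say which LDA library or inference method was used, so a small collapsed Gibbs sampler is used here only to keep the example self-contained. The sketch assumes that abstracts are already segmented into tokens (the study uses Jieba for this) and that each patent already carries a category label (standing in for the output of the multi-class classifier). The stopword list, the toy corpus, the labels `air` and `water`, and the hyperparameters are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical stopword list; the study's actual list is not given in the abstract.
STOPWORDS = {"的", "一种", "所述", "包括"}


def preprocess(tokens):
    """Drop stopwords and single-character tokens from an already segmented abstract."""
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, topic-word count table)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # occurrences of word v in topic k
    topic_total = [0] * num_topics                      # total tokens in topic k
    assign = []                                         # topic of every token

    # Random initial topic assignment for each token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Toy corpus: (hypothetical category label, pre-segmented abstract tokens).
corpus = [
    ("air", ["一种", "空气", "净化", "装置", "包括", "滤网", "的", "风机"]),
    ("air", ["室内", "空气", "质量", "监测", "传感器", "所述", "风机"]),
    ("air", ["新风", "系统", "温度", "监测", "空气", "净化"]),
    ("water", ["一种", "饮用水", "净化", "装置", "包括", "滤芯"]),
    ("water", ["水质", "监测", "传感器", "的", "饮用水", "管道"]),
]

# Group the cleaned abstracts by category, then fit one LDA model per category.
by_category = defaultdict(list)
for label, tokens in corpus:
    by_category[label].append(preprocess(tokens))

topics = {}
for label, docs in by_category.items():
    vocab, topic_word = lda_gibbs(docs, num_topics=2)
    topics[label] = top_words(vocab, topic_word)
    print(label, topics[label])
```

Fitting a separate model per category keeps the topics of a small category from being absorbed by a dominant one. On a real corpus the number of topics per category would need to be chosen with a criterion such as perplexity or topic coherence rather than fixed at 2 as here.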
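Cite this article: Li Hao. Topic Identification and Analysis of Healthy Building Technology Based on Patent Text Mining [J]. Statistics and Application, 2026, 15(4): 24-39. https://doi.org/10.12677/sa.2026.154068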
