Research on the Evolution of Textile Technology Theme Content Based on Entity Recognition
Abstract: Patent text is a core component of technological innovation, and topic analysis of its content helps clarify how technical topics are distributed and how they evolve. This study takes textile fabric preparation patents indexed in CNKI from 2018 to 2022 as its research object. Named-entity recognition (NER) is used to extract object-type entities, which serve as the basis for analyzing patent content. The corpus is divided into one-year time windows, and the optimal number of topics is determined with the perplexity-topic variance method. By analyzing how the content of the technical topics evolves, the study summarizes the innovation pattern of textile fabric preparation, groups the topics into three sets of technical elements (fabric raw materials, fabric preparation processes, and fabric properties), and offers suggestions for the further development of fabric preparation. To address the difficulty of quickly and accurately choosing word clusters to represent topics in topic modeling, NER is used to simplify technical-term extraction, and the ERNIE 3.0 knowledge-enhanced pre-trained model is used to quickly obtain sets of technical terms that summarize the topics well.
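The abstract does not show how object-type entities are pulled out of the NER output. A minimal sketch, assuming the tagger (for example a fine-tuned ERNIE 3.0 model) emits one BIO tag per Chinese character and that object entities carry the hypothetical label `OBJ`, is to decode the tag sequence into spans and keep only that type. The function name `extract_entities` is illustrative, not from the paper.

```python
from typing import List, Optional


def extract_entities(tokens: List[str], tags: List[str], keep_type: str = "OBJ") -> List[str]:
    """Decode a BIO tag sequence and return the entities of one type.

    Assumption: tags look like "B-OBJ", "I-OBJ", "O", where "OBJ" is a
    hypothetical label for object-type entities. Characters are joined
    without spaces, as is usual for Chinese text.
    """
    entities: List[str] = []
    current: List[str] = []
    current_type: Optional[str] = None
    # A sentinel "O" tag at the end closes any span still open.
    for token, tag in zip(tokens + [""], tags + ["O"]):
        if tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)  # continue the open span
            continue
        # Any other tag closes the open span; keep it only if it has the wanted type.
        if current and current_type == keep_type:
            entities.append("".join(current))
        if tag.startswith("B-"):
            current, current_type = [token], tag[2:]  # start a new span
        else:
            current, current_type = [], None  # "O" or a stray "I-" tag
    return entities


tokens = list("涤纶纤维经染色处理")
tags = ["B-OBJ", "I-OBJ", "I-OBJ", "I-OBJ", "O", "B-PROC", "I-PROC", "I-PROC", "I-PROC"]
print(extract_entities(tokens, tags))
```

Here the process span tagged `PROC` (also a hypothetical label) is discarded, so only the fiber material survives as input to the topic model.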
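Dividing the corpus into yearly time windows amounts to grouping each patent's extracted entities by year. The sketch below assumes each record is a hypothetical `(year, entities)` pair, with the year taken from whichever date field (application or publication) the study used. Patents with no extracted entities are dropped, since they would contribute empty documents to the topic model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_yearly_windows(
    records: Iterable[Tuple[int, List[str]]],
    start: int = 2018,
    end: int = 2022,
) -> Dict[int, List[List[str]]]:
    """Group entity-only patent documents into one-year time windows.

    Each record is a hypothetical (year, entities) pair. Records outside
    [start, end] and records with no extracted entities are skipped.
    """
    windows: Dict[int, List[List[str]]] = defaultdict(list)
    for year, entities in records:
        if start <= year <= end and entities:
            windows[year].append(entities)
    # Return a plain dict ordered by year so windows can be processed chronologically.
    return dict(sorted(windows.items()))


records = [(2019, ["棉纤维"]), (2018, ["涤纶", "染料"]), (2017, ["粘胶"]), (2019, [])]
print(build_yearly_windows(records))
```

Each window's list of entity lists can then be fed to a separate topic model, one per year.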
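The perplexity-topic variance criterion is not spelled out in the abstract. One formulation used in the Chinese topic-modeling literature, assumed here, scores each candidate topic number K by the model's perplexity divided by the variance of its topic-word distributions (the mean squared Euclidean distance of each topic vector from the mean topic vector) and picks the K with the lowest score: low perplexity rewards fit, and high variance rewards well-separated topics. The helper names are hypothetical, and the document-topic matrix θ (`theta`) and topic-word matrix φ (`phi`) would come from whatever LDA implementation is used.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def perplexity(docs: Sequence[Sequence[int]], theta: Matrix, phi: Matrix) -> float:
    """exp(-total log-likelihood / token count), with p(w|d) = sum_k theta[d][k] * phi[k][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:  # w is a vocabulary index
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def topic_variance(phi: Matrix) -> float:
    """Mean squared Euclidean distance between each topic-word row and the mean row."""
    k, v = len(phi), len(phi[0])
    mean = [sum(row[j] for row in phi) / k for j in range(v)]
    return sum(sum((row[j] - mean[j]) ** 2 for j in range(v)) for row in phi) / k


def perplexity_var(perp: float, phi: Matrix) -> float:
    """Assumed criterion: perplexity / topic variance (lower is better)."""
    var = topic_variance(phi)
    return perp / var if var > 0 else math.inf  # identical topics get the worst score


def select_topic_number(candidates: Dict[int, Tuple[float, Matrix]]) -> int:
    """candidates maps K -> (perplexity, phi) for a model trained with K topics."""
    return min(candidates, key=lambda k: perplexity_var(*candidates[k]))


phi2 = [[0.9, 0.1], [0.1, 0.9]]
phi3 = [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]]
print(select_topic_number({2: (10.0, phi2), 3: (9.0, phi3)}))
```

In this toy case K = 3 has slightly lower perplexity, but its topics overlap heavily, so the ratio favors K = 2. In practice perplexity would be measured on held-out documents, and the range of K to scan is the analyst's choice.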
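The abstract also leaves open how topics in adjacent years are linked when tracing their evolution. A common approach, shown here as an assumption rather than the authors' confirmed method, compares the topic-term distributions of consecutive windows with cosine similarity and records a link when the similarity reaches a threshold (0.5 below is an arbitrary illustrative value). The functions `cosine` and `link_topics` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple

Topic = Dict[str, float]  # term -> weight, e.g. the top terms of one LDA topic


def cosine(a: Topic, b: Topic) -> float:
    """Cosine similarity of two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_topics(prev: List[Topic], nxt: List[Topic], threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every cross-window topic pair at or above the threshold."""
    links = []
    for i, a in enumerate(prev):
        for j, b in enumerate(nxt):
            sim = cosine(a, b)
            if sim >= threshold:
                links.append((i, j, sim))
    return links


topics_2018 = [{"涤纶": 0.6, "染色": 0.4}, {"棉": 1.0}]
topics_2019 = [{"涤纶": 0.5, "染色": 0.5}, {"石墨烯": 1.0}]
print(link_topics(topics_2018, topics_2019))
```

Under this reading, a topic with no incoming link (here the 2019 graphene topic) is newly emerging, a topic with no outgoing link has faded, one-to-many links suggest a split, and many-to-one links suggest a merger.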
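Citation: Hu, Y. and Dong, P.J. (2025) Research on the Evolution of Textile Technology Theme Content Based on Entity Recognition. Management Science and Engineering, 14(1), 46-52. https://doi.org/10.12677/mse.2025.141006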
