Chinese Address Matching Rule Chain Model Combined with Annotation
Abstract: Existing approaches to Chinese address matching concentrate on text features and overlook the building, geographic-location, statistical, and industry features that Chinese addresses also carry, even though these features can effectively assist matching. This paper targets non-standard Chinese addresses and uses residential customer data from the gas industry as the experimental sample. By analyzing multiple features of the user information held in two data sources, it proposes a rule chain model for Chinese address matching that is combined with annotation. The advantage of a rule chain is that the rules in the chain can be configured dynamically: rules are managed through a combination of manual work and computation, and the matching rate is raised step by step over multiple iterations. The experimental results show that the model improves the success rate of Chinese address matching to a certain extent.
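The idea of a dynamically configurable rule chain can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `RuleChain`, the two rules `same_normalized_text` and `same_building_and_room`, the record fields (`id`, `address`, `building`, `room`), and the sample addresses are all hypothetical, chosen only to show how records left unmatched by one rule fall through to the next, and how adding a rule between iterations can raise the match count.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]                 # hypothetical record layout
Rule = Callable[[Record, Record], bool]


def same_normalized_text(a: Record, b: Record) -> bool:
    """Text rule (assumed): addresses are equal once whitespace is removed."""
    return "".join(a["address"].split()) == "".join(b["address"].split())


def same_building_and_room(a: Record, b: Record) -> bool:
    """Annotation rule (assumed): labelled building and room both agree."""
    return (a.get("building") is not None
            and a.get("building") == b.get("building")
            and a.get("room") is not None
            and a.get("room") == b.get("room"))


@dataclass
class RuleChain:
    # Ordered (name, rule) pairs; the order is the order of application.
    rules: List[Tuple[str, Rule]] = field(default_factory=list)

    def add_rule(self, name: str, rule: Rule) -> None:
        self.rules.append((name, rule))

    def remove_rule(self, name: str) -> None:
        self.rules = [(n, r) for n, r in self.rules if n != name]

    def match(self, source: List[Record], target: List[Record]):
        """Return ({source_id: (target_id, rule_name)}, unmatched source records).

        Each rule only sees the records that earlier rules left unmatched,
        and a target record is used at most once (first satisfying one wins).
        """
        matched: Dict[str, Tuple[str, str]] = {}
        used = set()
        remaining = list(source)
        for name, rule in self.rules:
            still_unmatched = []
            for a in remaining:
                hit = next((b for b in target
                            if b["id"] not in used and rule(a, b)), None)
                if hit is None:
                    still_unmatched.append(a)
                else:
                    matched[a["id"]] = (hit["id"], name)
                    used.add(hit["id"])
            remaining = still_unmatched
        return matched, remaining


# Made-up sample data from two sources.
source = [
    {"id": "s1", "address": "幸福路 12号 3栋 502", "building": "3", "room": "502"},
    {"id": "s2", "address": "幸福路12号三栋201室", "building": "3", "room": "201"},
]
target = [
    {"id": "t1", "address": "幸福路12号3栋502", "building": "3", "room": "502"},
    {"id": "t2", "address": "幸福路12号3栋201", "building": "3", "room": "201"},
]

chain = RuleChain()
chain.add_rule("text", same_normalized_text)
first_matched, first_unmatched = chain.match(source, target)

# After inspecting the leftovers, a further rule is added and matching is rerun.
chain.add_rule("building_room", same_building_and_room)
matched, unmatched = chain.match(source, target)
```

In this sketch the first pass matches only `s1` through the text rule; once the annotation-based rule is appended, the second pass also matches `s2`. Keeping the rules as an ordered list of plain functions is one simple way to make them configurable at run time; a real system would likely need candidate blocking and stricter annotation checks.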
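Cite this article: Li Xiaoxi, Zhang Wei. Chinese Address Matching Rule Chain Model Combined with Annotation[J]. Computer Science and Application, 2021, 11(9): 2302-2314. https://doi.org/10.12677/CSA.2021.119235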
