Automatic Mining of the Dependency Relation Rule of Relational Word in Chinese Compound Sentences Based on FP-Tree Algorithm
DOI: 10.12677/CSA.2021.115158
Author: Tu Xindan (涂馨丹), Wuhan Institute of Design and Sciences, Wuhan, Hubei
Keywords: Relational Words; Dependency Relation; Rule Mining; FP-Tree
Abstract: The existing rule base for relational word recognition contains 734 rules, most of which are based on literal (surface) features, so rules based on dependency relations still need to be added. Building on dependency grammar, this paper applies the FP-tree algorithm for mining frequent itemsets to automatically mine dependency rules from compound sentences. First, the corpus is preprocessed. To avoid rescanning the database for every relational word, the compound sentences are first classified by their relational words, and categories whose datasets are too small are discarded to ensure the quality of the mined rules. Next, a feature analyzer parses the preprocessed corpus, and its output is formalized into a set of dependency features for each compound sentence. Finally, the FP-tree algorithm mines rules from the experimental corpus, yielding 84 rules in total. The experimental results demonstrate that the FP-tree algorithm is feasible and effective for the automatic mining of dependency rules.
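The mining pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. It assumes that each compound sentence has already been parsed into a relational word and a set of dependency-feature strings. The feature notation (for example `ADV(虽然,V1)`), the helper names `group_by_relational_word` and `mine_rules`, and the thresholds `min_group_size` and `min_support` are all hypothetical. Sentences are grouped once by relational word, groups that are too small are dropped, and FP-growth then builds an FP-tree for each group and recursively mines conditional FP-trees to find frequent feature sets.

```python
from collections import Counter, defaultdict


class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, freq): header maps each frequent item to the tree nodes
    holding it; freq maps each frequent item to its support.
    """
    # First scan: count the support of every item.
    counts = Counter()
    for items, count in transactions:
        for item in set(items):
            counts[item] += count
    freq = {item: c for item, c in counts.items() if c >= min_support}

    # Second scan: insert each transaction's frequent items, sorted by
    # descending support, so that common prefixes share tree nodes.
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Yield (frequent itemset, support) pairs via conditional FP-trees."""
    header, freq = build_fp_tree(transactions, min_support)
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        cond_base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        if cond_base:
            yield from fp_growth(cond_base, min_support, itemset)


def group_by_relational_word(sentences, min_group_size):
    """Split (relational_word, features) records into one dataset per word,
    dropping datasets with fewer than `min_group_size` sentences."""
    groups = defaultdict(list)
    for word, features in sentences:
        groups[word].append(frozenset(features))
    return {w: g for w, g in groups.items() if len(g) >= min_group_size}


def mine_rules(sentences, min_group_size=3, min_support=2):
    """Mine frequent dependency-feature sets separately for each relational word."""
    result = {}
    for word, feature_sets in group_by_relational_word(sentences, min_group_size).items():
        transactions = [(fs, 1) for fs in feature_sets]
        result[word] = dict(fp_growth(transactions, min_support))
    return result


# Toy corpus: the relational words are real, the feature strings are hypothetical.
corpus = [
    ("虽然", {"ADV(虽然,V1)", "COO(V1,V2)", "ADV(但是,V2)"}),
    ("虽然", {"ADV(虽然,V1)", "COO(V1,V2)"}),
    ("虽然", {"ADV(虽然,V1)", "ADV(但是,V2)"}),
    ("如果", {"ADV(如果,V1)"}),  # only one sentence, so this group is dropped
]
rules = mine_rules(corpus, min_group_size=3, min_support=2)
for word, itemsets in rules.items():
    for itemset, support in sorted(itemsets.items(), key=lambda kv: -kv[1]):
        print(word, sorted(itemset), support)
```

Grouping before mining reflects the motivation given in the abstract: each relational word gets its own small transaction database, so mining one relational word does not require rescanning the whole corpus. The sketch stops at frequent feature sets. The abstract does not specify how these are turned into the final 84 rules (for example, through a confidence threshold), so that step is omitted.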
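Citation: Tu, X.D. (2021) Automatic Mining of the Dependency Relation Rule of Relational Word in Chinese Compound Sentences Based on FP-Tree Algorithm. Computer Science and Application, 11(5), 1538-1547. https://doi.org/10.12677/CSA.2021.115158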
