An Analysis of the Authorship of A Dream of Red Mansions Based on Equivalence Testing and Feature Clustering
Abstract:
We introduce an equivalence test model. The frequencies of the characters "红" (red) and "玉" (jade) are counted, and the test statistic U and its p-value are computed. We compare the U values against a table of U values and their corresponding probabilities. From this comparison we preliminarily conclude that the first 80 chapters differ from the last 40 chapters and were therefore not written by the same person. We then apply K-means clustering and agglomerative clustering to word-frequency features, which yields several different clustering outcomes. The results show that word-frequency usage varies across A Dream of Red Mansions, indicating that the novel has more than one author.
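The abstract does not say exactly how U, the p-value or the clusters are computed. The Python sketch below is one plausible reading and rests on several assumptions.

We read "U test" in the sense common in Chinese statistics textbooks: a large-sample normal (z) test, not the Mann-Whitney U test. We also assume the null hypothesis is that a character's rate (occurrences divided by total characters) is the same in both parts of the novel. For the clustering step, we assume each chapter is represented as a vector of word frequencies. Only plain Lloyd's k-means is shown; the agglomerative clustering mentioned in the abstract is not sketched. The function names are hypothetical, and the demo numbers are made up rather than taken from the paper.

```python
import math


def u_test_two_proportions(x1, n1, x2, n2):
    """Hypothetical helper: large-sample U (z) test of H0: p1 == p2.

    x1, x2: occurrences of one character (e.g. "红") in part 1 / part 2
    n1, n2: total number of characters in part 1 / part 2
    Returns (U, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    u = (p1 - p2) / se
    p_value = math.erfc(abs(u) / math.sqrt(2))  # equals 2 * (1 - Phi(|U|))
    return u, p_value


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Hypothetical helper: plain Lloyd's k-means on word-frequency vectors.

    Initial centroids are simply the first k points (deterministic, not k-means++).
    Returns one cluster label per point.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Made-up counts for illustration only; NOT the paper's data.
    u, p = u_test_two_proportions(x1=300, n1=100_000, x2=100, n2=50_000)
    print(f"U = {u:.3f}, p = {p:.4f}")

    # Toy per-chapter vectors (e.g. rates of two function words), not real data.
    chapters = [[0.010, 0.020], [0.011, 0.021], [0.030, 0.005], [0.031, 0.004]]
    print(kmeans(chapters, k=2))
```

In this reading, a |U| above about 1.96 (p < 0.05) would reject equal rates at the 5% level. In practice a library implementation such as scikit-learn's clustering estimators would replace the hand-written k-means.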