A Textual Analysis of A Dream of Red Mansions by the State Transfer Network
Abstract: Most textual analysis to date has been based on non-Chinese languages, and Chinese texts have received comparatively little study. In this paper, character-frequency time series are extracted separately from the first 80 chapters and the last 40 chapters of A Dream of Red Mansions. Each series is then divided into segments with a sliding window, and the segments are mapped to a state transition network in order to explore the writing style and characteristics of the text. The first 80 and the last 40 chapters differ significantly in node degree distribution and in the edge strength of the important cycles. Detrended fluctuation analysis (DFA) further shows that the scaling exponents of the motif position series of the two parts are clearly different. These differences in network characteristics between the two parts support the common view that A Dream of Red Mansions was written jointly by Cao Xueqin and Gao E.
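The pipeline described in the abstract (character-frequency series, sliding window, state transition network) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a segment is turned into a state, so the sketch assumes each window is reduced to its ordinal (rank) pattern, and it assumes the frequency series replaces each character by its total count in the text. All function names and the window length are hypothetical.

```python
from collections import Counter


def char_frequency_series(text):
    """Assumption: replace each character by its total count in the text."""
    counts = Counter(text)
    return [counts[ch] for ch in text]


def ordinal_pattern(window):
    """Assumption: a state is the rank pattern of a window.

    Returns the window indices sorted by value, ties broken by position.
    """
    return tuple(sorted(range(len(window)), key=lambda i: (window[i], i)))


def state_transition_network(series, window=3):
    """Map consecutive sliding windows to states and count the transitions.

    Returns the state sequence and a Counter of weighted directed edges
    (source_state, target_state) -> number of occurrences.
    """
    states = [
        ordinal_pattern(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]
    edges = Counter(zip(states, states[1:]))
    return states, edges


def out_degrees(edges):
    """Number of distinct successor states for each source state."""
    degrees = Counter()
    for source, _target in edges:
        degrees[source] += 1
    return degrees


if __name__ == "__main__":
    toy_series = [1, 3, 2, 4, 3, 5]
    toy_states, toy_edges = state_transition_network(toy_series, window=3)
    print(toy_states)
    print(dict(toy_edges))
    print(dict(out_degrees(toy_edges)))
```

Under these assumptions, the two parts of the novel would each be passed through `char_frequency_series` and `state_transition_network`, and the resulting degree distributions and edge weights compared. The cycle detection and DFA steps mentioned in the abstract are not covered here.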
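Citation: Zhou, C.L., Shi, Y. and Gu, C.G. (2022) A Textual Analysis of A Dream of Red Mansions by the State Transfer Network. Advances in Applied Mathematics, 11(3), 1376-1388. https://doi.org/10.12677/AAM.2022.113150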
