A Comparative Study of Lexical Associations between GenAI and Humans: Based on Word Frequency Effects and Associative Asymmetry
Abstract: Using the word association paradigm, this study compares GenAI (Generative Artificial Intelligence) with humans to examine whether GenAI exhibits human-like semantic organization, focusing on association diversity and associative asymmetry. Human associations show higher diversity and stronger directional asymmetry than GenAI-generated associations, indicating a more complex semantic organization. Both datasets show systematic word-frequency-related patterns, but GenAI depends more strongly on frequency. In addition, diversity and asymmetry are more tightly coupled in the human data. Overall, the findings suggest that GenAI can approximate certain statistical regularities of human lexical associations, but its semantic representations remain less structurally complex than those of the human mental lexicon.
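The abstract does not say how diversity and asymmetry were operationalized. As an illustrative sketch only, the Python below assumes two common choices: diversity as the Shannon entropy of a cue's response distribution, and asymmetry as the absolute difference between forward and backward associative strength. The function names and the toy cue-response pairs are hypothetical and are not taken from the study.

```python
import math
from collections import Counter, defaultdict


def build_counts(pairs):
    """Tally how often each response was given to each cue."""
    counts = defaultdict(Counter)
    for cue, response in pairs:
        counts[cue][response] += 1
    return counts


def diversity(counts, cue):
    """Assumed measure: Shannon entropy (in bits) of the cue's responses."""
    responses = counts.get(cue, Counter())
    total = sum(responses.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in responses.values())


def strength(counts, cue, response):
    """Proportion of the cue's responses that equal `response`."""
    responses = counts.get(cue, Counter())
    total = sum(responses.values())
    return responses[response] / total if total else 0.0


def asymmetry(counts, a, b):
    """Assumed measure: |P(b | a) - P(a | b)| for a word pair."""
    return abs(strength(counts, a, b) - strength(counts, b, a))


# Toy data, invented for illustration only.
pairs = [
    ("dog", "cat"), ("dog", "cat"), ("dog", "bone"), ("dog", "bark"),
    ("cat", "dog"), ("cat", "mouse"), ("cat", "mouse"), ("cat", "mouse"),
]
counts = build_counts(pairs)
dog_diversity = diversity(counts, "dog")
dog_cat_asymmetry = asymmetry(counts, "dog", "cat")
```

Under these assumptions, applying the same two functions to human norms and to model-generated norms would give per-cue values that can then be related to word frequency.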
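Cite this article: Guan, S. (关思扬) (2026) A Comparative Study of Lexical Associations between GenAI and Humans: Based on Word Frequency Effects and Associative Asymmetry. Modern Linguistics (现代语言学), 14(6), 68-75. https://doi.org/10.12677/ml.2026.146500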
