The Study of Cooperative Behaviour in Network Evolutionary Games Based on Reinforcement Learning
Abstract: Thanks to its capacity for self-learning and online learning, reinforcement learning has become an increasingly important tool for studying evolutionary games. This paper introduces the SARSA (State-Action-Reward-State-Action) algorithm into network games and proposes an evolutionary game model based on it. Numerical simulations are run on four network topologies using three reinforcement learning decision-making mechanisms. The experiments show that introducing the algorithm markedly raises the level of cooperation among individuals in the network, and that this level then remains stable within a bounded range. The paper also examines how the algorithm's parameter settings, heterogeneity in the payoff matrix, and the global attributes of individuals affect cooperation on the network. The results indicate that cooperation is best promoted when the learning rate is low, the discount rate is high, and individual payoffs are moderate.
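To make the role of the learning rate and the discount rate concrete, the following is a minimal sketch of a SARSA update applied to agents playing a two-action game with their network neighbours. It is not the model from the paper: the ring topology, the payoff values in `PAYOFF`, the choice of state (an agent's own previous action), the epsilon-greedy rule, and all function names are assumptions made purely for illustration.

```python
import random

# Hypothetical prisoner's dilemma payoffs (NOT taken from the paper):
# payoff to an agent for (own_action, neighbour_action); 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.5, (1, 1): 0.1}


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy SARSA step: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])


def epsilon_greedy(q, s, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return 0 if q[s][0] >= q[s][1] else 1


def simulate(n=20, rounds=200, alpha=0.1, gamma=0.9, epsilon=0.05, seed=0):
    """Run SARSA agents on a ring and return the cooperator fraction per round."""
    rng = random.Random(seed)
    # Assumed topology: each agent plays with its left and right neighbour.
    neighbours = [((i - 1) % n, (i + 1) % n) for i in range(n)]
    # Assumed state: the agent's own previous action (2 states x 2 actions).
    q_tables = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n)]
    states = [rng.randrange(2) for _ in range(n)]
    actions = [epsilon_greedy(q_tables[i], states[i], epsilon, rng) for i in range(n)]
    history = []
    for _ in range(rounds):
        # Reward: payoff summed over games with both neighbours.
        rewards = [
            sum(PAYOFF[(actions[i], actions[j])] for j in neighbours[i])
            for i in range(n)
        ]
        next_states = list(actions)
        next_actions = [
            epsilon_greedy(q_tables[i], next_states[i], epsilon, rng) for i in range(n)
        ]
        for i in range(n):
            sarsa_update(q_tables[i], states[i], actions[i], rewards[i],
                         next_states[i], next_actions[i], alpha, gamma)
        history.append(actions.count(0) / n)
        states, actions = next_states, next_actions
    return history
```

Calling `simulate(alpha=0.1, gamma=0.9)` and then `simulate(alpha=0.9, gamma=0.1)` gives two cooperation curves that can be compared by eye. Whether such a toy setup reproduces the trends reported in the abstract depends on the assumed payoffs and state definition, so it should be read only as an illustration of where the two parameters enter the update.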
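Cite this article: Chen Qian (陈倩). The Study of Cooperative Behaviour in Network Evolutionary Games Based on Reinforcement Learning [J]. Operations Research and Fuzziology (运筹与模糊学), 2024, 14(3): 1073-1085. https://doi.org/10.12677/orf.2024.143340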
