社交网络子图采样算法
Subgraph Sampling Algorithm of Social Network
摘要: 在线社交网络(Online Social Networks, OSNs)数据量庞大,如何以低成本采样获取具有代表性的子图成为当前一个研究热点。现有的大部分采样算法仅仅体现在样本列表的无偏特性上,很多情况下,其采样样本构造的诱导子图难以代表原图结构。本文对各种类型的OSN提出了一种新的采样方法,该算法在原有的随机游走算法基础上重新计算了采样跳转概率,修正采样诱导子图的偏差,使其能够更出色地代表原图。同时,本文的采样算法通过计算权重的方式采集邻接节点,省去了自循环过程,从而大幅度提高了采样效率。实验结果表明,本论文提出的采样算法在度分布、聚类系数、传递性、同配性各个方面综合对比,采样获取的子图更加接近原图的属性结构。最后,该算法在大多数情况下,其性能与表现均优于现有采样算法。
Abstract: Online Social Networks (OSNs) maintain a huge amount of data, and how to sample the most representative subgraphs at a lower cost becomes an important research topic. Most of the existing sampling algorithms are only embodied in the unbiased characteristics of the sample list. In many cases, the induced subgraphs constructed from the sampled samples are difficult to represent the original graph structure. This paper proposes a new sampling method for various types of OSNs. This algorithm recalculates the sampling jump probability on the basis of the original random walk algorithm, and corrects the deviation of the sampling-induced subgraph, as a result, it can better represent the original graph. At the same time, the sampling algorithm in this paper collects the adjacent nodes by calculating the weight, eliminating the self-circulation process and greatly improving the sampling efficiency. The experimental results show that the sampling algorithm proposed in this paper is more close to the original image attribute structure in terms of degree distribution, cluster, transitivity, and assortativity. Finally, in most cases, the algorithm perfects better than the existing sampling algorithm.
文章引用:李准, 雷晓颖. 社交网络子图采样算法[J]. 计算机科学与应用, 2022, 12(11): 2633-2645. https://doi.org/10.12677/CSA.2022.1211267

参考文献

[1] 吴信东, 李毅, 李磊. 在线社交网络影响力分析[J]. 计算机学报, 2014, 37(4): 735-752.
[2] 李嘉兴, 王晰巍, 常颖, 王微. 社交网络用户行为国内外研究动态及发展趋势[J]. 现代情报, 2020, 40(4): 167-177.
[3] 李立耀, 孙鲁敬, 杨家海. 社交网络研究综述[J]. 计算机科学, 2015, 42(11): 8-21+42.
[4] Viswanath, B., Mislove, A., Cha, M. and Gummadi, K.P. (2009) On the Evolution of User Interaction in Facebook. Proceedings of the 2nd ACM Workshop on Online Social Networks, Barcelona, 17 August 2009, 37-42. [Google Scholar] [CrossRef
[5] 许进, 杨扬, 蒋飞, 金舒原. 社交网络结构特性分析及建模研究进展[J]. 中国科学院院刊, 2015, 30(2): 216-228. [Google Scholar] [CrossRef
[6] 方滨兴, 贾焰, 韩毅. 社交网络分析核心科学问题、研究现状及未来展望[J]. 中国科学院院刊, 2015, 30(2): 187-199. [Google Scholar] [CrossRef
[7] 魏涛. 社交网络结构特性研究[D]: [硕士学位论文]. 北京: 北京邮电大学, 2015.
[8] Ahn, Y. and Moon, S. (2008) Analysis of Topological Characteristics of Huge Online Social Networking Services. Pro-ceedings of the 16th International Conference on World Wide Web, Banff, 8-12 May 2007, 835-844. [Google Scholar] [CrossRef
[9] Ferrara, E. (2012) A Large-Scale Community Structure Analysis in Facebook. EPJ Data Science, 1, Article No. 9. [Google Scholar] [CrossRef
[10] Chau, P., Pandit, S., Wang, S. and Faloutsos, C. (2007) Parallel Crawling for Online Social Networks. Proceedings of the 16th International Conference on World Wide Web, Banff, 8-12 May 2007, 1283-1284. [Google Scholar] [CrossRef
[11] Gjoka, M., Kurant, M., Butts, C.T. and Markopoulou, A. (2011) Practical Recommendations on Crawling Online Social Networks. IEEE Journal on Selected Areas in Communications, 29, 1872-1892. [Google Scholar] [CrossRef
[12] Wang, T., Yang, C., Zhang, Z., et al. (2011) Understanding Graph Sampling Algorithms for Social Network Analysis. 2011 31st International Conference on Distributed Computing Sys-tems Workshops, Minneapolis, 20-24 June 2011, 123-128. [Google Scholar] [CrossRef
[13] 尤枫, 曹天亮, 卢罡. 在线社交网络的自适应UNI采样方法[J]. 计算机工程, 2017, 43(4): 200-206.
[14] 许南山, 李浩, 卢罡. 在线社交网络的UNI64采样方法[J]. 计算机系统应用, 2014, 23(12): 206-212.
[15] Guilotin-Plantard, N. and Schott, R. (2006) Dynamic Random Walks. Theory and Applications. Elsevier, Amsterdam.
[16] Gjoka, M., Kurant, M., Butts, C.T. and Markopoulou, A. (2010) Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. 2010 Proceedings IEEE INFOCOM, San Diego, 14-19 March 2010, 1-9. [Google Scholar] [CrossRef
[17] Salehi, M., Rabiee, H.R. and Rajabi, A. (2012) Sampling from Complex Networks with High Community Structures. Chaos, 22, 2202-2229. [Google Scholar] [CrossRef] [PubMed]
[18] 蔡君, 余顺争. 基于随机聚类采样算法的复杂网络社团探测[J]. 计算机应用研究, 2013, 30(12): 3560-3563.
[19] 贺云天. 面向大规模社交网络的采样技术研究及其应用[D]: [硕士学位论文]. 合肥: 中国科学技术大学, 2019.
[20] Dong, W., Li, Z. and Xie, G. (2011) Towards Unbiased Sampling of Online Social Networks. IEEE International Conference on Communications, Kyoto, 5-9 June 2011, 1-5.
[21] Jin, L., Chen, Y., Hui, P., et al. (2011) Albatross Sampling: Robust and Effective Hybrid Vertex Sampling for Social Graphs. Proceedings of the 3rd ACM International Workshop on MobiArch, Bethesda, June 2011, 11-16. [Google Scholar] [CrossRef
[22] Chen, B., Liu, L., Jia, H. and Zhang, Y. (2017) Reducing Repetition Rate: Unbiased Delay Sampling in Online Social Networks. Recent Patents on Computer Science, 10, 308-314. [Google Scholar] [CrossRef
[23] Rezvanian, A. and Meybodi, M.R. (2015) Sampling Social Networks Using Shortest Paths. Physica A: Statistical Mechanics & Its Applications, 424, 254-268. [Google Scholar] [CrossRef
[24] 崔颖安, 李雪, 王志晓, 等. 在线社交媒体数据抽样方法的比较研究[J]. 计算机学报, 2014, 37(8): 18.
[25] Costa, L. Da F., Rodrigues, F.A., Travieso, G. and Villas Boas, P.R. (2005) Characterization of Complex Networks: A Survey of Measurements. Advances in Physics, 56, 167-242.