LE-DCBFD: Link Enhanced Dice-Loss Consistent Balanced Fraud Detector Using Graph Neural Networks
Abstract: Graph Neural Networks (GNNs) are widely used in fraud detection because they represent data well and can capture complex relationships in social networks. This paper introduces a graph data augmentation algorithm, a semi-learnable heuristic link prediction algorithm, which uses rich label information to recover network information lost through missing data and deliberate manipulation such as fraud camouflage. Building on this algorithm, we propose a fraud detection model, the Link Enhanced Dice-loss Consistent Balanced Fraud Detector (LE-DCBFD). We evaluate LE-DCBFD on two public real-world fraud detection datasets, Amazon and Yelp. It outperforms multiple baseline models, improving fraud detection performance by more than 10% on the larger Yelp dataset. In ablation experiments it also outperforms DCBFD, the same model without the proposed Link Enhancer (the link prediction algorithm), which confirms that the Link Enhancer contributes to the performance gain. LE-DCBFD retains its advantage over DCBFD even when trained on a smaller training set.
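The abstract names a Dice loss but does not give its formula. As a rough illustration only, the sketch below implements one common form of the soft Dice loss for binary labels. The function name `soft_dice_loss`, the `smooth` parameter, and the choice of this particular variant are assumptions made for illustration; the loss actually used in LE-DCBFD may differ.

```python
def soft_dice_loss(probs, labels, smooth=1e-6):
    """Soft Dice loss for binary classification (illustrative variant).

    probs  -- predicted fraud probabilities in [0, 1], one per node
    labels -- ground-truth labels, 1 for fraud and 0 for benign
    smooth -- small constant (an assumption here) that avoids division
              by zero when both inputs are all zeros
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")

    # Overlap between predicted and true positives.
    intersection = sum(p * y for p, y in zip(probs, labels))
    # Total predicted mass plus total true positives.
    total = sum(probs) + sum(labels)

    dice = (2.0 * intersection + smooth) / (total + smooth)
    return 1.0 - dice


if __name__ == "__main__":
    # Hypothetical toy data: one fraud node among ten.
    labels = [1] + [0] * 9
    all_benign = [0.0] * 10   # predicts "benign" everywhere
    perfect = [1.0] + [0.0] * 9

    print("all-benign loss:", soft_dice_loss(all_benign, labels))
    print("perfect loss:   ", soft_dice_loss(perfect, labels))
```

In this toy case a predictor that labels every node benign is 90% accurate, yet its Dice loss is close to 1 because it has no overlap with the fraud class. That sensitivity to the minority class is the usual motivation for Dice-style losses on imbalanced data such as fraud labels.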
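Citation: Huang Huilin, Fan Yongxi, Zheng Diyu. LE-DCBFD: Link Enhanced Dice-Loss Consistent Balanced Fraud Detector Using Graph Neural Networks [J]. Hans Journal of Data Mining, 2025, 15(4): 295-309. https://doi.org/10.12677/hjdm.2025.154026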
