Research on a Multimodal Commodity Recommendation System Incorporating µ2Net+ (ViT-L/16)
DOI: 10.12677/sea.2025.143064
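Author: Li Hongjun, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai
Keywords: BERT, Knowledge Graph, Graph Convolutional Network (GCN), Recommendation System, µ2Net+ (ViT-L/16)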
Abstract: As recommender systems mature across domains, multimodal recommendation is emerging as the next competitive frontier for personalization. Earlier neural recommenders typically rely on a single model, such as BERT4Rec or DLRM, which brings two limitations: (a) a single architecture cannot fully resolve the intent behind a user's text input, and (b) the input is restricted to a single modality, either text or images. To address these limitations and move from single-modal to multimodal recommendation, this paper proposes KGBM4Rec, a hybrid neural network model that combines a knowledge graph, a graph neural network, BERT sequential recommendation, and µ2Net+ (ViT-L/16). The model builds a domain knowledge graph to improve extensibility and uses a graph neural network to extract relational features between users and products from that graph; a BERT sequence model encodes text features; and µ2Net+ (ViT-L/16) extracts hierarchical features from images using a U-shaped architecture built from a series of convolutional and pooling layers. Multimodal personalized recommendation can substantially improve the user experience. Experiments on the Taobao Live multimodal video product retrieval dataset (Taobao) and the Products-10K dataset, collected from the JD.com online shopping site, show that the proposed model outperforms the baseline models by 10% on average across the datasets.
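The abstract names three feature sources (graph, text, image) but does not say how they are combined. The sketch below is only an illustration of one common pattern, not the KGBM4Rec implementation: a single weight-free graph-convolution step over a toy user-item graph, followed by late fusion through concatenation of L2-normalized vectors. Every name (`gcn_layer`, `fuse`, `rank`), the toy graph, and the placeholder text and image vectors are assumptions introduced for illustration; in a real system the text and image vectors would come from trained encoders.

```python
import math

def gcn_layer(adj, feats):
    """One graph-convolution step without learned weights or nonlinearity:
    H' = D^-1/2 (A + I) D^-1/2 H, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(feats[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if a_hat[i][j]:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k in range(dim):
                    row[k] += w * feats[j][k]
        out.append(row)
    return out

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def fuse(graph_vec, text_vec, image_vec):
    """Late fusion (an assumption): concatenate L2-normalized modality vectors."""
    return l2_normalize(graph_vec) + l2_normalize(text_vec) + l2_normalize(image_vec)

def rank(profile, candidates):
    """Sort candidate names by dot-product score against the profile, best first."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sorted(candidates, key=lambda name: dot(profile, candidates[name]), reverse=True)

# Hypothetical toy graph: node 0 = user, 1 = item A, 2 = item B, 3 = item C.
# Edges: user-A (an interaction) and A-B (a knowledge-graph relation); C is isolated.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0]]
node_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
graph_emb = gcn_layer(adj, node_feats)

# Placeholder vectors standing in for text-encoder and image-encoder outputs.
text_emb = {"A": [0.9, 0.1], "B": [0.2, 0.9], "C": [0.2, 0.9]}
image_emb = {"A": [0.5, 0.5], "B": [0.3, 0.7], "C": [0.3, 0.7]}
node_of = {"A": 1, "B": 2, "C": 3}

items = {name: fuse(graph_emb[idx], text_emb[name], image_emb[name])
         for name, idx in node_of.items()}
profile = items["A"]  # the user has interacted with item A only
ranking = rank(profile, {"B": items["B"], "C": items["C"]})
print(ranking)
```

Items B and C are given identical text and image vectors here, so any difference in their ranking comes only from the graph step, which is the part that carries the knowledge-graph relation between A and B.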
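Cite this article: Li, H. (2025) Research on Multimodal Commodity Recommendation System Incorporating µ2Net+ (ViT-L/16). Software Engineering and Applications, 14(3), 723-735. https://doi.org/10.12677/sea.2025.143064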
