Research on Multimodal Sequential Recommendation Method Fusing Retrieval Augmentation and Dynamic-Static Interest Modeling

DOI: 10.12677/sea.2026.152016
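Author: Zhang Ruoyan (张若妍), School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai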
Keywords: Multimodal Recommendation; Cross-Modal Alignment; Sequential Recommendation; Contrastive Learning; Graph Neural Network; Explainable Recommendation

Abstract: The rapid growth of multimedia content poses new challenges for recommender systems. Traditional sequential recommendation relies heavily on textual information and makes little use of visual semantics; models that depend on temporal modeling alone tend to overlook users' stable preferences and the structural relationships among items; and existing explainable recommendation methods lack factual grounding, which lowers their credibility. To address these issues, this paper extends RACL-KAL into a multimodal recommendation model, MM-RACL-KAL. The model fuses textual and image information to enrich item semantic representations and expands user behavior sequences through retrieval augmentation. It uses a Transformer and a GNN to jointly model dynamic and static preferences, and applies multimodal contrastive learning to improve the consistency of cross-modal representations. A knowledge-anchored large model then generates explanations grounded in facts. Experiments on the Amazon-Fashion and MovieLens-Poster datasets show that the model outperforms existing methods in both recommendation performance and explanation quality, supporting its effectiveness and scalability.
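
The abstract names multimodal contrastive learning as the mechanism for cross-modal consistency but does not state the loss. As a rough illustration only, the sketch below assumes a symmetric InfoNCE objective over paired text and image embeddings of the same items, a common choice for this kind of alignment; the function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, candidates, temperature=0.07):
    """Mean cross-entropy of matching anchors[i] to candidates[i].

    All other candidates in the batch act as negatives for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [
            sum(x * y for x, y in zip(anchor, cand)) / temperature
            for cand in candidates
        ]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]
    return total / len(anchors)


def symmetric_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Average of text-to-image and image-to-text InfoNCE (assumed form)."""
    t = [l2_normalize(v) for v in text_emb]
    v = [l2_normalize(u) for u in image_emb]
    return 0.5 * (info_nce(t, v, temperature) + info_nce(v, t, temperature))


# Toy batch: row i of each list is assumed to describe the same item.
text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_shuffled = image_aligned[1:] + image_aligned[:1]  # pairs broken

loss_aligned = symmetric_contrastive_loss(text, image_aligned)
loss_shuffled = symmetric_contrastive_loss(text, image_shuffled)
```

With these toy vectors, `loss_aligned` comes out smaller than `loss_shuffled`, which is the behavior such an objective is meant to reward: embeddings of the same item from different modalities are pulled together, while mismatched pairs are pushed apart.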

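Cite this article: Zhang, R. (2026) Research on Multimodal Sequential Recommendation Method Fusing Retrieval Augmentation and Dynamic-Static Interest Modeling. Software Engineering and Applications, 15(2), 154-167. https://doi.org/10.12677/sea.2026.152016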

