A Review of Multimodal E-Commerce Recommendation Technologies for Domestic Trend Cultural Preferences
Abstract: As domestic trend (guochao) brands and cultural consumption continue to grow, product recommendation on e-commerce platforms is shifting from a focus on functionality and price toward cultural symbols, aesthetic style, and emotional identification. Domestic trend products typically combine multi-source information, including visual style, textual semantics, and cultural attributes, so single-modality methods and traditional collaborative filtering struggle to capture the resulting preference patterns. In recent years, deep learning-based multimodal recommendation techniques have been widely adopted in e-commerce and offer new ways to integrate product images, textual descriptions, user reviews, and structured attributes. However, systematic reviews of their use in modeling domestic trend cultural preferences remain limited. This paper reviews multimodal e-commerce recommendation technologies oriented toward domestic trend cultural preferences. We first analyze the multimodal characteristics of user preferences and product features in domestic trend consumption scenarios and survey the relevant data forms and commonly used public datasets. We then categorize and summarize representative recent studies along four dimensions: multimodal representation learning, cross-modal fusion and alignment, graph structure modeling and knowledge enhancement, and model optimization and training strategies, and we discuss how well each class of methods adapts to domestic trend scenarios. On this basis, we examine the generative recommendation paradigm driven by large language models (LLMs) and analyze its potential to break new ground in deep cultural knowledge reasoning and trustworthy explanation generation. Finally, we summarize the main challenges facing current research in cultural semantic modeling, modality alignment, dynamic preference modeling, and recommendation explanation, and we outline future directions, including e-commerce recommendation driven by multimodal large models, human-machine collaborative interaction mechanisms, and controllable modeling of cultural preferences. This paper is intended as a reference for research and practice in recommender systems for domestic trend products.
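Cite this article: Tian Wenfang, Li Jiayan, Zheng Dan, Chen Jingyi, Jiang Zhixian, He Qing. A Review of Multimodal E-Commerce Recommendation Technologies for Domestic Trend Cultural Preferences [J]. Journal of Image and Signal Processing, 2026, 15(2): 196-211. https://doi.org/10.12677/jisp.2026.152017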
