Research on Multimodal Clothing Sales Forecasting Methods for E-Commerce Marketing Decision-Making
DOI: 10.12677/ecl.2026.155580. Supported by the National Natural Science Foundation of China.
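Authors: Dai Li (代丽), Yang Haonan (杨浩男), Xu Jingjie (徐靖捷), Wang Shixiong (王世雄)*: School of Economics and Management, Zhejiang Sci-Tech University, Hangzhou, Zhejiang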
Keywords: Multi-Modal Fusion; Sales Forecasting; Apparel E-Commerce; Contrastive Learning; Time-Series Modeling
Abstract: The clothing industry sees frequent product launches and volatile demand, so forecasting methods that rely only on historical sales series struggle to capture how sales change under several interacting factors. To improve accuracy, this paper proposes a clothing sales forecasting method based on multimodal information fusion that jointly models product images, text descriptions, user reviews, and historical sales to characterize the key factors affecting sales. The model has four modules: multimodal encoding, gated fusion, cross-modal alignment, and a GRU decoder. Each modality is encoded separately and normalized into a shared representation space; a gating mechanism and cross-modal attention fuse the features dynamically; contrastive learning strengthens the consistency of image, text, and temporal features; and the GRU decoder produces multi-step sales forecasts. In the reported experiments, the method significantly outperforms the baseline models on MAE, RMSE, MAPE, and R². Multi-step forecasting results show that the model has higher stability in short-term replenishment and medium-term inventory planning tasks, and ablation experiments confirm the importance of the gated fusion and contrastive learning modules for performance. The findings indicate that multimodal fusion can compensate for the limited expressiveness of structured data and provide more reliable forecasting support for replenishment strategies and inventory management in the clothing industry.
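The four evaluation metrics named in the abstract have standard definitions, and a minimal sketch of how they might be computed is given here for orientation. It is not the authors' evaluation code. The function name `forecast_metrics`, the choice to skip zero-valued actuals when computing MAPE, and the sample sales figures are all assumptions made for illustration.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE (in percent), and R^2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Mean absolute error and root mean squared error.
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # MAPE is undefined when an actual value is zero.
    # Assumption: such points are skipped rather than smoothed.
    nonzero = [(a, e) for a, e in zip(actual, errors) if a != 0]
    if nonzero:
        mape = 100.0 * sum(abs(e / a) for a, e in nonzero) / len(nonzero)
    else:
        mape = float("nan")

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Hypothetical weekly unit sales for one product (illustrative numbers only).
actual = [100.0, 120.0, 80.0, 100.0]
predicted = [110.0, 110.0, 90.0, 90.0]
metrics = forecast_metrics(actual, predicted)
print(metrics)
```

Lower MAE, RMSE, and MAPE indicate smaller errors, while R² closer to 1 indicates that the forecasts explain more of the variance in actual sales. MAPE is scale-free, which makes it convenient for comparing products with very different sales volumes, but it becomes unstable for slow-moving items whose actual sales are near zero.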
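Cite this article: Dai, L., Yang, H.N., Xu, J.J. and Wang, S.X. (2026) Research on Multimodal Clothing Sales Forecasting Methods for E-Commerce Marketing Decision-Making. E-Commerce Letters (电子商务评论), 15(5), 805-813. https://doi.org/10.12677/ecl.2026.155580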
