Bi-Level Decision-Making Model for E-Commerce Instant Delivery Based on Deep Q-Network (DQN)
Abstract: With the rapid growth of e-commerce, instant delivery services such as urban food delivery have become critical to online order fulfillment and urban logistics efficiency. As orders arrive at high frequency and timeliness requirements tighten, delivery scheduling must handle multi-agent coordination, coupled constraints, and multiple objectives. Traditional single-level optimization models and heuristic algorithms struggle to balance real-time responsiveness against overall system revenue in dynamic environments. To address this, the paper develops a bi-level decision optimization model for instant delivery in urban e-commerce logistics. At the upper level, the platform coordinates order allocation and delivery route planning to maximize overall system revenue. At the lower level, the model captures couriers' revenue-driven order acceptance response and a consumer fulfillment satisfaction function, forming a feedback optimization structure under multi-agent coordination constraints. On the algorithmic side, a Deep Q-Network (DQN) is used to learn scheduling policies in a high-dimensional dynamic state space. Simulation results show that the proposed method outperforms several traditional heuristic algorithms on platform revenue, courier income, consumer satisfaction, and order fulfillment rate. Sensitivity analysis further supports the robustness and adaptability of the model under varying cost and timeliness parameters.
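The abstract names DQN as the learning algorithm but does not give its update rule. As a minimal sketch of the standard DQN building blocks (epsilon-greedy action selection and the temporal-difference target y = r + γ · max Q(s', a')), the following may help. All names and numbers here (`epsilon_greedy`, `dqn_target`, the three candidate actions, the reward of 4.0) are illustrative assumptions and are not taken from the paper, whose state, action, and reward definitions are not given in this abstract.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_target(reward, next_q_values, gamma, done):
    """Standard DQN target: r + gamma * max over next-state Q-values.

    next_q_values are assumed to come from a target network;
    no bootstrapping is applied when the episode has ended.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical example: three candidate assignments for one order.
q_next = [1.0, 2.5, 0.5]
y = dqn_target(reward=4.0, next_q_values=q_next, gamma=0.9, done=False)
action = epsilon_greedy(q_next, epsilon=0.0)  # epsilon = 0 means purely greedy
```

In a full DQN, a neural network would be trained to reduce the gap between its prediction Q(s, a) and the target `y`; how the paper couples this with the lower-level courier and consumer responses is not specified in the abstract.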
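Cite this article: Yu Mengnan, Yuan Pengcheng. Bi-Level Decision-Making Model for E-Commerce Instant Delivery Based on Deep Q-Network (DQN) [J]. E-Commerce Letters, 2026, 15(5): 245-255. https://doi.org/10.12677/ecl.2026.155512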
