基于安全车辆路径问题的引导式非自回归知识蒸馏
Guided Non-Autoregressive Knowledge Distillation for Safety-Based Vehicle Routing Problem
摘要: 神经组合优化(Neural Combinatorial Optimization, NCO)通过学习问题实例与其对应解之间的映射关系,在标准路径规划问题中已展现出较强的竞争性能。然而,当路径规划问题在目标函数、偏好或约束条件等方面发生变化时,已有训练模型通常难以直接迁移应用。此外,在此类新问题设定下,获取高质量最优解作为训练数据往往需要付出较高的计算代价,从而限制了从头训练深度模型的实际可行性。文章以基于安全优先策略的容量约束车辆路径规划问题(Capacitated Vehicle Routing Problem with Safety-Based Routing Preference, CVRP-SRP)为研究对象,面向有限数据场景,提出一种由知识蒸馏驱动的神经–启发式优化框架。在已有研究中,神经辅助自适应邻域搜索(Neural-Assisted Adaptive Neighborhood Search, NANS)算法通过结合轻编码–重解码模型(Light Encoder-Heavy Decoder, LEHD)与自适应大邻域搜索(Adaptive Large Neighborhood Search, ALNS),能够生成高质量路径解,并在解的质量与稳定性方面表现出较优性能。然而,该方法依赖迭代式搜索过程,计算开销较大,且在新问题场景下的泛化能力仍存在一定局限。为在有限数据条件下实现高效知识迁移,文章在上述方法基础上引入引导式非自回归知识蒸馏(Guided Non-Autoregressive Knowledge Distillation, GNARKD)策略,构建教师–学生学习框架。具体而言,以在标准CVRP数据上训练得到的LEHD模型作为教师模型,用于学习通用的路径构造规律与决策模式;同时构建轻量化学生LEHD模型,通过知识蒸馏机制对其进行优化,使其在较少目标数据的条件下即可有效适应CVRP-SRP问题。通过该过程,实现了从标准路径规划问题到具有安全约束的新问题设定的高效迁移,并在降低推理复杂度的同时提升模型的应用效率。不同于仅依赖最终解进行学习的传统蒸馏方法,GNARKD同时迁移动作序列与决策概率分布,从而保留了自回归解码过程中所蕴含的顺序相关决策知识。在社区规模路径实例上的实验结果表明,与未采用知识蒸馏及直接迁移策略的模型相比,所提出的框架在解的质量、鲁棒性以及数据利用效率方面均取得了显著提升,为面向安全感知的无人配送车辆路径规划提供了一种有效且具有实际应用价值的解决方案。
Abstract: Neural combinatorial optimization (NCO) has emerged as an effective approach for routing problems by learning from instance-solution pairs and achieving competitive performance on standard formulations. However, when routing problems are reformulated with new objectives, preferences, or constraints, existing trained models are often no longer directly applicable. Moreover, under such new problem settings, obtaining optimal solutions as training data usually incurs high computational cost, which limits the practicality of retraining deep models from scratch. This paper studies the Capacitated Vehicle Routing Problem with Safety-Based Routing Preference (CVRP-SRP) as a representative case and proposes a knowledge-distillation-driven neural-heuristic framework for limited-data scenarios. In existing research, the Neural-Assisted Adaptive Neighborhood Search (NANS) algorithm combines the Light Encoder-Heavy Decoder (LEHD) model with Adaptive Large Neighborhood Search (ALNS) to generate high-quality routing solutions, demonstrating superior performance in terms of solution quality and stability. However, this method relies on iterative search processes with substantial computational overhead, and its generalization capability under new problem scenarios remains somewhat limited. To achieve efficient knowledge transfer under limited-data conditions, this paper introduces a Guided Non-Autoregressive Knowledge Distillation (GNARKD) strategy based on the aforementioned method, constructing a teacher-student learning framework. Specifically, an LEHD model trained on standard CVRP data serves as the teacher model to capture general routing construction patterns and decision-making rules; meanwhile, a lightweight student LEHD model is constructed and optimized through the knowledge distillation mechanism, enabling it to effectively adapt to the CVRP-SRP problem with limited target data. Through this process, efficient transfer from standard routing problems to new problem settings with safety constraints is achieved, while reducing inference complexity and improving model application efficiency. Unlike traditional distillation methods that rely solely on final solutions for learning, GNARKD simultaneously transfers action sequences and decision probability distributions, thereby preserving the sequential decision knowledge embedded in the autoregressive decoding process. Experimental results on community-scale routing instances demonstrate that, compared with models without knowledge distillation and direct transfer strategies, the proposed framework achieves significant improvements in solution quality, robustness, and data utilization efficiency, providing an effective and practically valuable solution for safety-aware unmanned delivery vehicle routing.
文章引用:许伟乐, 潘安琪. 基于安全车辆路径问题的引导式非自回归知识蒸馏[J]. 计算机科学与应用, 2026, 16(5): 287-299. https://doi.org/10.12677/csa.2026.165184

参考文献

[1] Gendreau, M., Hertz, A. and Laporte, G. (1994) A Tabu Search Heuristic for the Vehicle Routing Problem. Management Science, 40, 1276-1290. [Google Scholar] [CrossRef
[2] Clarke, G. and Wright, J.W. (1964) Scheduling of Vehicles from a Central Depot to a Number of Delivery Points. Operations Research, 12, 568-581. [Google Scholar] [CrossRef
[3] Helsgaun, K. (2017) An Extension of the Lin-Kernighan-Helsgaun TSP Solver for Constrained Traveling Salesman and Vehicle Routing Problems. Roskilde University.
[4] Lee, C.Y., Lee, Z.J., Lin, S.W., et al. (2010) An Enhanced Ant Colony Optimization (EACO) Applied to Capacitated Vehicle Routing Problem. Applied Intelligence, 32, 88-95. [Google Scholar] [CrossRef
[5] Tavakkoli-Moghaddam, R., Safaei, N. and Gholipour, Y. (2006) A Hybrid Simulated Annealing for Capacitated Vehicle Routing Problems with the Independent Route Length. Applied Mathematics and Computation, 176, 445-454. [Google Scholar] [CrossRef
[6] Jasim, A.N. and Chaari Fourati, L. (2024) Guided Genetic Algorithm for Solving Capacitated Vehicle Routing Problem with Unmanned-Aerial-Vehicles. IEEE Access, 12, 106333-106358. [Google Scholar] [CrossRef
[7] Hu, K.C., Tsai, C.W. and Chiang, M.C. (2020) A Multiple-Search Multi-Start Framework for Metaheuristics for Clustering Problems. IEEE Access, 8, 96173-96183. [Google Scholar] [CrossRef
[8] Zhao, J., Mao, M., Zhao, X. and Zou, J. (2021) A Hybrid of Deep Reinforcement Learning and Local Search for the Vehicle Routing Problems. IEEE Transactions on Intelligent Transportation Systems, 22, 7208-7218. [Google Scholar] [CrossRef
[9] Herdianto, B., Billot, R., Lucas, F., et al. (2025) Edge-Selector Model Applied for Local Search Neighborhood for Solving Vehicle Routing Problems. arXiv:2508.14071.
[10] Cheng, C.A., Kolobov, A. and Swaminathan, A. (2021) Heuristic-Guided Reinforcement Learning. Advances in Neural Information Processing Systems, 34, 13550-13563.
[11] Lin, X., Liu, F., Luo, F., Wang, Z. and Zhang, Q. (2023) Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization. Advances in Neural Information Processing Systems 36, New Orleans, 10-16 December 2023, 8845-8864. [Google Scholar] [CrossRef
[12] Vidal, T. (2022) Hybrid Genetic Search for the CVRP: Open-Source Implementation and SWAP Neighborhood. Computers & Operations Research, 140, Article 105643. [Google Scholar] [CrossRef
[13] Das, S., Camporese, G., Cheng, S. and Ballan, L. (2024) Distilling Knowledge for Short-to-Long Term Trajectory Prediction. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, 14-18 October 2024, 13001-13008. [Google Scholar] [CrossRef
[14] Bi, J., Cao, Z., Chee, Y.M., Chen, J., Ma, Y., Sun, Y., et al. (2022) Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation. Advances in Neural Information Processing Systems 35, New Orleans, 28 November-9 December 2022, 31226-31238. [Google Scholar] [CrossRef
[15] Zheng, Y., Luo, F., Wang, Z., Wu, Y. and Zhou, Y. (2025) MTL-KD: Multi-Task Learning via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver. arXiv:2506.02935.
[16] Jin, X., Peng, B., Wu, Y., Liu, Y., Liu, J., Liang, D., et al. (2019) Knowledge Distillation via Route Constrained Optimization. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 1345-1354. [Google Scholar] [CrossRef
[17] Xiao, Y., Wang, D., Li, B., Wang, M., Wu, X., Zhou, C., et al. (2024) Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed. Proceedings of the AAAI Conference on Artificial Intelligence, 38, 20274-20283. [Google Scholar] [CrossRef
[18] Andreoli, J., Drakulic, D., Mai, F., Michel, S. and Sors, A. (2023) BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization. Advances in Neural Information Processing Systems 36, New Orleans, 10-16 December 2023, 77416-77429. [Google Scholar] [CrossRef
[19] Xin, L., Song, W., Cao, Z. and Zhang, J. (2021) Multi-Decoder Attention Model with Embedding Glimpse for Solving Vehicle Routing Problems. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 12042-12049. [Google Scholar] [CrossRef
[20] Kwon, Y.D., Choo, J., Kim, B., et al. (2020) Pomo: Policy Optimization with Multiple Optima for Reinforcement Learning. Advances in Neural Information Processing Systems, 33, 21188-21198.