LLM-NAS: An Evolutionary Framework for Neural Architecture Search Based on Large Language Models
Abstract: Differentiable architecture search (DARTS) has become a popular method for neural architecture search (NAS), but it is prone to premature convergence to local optima. Meanwhile, large language models (LLMs) have become powerful tools capable of handling a wide range of tasks. This paper proposes LLM-NAS, an LLM-based evolutionary search framework for NAS. It uses the LLM as a black box to generate new architectures, which allows the search to explore multiple optimization directions. Extensive experiments on different datasets and search spaces show that the proposed method achieves performance comparable or even superior to state-of-the-art techniques.
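Cite this article: Lu, Y. and Chen, L. (2026) LLM-NAS: An Evolutionary Framework for Neural Architecture Search Based on Large Language Models. Computer Science and Application, 16(2), 405-413. https://doi.org/10.12677/csa.2026.162069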
