AIRR  >> Vol. 6 No. 3 (August 2017)

    Adaptive Dynamic Programming Based on Hierarchical Learning

  • PP. 91-96   DOI: 10.12677/AIRR.2017.63010

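Authors:

Lin Qiao (林巧), Li Minshuo (李旻朔): College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang, China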

Keywords:
Cognitive Development, Adaptive Dynamic Programming, Neural Network, Hierarchical Learning

Abstract:

This paper proposes an adaptive dynamic programming (ADP) method based on hierarchical learning, with the aim of improving learning and optimization. The idea is motivated by the levels of consciousness (LOC) model of infant cognitive development, which addresses the interdependence between consciousness and action. Following the hierarchy of perception and the hierarchical definition of working goals in the LOC model, a multilevel goal network structure and a corresponding hierarchical learning method are designed for ADP. Introducing a multilevel goal representation into the adaptive critic guides the system's decision-making toward the long-term goal over time, mimicking certain levels of brain-like intelligence. The system architecture and the learning and adaptation procedures are presented, and the learning and control capability of the approach is verified in adaptive traffic signal control experiments on the GLD (Green Light Domain) simulator.
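
The abstract does not state the update equations, so the following is only a minimal sketch of how a multilevel goal representation might feed an adaptive critic. It assumes a goal-representation style of ADP in which each goal level emits an internal reinforcement signal: the lowest level tracks the external reinforcement `r`, each higher level tracks the level below it, and the critic tracks the top-level signal. The function name `hierarchical_td_errors`, the argument names, and the discount factor `alpha` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def hierarchical_td_errors(
    r: float,
    s_prev: Sequence[float],
    s_curr: Sequence[float],
    j_prev: float,
    j_curr: float,
    alpha: float = 0.95,
) -> Tuple[List[float], float]:
    """Temporal-difference errors for a stack of goal levels and a critic.

    ASSUMPTION (not from the paper): every level satisfies a Bellman-style
    consistency condition  x(t-1) = target(t) + alpha * x(t),  where the
    target is the external reinforcement for the lowest goal level, the
    signal of the level below for higher goal levels, and the top-level
    goal signal for the critic.
    """
    if len(s_prev) != len(s_curr):
        raise ValueError("s_prev and s_curr must have one entry per goal level")

    goal_errors: List[float] = []
    target = r  # the lowest goal level is driven by the external reinforcement
    for sp, sc in zip(s_prev, s_curr):
        # Residual of the consistency condition for this goal level.
        goal_errors.append(alpha * sc - (sp - target))
        # The next level up is driven by this level's current signal.
        target = sc

    # The critic is driven by the top-level goal signal
    # (or by r directly when there are no goal levels).
    critic_error = alpha * j_curr - (j_prev - target)
    return goal_errors, critic_error


if __name__ == "__main__":
    errors, critic = hierarchical_td_errors(
        r=1.0, s_prev=[2.0, 3.0], s_curr=[1.0, 2.0],
        j_prev=4.0, j_curr=2.0, alpha=0.5,
    )
    print(errors, critic)
```

In a full learner, each error would be squared and minimized by gradient descent on the weights of the corresponding network; that training loop is omitted here because the abstract gives no details about it.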

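Cite this article:
Lin, Q. and Li, M.S. (2017) Adaptive Dynamic Programming Based on Hierarchical Learning. Artificial Intelligence and Robotics Research, 6, 91-96. https://doi.org/10.12677/AIRR.2017.63010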
