Anti-Interference Control of Manipulator Based on Twin Delayed Deep Deterministic Policy Gradient
DOI: 10.12677/mos.2024.133334
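Authors: Ma Miaofan, Huang Yong: School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai; Huang Yuanbo: Ningbo Industrial Internet Research Institute Co., Ltd., Ningbo, Zhejiang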
Keywords: Anti-Interference; Admittance Control; Deep Reinforcement Learning; TD3; Kalman Filter; Contact Torque
Abstract: To counter the disturbances that arise when a manipulator contacts an unknown environment during motion, as well as those caused by modeling errors, a control strategy is proposed that combines admittance control with a PID controller tuned by the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. Admittance control converts the contact torque between the manipulator and the environment into joint-angle corrections that modify the desired trajectory and counteract the environmental disturbance. Because measuring contact torque directly is costly, a generalized momentum observer based on a Kalman filter is proposed to estimate it. For the inner loop, a TD3-based PID position controller is proposed. TD3 is an improved version of the Deep Deterministic Policy Gradient (DDPG) algorithm; through continuous interaction and learning, the agent tunes the PID parameters automatically, avoiding the poor performance of classical PID control on nonlinear control problems. A manipulator simulation platform is built, and the simulation results show that the proposed control strategy gives the manipulator good anti-interference performance.
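The outer loop described in the abstract turns contact torque into a joint-angle correction of the desired trajectory. A minimal single-joint sketch of that idea follows, assuming a second-order admittance model M_d * ddq + B_d * dq + K_d * q = tau_ext (q being the correction) integrated with semi-implicit Euler. The model order, the sign convention, the names `JointAdmittance` and `corrected_reference`, and all parameter values are illustrative assumptions, not the authors' settings.

```python
from dataclasses import dataclass


@dataclass
class JointAdmittance:
    """Discrete single-joint admittance filter (hypothetical parameters).

    Integrates M_d * ddq + B_d * dq + K_d * q = tau_ext, where q is the
    correction added to the desired joint angle.
    """
    m_d: float = 1.0          # virtual inertia
    b_d: float = 20.0         # virtual damping (critical for m_d=1, k_d=100)
    k_d: float = 100.0        # virtual stiffness
    dt: float = 0.001         # control period (s)
    offset: float = 0.0       # correction delta_q (rad)
    offset_rate: float = 0.0  # d(delta_q)/dt (rad/s)

    def step(self, tau_ext: float) -> float:
        # Solve the admittance model for the correction's acceleration.
        acc = (tau_ext - self.b_d * self.offset_rate - self.k_d * self.offset) / self.m_d
        # Semi-implicit Euler: update the velocity first, then the position.
        self.offset_rate += acc * self.dt
        self.offset += self.offset_rate * self.dt
        return self.offset


def corrected_reference(q_desired: float, filt: JointAdmittance, tau_ext: float) -> float:
    """Shift the desired joint angle by the admittance correction."""
    return q_desired + filt.step(tau_ext)
```

Under a constant contact torque of 2 N·m with K_d = 100, the correction settles at tau_ext / K_d = 0.02 rad, so the reference yields to the contact like a virtual spring; B_d = 20 makes this hypothetical model critically damped.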
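The abstract proposes estimating the contact torque with a Kalman-filter-based generalized momentum observer instead of a torque sensor. The sketch below shows one way such an observer can be built for a single joint, assuming constant inertia and no gravity or Coriolis terms, so the generalized momentum p = I * qdot obeys p[k+1] = p[k] + dt * (tau_motor + tau_ext). The external torque is appended to the state as a random walk and estimated by a standard linear Kalman filter. The class `MomentumKalmanObserver`, the helper `simulate_constant_contact`, and the noise settings are hypothetical; a full manipulator version would use the inertia matrix, Coriolis matrix, and gravity vector from the robot model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MomentumKalmanObserver:
    """Kalman filter on the generalized momentum of ONE joint (sketch).

    State x = [p, tau_ext]; tau_ext is modelled as a random walk.
    Assumed model: p[k+1] = p[k] + dt * (tau_motor[k] + tau_ext[k]).
    """
    dt: float = 0.001
    q_p: float = 1e-10   # process noise on p (hypothetical tuning)
    q_d: float = 1e-2    # random-walk noise on tau_ext (hypothetical tuning)
    r: float = 1e-6      # variance of the momentum measurement (hypothetical)
    x: list = field(default_factory=lambda: [0.0, 0.0])
    P: list = field(default_factory=lambda: [[1.0, 0.0], [0.0, 100.0]])

    def predict(self, tau_motor: float) -> None:
        p, d = self.x
        dt = self.dt
        self.x = [p + dt * (tau_motor + d), d]
        (a, b), (_, c) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.P = [[a + 2 * dt * b + dt * dt * c + self.q_p, b + dt * c],
                  [b + dt * c, c + self.q_d]]

    def update(self, p_measured: float) -> float:
        (a, b), (_, c) = self.P
        s = a + self.r                 # innovation variance, H = [1, 0]
        k0, k1 = a / s, b / s          # Kalman gain
        innov = p_measured - self.x[0]
        self.x = [self.x[0] + k0 * innov, self.x[1] + k1 * innov]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [b - k1 * a, c - k1 * b]]
        return self.x[1]               # estimated external torque


def simulate_constant_contact(tau_ext: float = 1.5, inertia: float = 0.5,
                              steps: int = 2000) -> float:
    """Toy check: sinusoidal motor torque plus a constant contact torque."""
    obs = MomentumKalmanObserver()
    qdot, estimate = 0.0, 0.0
    for k in range(steps):
        tau_motor = 0.5 * math.sin(2.0 * math.pi * k * obs.dt)
        qdot += obs.dt * (tau_motor + tau_ext) / inertia  # "true" joint dynamics
        obs.predict(tau_motor)
        estimate = obs.update(inertia * qdot)              # measured p = I * qdot
    return estimate
```

Because the motor torque enters the prediction as a known input, it cancels out of the estimation error, and the filter recovers the unknown contact torque from the momentum mismatch alone.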
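For the inner loop, the abstract has the TD3 agent adjust the PID parameters. A minimal sketch of the controller side follows, assuming the actor outputs three values in [-1, 1] that are mapped affinely onto gain ranges. The bounds in `GAIN_LOW`/`GAIN_HIGH`, the helper names, and the positional PID form are assumptions; the agent's state and reward definitions are not specified here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical gain bounds; the paper's actual ranges are not given here.
GAIN_LOW = (0.0, 0.0, 0.0)
GAIN_HIGH = (200.0, 50.0, 20.0)


def action_to_gains(action: Sequence[float]) -> Tuple[float, float, float]:
    """Affinely map an actor output in [-1, 1]^3 to (Kp, Ki, Kd)."""
    kp, ki, kd = (
        lo + 0.5 * (max(-1.0, min(1.0, a)) + 1.0) * (hi - lo)  # clip, then scale
        for a, lo, hi in zip(action, GAIN_LOW, GAIN_HIGH)
    )
    return kp, ki, kd


@dataclass
class PID:
    """Positional PID for one joint; gains may be replaced at every step."""
    kp: float
    ki: float
    kd: float
    dt: float = 0.001
    integral: float = 0.0
    prev_error: float = 0.0

    def set_gains(self, gains: Tuple[float, float, float]) -> None:
        self.kp, self.ki, self.kd = gains

    def __call__(self, q_ref: float, q: float) -> float:
        error = q_ref - q
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

At each control period the agent would emit an action, `set_gains` would install the resulting gains, and calling the `PID` instance would return the joint torque command for the admittance-corrected reference.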
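TD3's improvements over DDPG consist of three mechanisms: clipped double-Q targets, target policy smoothing, and delayed policy (and target) updates. The functions below sketch those pieces in plain Python, with actors and critics passed in as ordinary callables rather than neural networks for brevity. The default hyperparameters are those reported in the original TD3 paper by Fujimoto et al. (2018), not values taken from this article, and the function names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Actor = Callable[[Sequence[float]], Sequence[float]]
Critic = Callable[[Sequence[float], Sequence[float]], float]


def td3_target(reward: float, next_state: Sequence[float], done: bool,
               actor_target: Actor, critic1_target: Critic, critic2_target: Critic,
               gamma: float = 0.99, policy_noise: float = 0.2, noise_clip: float = 0.5,
               max_action: float = 1.0, rng: Optional[random.Random] = None) -> float:
    """Critic target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a~)."""
    if rng is None:
        rng = random.Random()
    # Target policy smoothing: clipped Gaussian noise on the target action.
    next_action = []
    for a in actor_target(next_state):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
        next_action.append(max(-max_action, min(max_action, a + eps)))
    # Clipped double-Q learning: take the smaller of the two target critics.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * q_next


def soft_update(target_params: List[float], params: List[float],
                tau: float = 0.005) -> List[float]:
    """Polyak averaging of target weights: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_params, params)]


def update_actor_now(critic_step: int, policy_delay: int = 2) -> bool:
    """Delayed updates: actor and targets change once per `policy_delay` critic steps."""
    return critic_step % policy_delay == 0
```

In a full training loop, `td3_target` supplies the regression target for both critics at every step, while the actor gradient step and `soft_update` run only when `update_actor_now` returns True.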
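Citation: Ma, M., Huang, Y. and Huang, Y. (2024) Anti-Interference Control of Manipulator Based on Twin Delayed Deep Deterministic Policy Gradient. Modeling and Simulation, 13(3), 3663-3676. https://doi.org/10.12677/mos.2024.133334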
