Design and Implementation of an EPICS IOC Redundancy System Based on High-Availability Clusters
DOI: 10.12677/csa.2025.1511278. Supported by national science and technology funding.
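Authors: Xu Guoshun, Huang Zihan: School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui; Institute of Energy, Hefei Comprehensive National Science Center (Anhui Energy Laboratory), Hefei, Anhui; Zhang Zuchao*: Institute of Energy, Hefei Comprehensive National Science Center (Anhui Energy Laboratory), Hefei, Anhui; Institute of Plasma Physics, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui; Tian Tengfei: Institute of Plasma Physics, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui; Zhang Jie: School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui; Institute of Plasma Physics, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui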
Keywords: EPICS, IOC Redundancy, High Availability Cluster, Pacemaker, Data Synchronization, Control System
Abstract: EPICS (Experimental Physics and Industrial Control System) is widely used in large-scale scientific facilities and industrial control systems because of its distributed architecture and real-time performance. The IOC (Input/Output Controller) is a core component, and its operational stability directly affects the safety and data integrity of the control system. To address failure risks such as IOC crashes and network interruptions, this paper designs and implements an EPICS IOC redundancy system based on a high-availability cluster. The system uses Pacemaker and Corosync as the cluster scheduling framework, ETCD for distributed locking, and DRBD for block-level data synchronization, which together provide automatic failover between the primary and backup nodes and keep their data consistent. In addition, Autosave and Archiver Appliance save IOC state and archive historical data in real time. Fault-simulation experiments across multiple scenarios show good fault tolerance and stability: IOC failover completes within 30 seconds and PV data shows no significant interruption, which supports the feasibility of the system for high-reliability control applications.
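Cite this article: Xu Guoshun, Zhang Zuchao, Tian Tengfei, Huang Zihan, Zhang Jie. Design and Implementation of an EPICS IOC Redundancy System Based on High-Availability Clusters[J]. Computer Science and Application, 2025, 15(11): 10-18. https://doi.org/10.12677/csa.2025.1511278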
