A TCP Flow Tracking Algorithm Based on Bidirectional Flow Unified Identification and Implicit State Machines
Abstract: As network traffic grows explosively and becomes more complex, tracking and reassembling TCP flows efficiently and accurately has become a key task in network traffic analysis. Existing tools fall short in two respects: their ability to parse multi-byte encodings is limited, and their flow tracking is inefficient. To address both problems, this paper proposes a TCP flow tracking algorithm based on bidirectional flow unified identification and an implicit state machine. The algorithm introduces a direction-independent flow table management mechanism that removes the redundancy of traditional five-tuple flow tables, in which the two directions of one connection occupy separate entries, and thereby raises flow table space utilization from 50% to 100%. It also uses an implicit state machine to detect UTF-8 character boundaries precisely when a character is split across packets, reaching a parsing accuracy of 99.8%. Experimental results show that, when processing 10 Gbps traffic, the algorithm significantly reduces CPU utilization and outperforms existing tools in both parsing accuracy and processing efficiency. Its stability and reliability in highly dynamic and uncertain network environments still need improvement, however, and its scalability and adaptability require further optimization for large-scale, high-concurrency traffic.
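The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every name in it (`flow_key`, `Utf8Reassembler`, `FlowTable`) is hypothetical. It makes three assumptions: the unified identifier is obtained by ordering the two endpoints so that both directions of a connection produce the same key; the "implicit" state is simply the trailing bytes of an incomplete UTF-8 character carried over to the next segment; and segments arrive in order, so no sequence-number reordering is shown.

```python
from typing import Dict, List, NamedTuple, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class FlowKey(NamedTuple):
    proto: int
    lo: Endpoint  # the endpoint that sorts first
    hi: Endpoint  # the endpoint that sorts second


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             proto: int = 6) -> FlowKey:
    """Build a direction-independent key: A->B and B->A give the same value.

    Any total order on endpoints works; plain tuple comparison is used here.
    """
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return FlowKey(proto, lo, hi)


def utf8_expected_len(lead: int) -> int:
    """Length of the UTF-8 sequence started by `lead`, or 0 if not a valid lead."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


class Utf8Reassembler:
    """Holds back the bytes of a character that is split across segments."""

    def __init__(self) -> None:
        self.tail = b""  # carried-over bytes of an incomplete character

    def feed(self, payload: bytes) -> str:
        data = self.tail + payload
        cut = len(data)
        # Look back at most 3 bytes for a lead byte whose sequence is unfinished.
        for back in range(1, min(3, len(data)) + 1):
            byte = data[-back]
            if byte & 0xC0 == 0x80:
                continue  # continuation byte, keep scanning backwards
            if utf8_expected_len(byte) > back:
                cut = len(data) - back  # character is incomplete, hold it back
            break
        self.tail = data[cut:]
        return data[:cut].decode("utf-8", errors="replace")


class FlowTable:
    """One table entry per connection, with one decoder state per direction."""

    def __init__(self) -> None:
        self.flows: Dict[FlowKey, List[Utf8Reassembler]] = {}

    def on_segment(self, src_ip: str, src_port: int, dst_ip: str,
                   dst_port: int, payload: bytes) -> str:
        key = flow_key(src_ip, src_port, dst_ip, dst_port)
        entry = self.flows.setdefault(key, [Utf8Reassembler(), Utf8Reassembler()])
        # The direction is recovered from the key instead of a second entry.
        direction = 0 if (src_ip, src_port) == key.lo else 1
        return entry[direction].feed(payload)
```

In this sketch, feeding the first four bytes of a three-character CJK string returns only the first character, and the remaining byte is held until the next segment of the same direction arrives. A reply segment with swapped endpoints reuses the same table entry, which is the sense in which a conventional per-direction table would need two entries where this one needs one.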
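Citation: Zhang Wencheng, Li Chao, Bi Xiuyu, Duan Junyi, Chen Liuting. A TCP Flow Tracking Algorithm Based on Bidirectional Flow Unified Identification and Implicit State Machines[J]. Software Engineering and Applications, 2025, 14(4): 842-854. https://doi.org/10.12677/sea.2025.144074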
