Research on Serverless Cold Start Optimization Methods Based on Init-Less Mode
DOI: 10.12677/CSA.2021.1112317
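Authors: Chang Liu, Chunqi Tian: Department of Computer Science and Technology, Tongji University, Shanghai; Wei Wang: School of Data Science and Engineering, East China Normal University, Shanghai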
Keywords: Serverless Computing, Container, Snapshot, Restore, Live Migration
Abstract: The rise of serverless computing gives developers more effective cost savings and elastic computing capability. The model greatly improves the flexibility of computing resource allocation and makes truly on-demand rental of computing resources possible. Users no longer take part in resource scheduling: a function occupies no computing power when there is no demand, and capacity is scaled out promptly as load rises so that responses stay timely. However, serverless computing built on container virtualization introduces a cold start problem. Container startup and user code initialization can add several seconds of response latency. Cloud vendors usually recommend reserved instances to mitigate this, but reserved instances cannot eliminate cold starts when containers scale out, and they also weaken the cost advantage of serverless over microservices. Most current research tries to reduce cold start time by modifying the underlying container technology, but such designs are hard to adopt in practice as replacements for the widely used Docker, and modifying the container layer alone cannot reduce the extra time spent initializing user code. This paper analyzes the development pattern of conventional Docker container applications in serverless scenarios and proposes a cold start optimization method for Docker containers based on init-less and lazy-restore strategies. The method places conventions on how user code uses computing resources such as network, files, and memory, uses CRIU to capture a snapshot of a Docker container that has finished initializing, and replaces the conventional Docker container startup flow with a two-stage lazy-restore. We implement the method as docker-initless, which bypasses the startup-time bottleneck of container applications and greatly reduces the cold start latency of Docker containers. Experiments compare Docker and docker-initless in terms of memory, file resources, and other aspects, and verify that docker-initless effectively optimizes container cold start without reserving additional computing resources while remaining compatible with existing serverless solutions.
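The snapshot-then-restore idea summarized in the abstract can be illustrated with stock Docker's experimental, CRIU-backed checkpoint commands. The sketch below is not docker-initless and does not implement the two-stage lazy-restore; it only builds the command lines for `docker checkpoint create` and `docker start --checkpoint`. The helper names, the container name `fn` and the checkpoint name `warm` are hypothetical, and running the commands is assumed to require CRIU plus a Docker daemon with experimental features enabled.

```python
from typing import List, Optional


def checkpoint_cmd(container: str, name: str,
                   checkpoint_dir: Optional[str] = None,
                   leave_running: bool = False) -> List[str]:
    """Build the CLI call that snapshots an already-initialized container."""
    cmd = ["docker", "checkpoint", "create"]
    if checkpoint_dir:
        cmd += ["--checkpoint-dir", checkpoint_dir]
    if leave_running:
        # Keep the container alive after the snapshot is taken.
        cmd.append("--leave-running")
    return cmd + [container, name]


def restore_cmd(container: str, name: str,
                checkpoint_dir: Optional[str] = None) -> List[str]:
    """Build the CLI call that starts a container from a saved checkpoint
    instead of running its normal startup and initialization path."""
    cmd = ["docker", "start", "--checkpoint", name]
    if checkpoint_dir:
        cmd += ["--checkpoint-dir", checkpoint_dir]
    return cmd + [container]


if __name__ == "__main__":
    # Hypothetical names; print the commands rather than executing them.
    print(" ".join(checkpoint_cmd("fn", "warm", leave_running=True)))
    print(" ".join(restore_cmd("fn", "warm")))
```

To actually execute either command, the returned list could be passed to `subprocess.run(cmd, check=True)` on a host that meets the assumptions above.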
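Cite this article: Chang Liu, Chunqi Tian, Wei Wang. Research on Serverless Cold Start Optimization Methods Based on Init-Less Mode [J]. Computer Science and Application, 2021, 11(12): 3136-3147. https://doi.org/10.12677/CSA.2021.1112317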
