基于Spark的异构集群调度策略研究
Adaptive Scheduling Strategy for Heterogeneous Spark Cluster
DOI: 10.12677/CSA.2016.611084, PDF, HTML, XML,  被引量 下载: 2,061  浏览: 4,804 
作者: 徐佳俊*, 刘功申, 苏 波, 孟 魁:上海交通大学,上海
关键词: Spark异构集群调度策略Spark Heterogeneous Cluster Scheduling Strategy
摘要: Spark的原生调度策略建立在集群同质化的基本假设上。然而随着硬件的更迭以及高性能硬件的引入,集群异质化现象日趋显著。因此现有的调度策略在异构集群环境下并不高效,短板效应严重。针对这个问题,本文提出了一种新的调度策略以优化Spark在异构集群下的表现。新策略引入了分层调度的思想,调度时综合考量了任务复杂度、节点性能及节点资源使用情况等因素,实现了更加高效公平的任务调度算法。通过仿真和真机实验,证明了新策略的效果相对于原策略有明显提升。
Abstract: The scheduling strategy of Spark assumes that cluster is homogenized. However, as the change or update of hardware in cluster, it becomes more and more heterogeneous. Thus, the original scheduling strategy cannot meet the performance requirement anymore and short board effect gradually emerges. The paper proposes a new strategy to solve this problem. The new strategy refers the idea of hierarchical scheduling. It combines the task complexity, worker performance and worker CPU usage as its scheduling factors to improve the scheduling performance. And ex-periments show that the new strategy is absolutely effective.
文章引用:徐佳俊, 刘功申, 苏波, 孟魁. 基于Spark的异构集群调度策略研究[J]. 计算机科学与应用, 2016, 6(11): 692-704. http://dx.doi.org/10.12677/CSA.2016.611084

参考文献

[1] Dean, J. and Ghemawat, S. (2008) MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 51, 107-113.
https:/doi.org/10.1145/1327452.1327492
[2] Borthakur, D. (2007) The Hadoop Distributed File System: Architecture and Design. Hadoop Project Website, 11, 21.
[3] Zaharia, M., Chowdhury, M., Franklin, M.J., et al. (2010) Spark: Cluster Computing with Working Sets. HotCloud, 10, 10.
[4] Zaharia, M., Chowdhury, M., Das, T., et al. (2012) Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, USENIX Association, 2.
[5] 杨志伟, 郑烇, 王嵩, 等. 异构Spark集群下自适应任务调度策略[J]. 计算机工程, 2016, 42(1): 31-35, 40.
[6] Thakur, S., Singh, R. and Sharma, S. (2015) Dynamic Capacity Scheduling in Hadoop. International Journal of Computer Applications, 125.
https:/doi.org/10.5120/ijca2015906178
[7] Zaharia, M. (2009) Job Scheduling with the Fair and Capacity Schedulers. Hadoop Summit, 9.
[8] Zaharia, M., Borthakur, D., Sen Sarma, J., et al. (2010) Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. Proceedings of the 5th European Conference on Computer Systems, ACM, 265- 278.
[9] Nightingale, E.B., Chen, P.M. and Flinn, J. (2005) Speculative Execution in a Distributed File System. ACM SIGOPS Operating Systems Review, ACM, 39, 191-205.
[10] Zaharia, M., Konwinski, A., Joseph, A.D., et al. (2008) Improving MapReduce Performance in Heterogeneous Environments. OSDI, 8, 7.
[11] Yong, M., Garegrat, N. and Mohan, S. (2009) Towards a Resource Aware Scheduler in Hadoop. Proceeding of ICWS, 102-109.
[12] Tang, Z., Zhou, J., Li, K., et al. (2013) A MapReduce Task Scheduling Algorithm for Deadline Constraints. Cluster Computing, 16, 651-662.
https:/doi.org/10.1007/s10586-012-0236-5
[13] Xu, X., Cao, L. and Wang, X. (2014) Adaptive Task Scheduling Strategy Based on Dynamic Workload Adjustment for Heterogeneous Hadoop Clusters.