CSA  >> Vol. 6 No. 11 (November 2016)

    Adaptive Scheduling Strategy for Heterogeneous Spark Cluster

  • Pages: 692-704   DOI: 10.12677/CSA.2016.611084


Jiajun Xu (徐佳俊), Gongshen Liu (刘功申), Bo Su (苏波), Kui Meng (孟魁): Shanghai Jiao Tong University, Shanghai

Keywords: Spark, Heterogeneous Cluster, Scheduling Strategy



Spark's default scheduling strategy assumes that the cluster is homogeneous. However, as hardware in the cluster is replaced or upgraded, the cluster becomes increasingly heterogeneous. The original scheduling strategy can then no longer meet performance requirements, and a weakest-link ("short board") effect gradually emerges, in which the slowest workers limit overall job performance. This paper proposes a new strategy to address the problem. The strategy draws on the idea of hierarchical scheduling and combines three scheduling factors, namely task complexity, worker performance, and worker CPU usage, to improve scheduling performance. Experiments show that the new strategy is effective.

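Xu Jiajun, Liu Gongshen, Su Bo, Meng Kui. Research on a Spark-Based Scheduling Strategy for Heterogeneous Clusters [J]. Computer Science and Application, 2016, 6(11): 692-704. http://dx.doi.org/10.12677/CSA.2016.611084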

