Research and Realization of OpenMP Based on Multicore DSP
DOI: 10.12677/JISP.2016.54017
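Authors: Qi Zhang, Zhengyong Wang, Yanmei Yu: Institute of Image Information, College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan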
Keywords: TMS320C6678, OpenMP, Linear Assembly, Thread Scheduling
Abstract: Multicore programming models are complex to apply in practical scenarios. This paper analyzes three major multicore models (data flow, master-slave, and OpenMP) and selects OpenMP, the easiest of the three to implement, as the main research object. It first studies the implementation principle of the OpenMP model and analyzes the time consumed by thread creation and task partitioning in nested loops. It then studies the relationship between the number of cores and the number of threads in the OpenMP model. Finally, with TI's TMS320C6678 multicore DSP as the core processor, a typical image processing algorithm is used to verify the conclusions drawn from this analysis. Tests show that the OpenMP model achieves its shortest execution time when the number of threads equals the number of cores.
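The nested-loop and thread-count points in the abstract can be illustrated with a minimal sketch in generic C. It is not the authors' code: the 3x3 mean filter stands in for the unspecified image processing algorithm, and the names `mean_filter_3x3` and `default_thread_count` are hypothetical. The pragma is placed on the outer row loop so that threads are forked and rows partitioned once per call, rather than once per row as would happen if the inner loop were parallelized. The code also compiles without OpenMP, in which case the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: one thread per available core, matching the
 * abstract's finding that threads == cores gave the shortest run time.
 * Falls back to 1 when the compiler has no OpenMP support. */
int default_thread_count(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Illustrative 3x3 mean filter on an 8-bit grayscale image stored
 * row-major. Border pixels are copied unchanged. */
void mean_filter_3x3(const uint8_t *src, uint8_t *dst,
                     int width, int height, int num_threads)
{
    assert(width >= 3 && height >= 3 && num_threads >= 1);

    /* Copy the top and bottom rows, then the left and right columns. */
    for (int x = 0; x < width; ++x) {
        dst[x] = src[x];
        dst[(height - 1) * width + x] = src[(height - 1) * width + x];
    }
    for (int y = 0; y < height; ++y) {
        dst[y * width] = src[y * width];
        dst[y * width + width - 1] = src[y * width + width - 1];
    }

    /* Parallelize the OUTER loop only: the thread team is created once
     * and the rows are divided statically among num_threads threads. */
    #pragma omp parallel for num_threads(num_threads) schedule(static)
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src[(y + dy) * width + (x + dx)];
            dst[y * width + x] = (uint8_t)(sum / 9);
        }
    }
}
```

Calling `mean_filter_3x3(src, dst, width, height, default_thread_count())` requests one thread per core. Passing other values for `num_threads` and timing each call is one way to reproduce the kind of core-count versus thread-count comparison the abstract describes, though the actual figures depend on the target and its OpenMP runtime.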
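Cite this article: Zhang, Q., Wang, Z.Y. and Yu, Y.M. (2016) Research and Realization of OpenMP Based on Multicore DSP. Journal of Image and Signal Processing, 5(4), 147-154. http://dx.doi.org/10.12677/JISP.2016.54017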