Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Fast and Fair: Data-Stream Quality of Service

Yeh T.Y. and Reinman G.

CMP, NUCA, PDAS, QOS, adaptive, bandwidth, cache, chip multiprocessor, cluster, data-stream

Chip multiprocessors (CMPs) have the potential to exploit thread-level parallelism, particularly in embedded server farms, where the number of available threads can be quite high. Unfortunately, both per-core and overall throughput are significantly affected by the organization of the lowest-level on-chip cache. On-chip caches for CMPs must handle the increased demand and contention of multiple cores. To complicate the problem, cache demand changes dynamically with phase changes, context switches, power-saving features, and assignments to asymmetric cores.

We propose PDAS, a distributed NUCA L2 cache design with an adaptive sharing mechanism. Each core independently measures its dynamic need, and all cache resources are managed to increase utilization, reduce migrations, and lower interference. Per-core performance degradation is bounded while overall throughput is optimized, thus qualitatively improving the performance of embedded systems where quality of service is an important characteristic.

In single-thread mode, PDAS improves performance on average by 26%, 27%, and 13% over Private, Shared, and NUCA caches, respectively. This improvement is achieved while reducing internal migrations by 82% on average compared to NUCA. With thread contention, PDAS increases its performance and power advantage over prior work. The average migration reduction over NUCA increases to over 90%, and the average IPC improvements over NUCA are 30%, 14%, and 35% for the 2T, 3T, and 4T scenarios, respectively.