Accommodating workload diversity in chip multiprocessors

Chip multiprocessors (CMPs) hold the prospect of delivering long-term performance scalability while dramatically reducing design complexity compared to monolithic wide-issue processors.

With an architecture that accelerates serial code, programmers can parallelize only the portions of an application that are easy to parallelize and rely on the hardware to run the remaining serial portion faster.

We make a case for an architecture that contains one high-performance out-of-order processor and multiple low-performance in-order processors.
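One way to see why a single fast core helps is a simple analytical model. The sketch below follows the cost model of Hill and Marty ("Amdahl's Law in the Multicore Era"), which is an assumption here and not taken from the text above: the chip has a budget of n base-core equivalents (BCEs), a core built from r BCEs runs serial code sqrt(r) times faster than one BCE (Pollack's rule, also an assumption), and f is the parallelizable fraction of the work. All function names are hypothetical.

```python
from __future__ import annotations

import math


def perf(r: int) -> float:
    # Assumption (Pollack's rule): a core built from r base-core equivalents
    # (BCEs) delivers sqrt(r) times the sequential performance of one BCE.
    return math.sqrt(r)


def symmetric_speedup(f: float, n: int, r: int) -> float:
    # n // r identical cores of r BCEs each; f is the parallelizable fraction.
    serial = (1.0 - f) / perf(r)
    parallel = f * r / (perf(r) * n)
    return 1.0 / (serial + parallel)


def asymmetric_speedup(f: float, n: int, r: int) -> float:
    # One big out-of-order core of r BCEs plus (n - r) one-BCE in-order cores.
    # The serial fraction runs on the big core alone; the parallel fraction
    # runs on the big core and all small cores together.
    serial = (1.0 - f) / perf(r)
    parallel = f / (perf(r) + n - r)
    return 1.0 / (serial + parallel)


def best_design(speedup, f: float, n: int) -> tuple[float, int]:
    # Try big-core sizes r = 1, 2, 4, ..., n (powers of two) and return
    # the (speedup, r) pair with the highest speedup.
    candidates = [2 ** i for i in range(n.bit_length()) if 2 ** i <= n]
    return max((speedup(f, n, r), r) for r in candidates)


if __name__ == "__main__":
    n, f = 64, 0.9  # hypothetical chip budget and parallel fraction
    print("symmetric :", best_design(symmetric_speedup, f, n))
    print("asymmetric:", best_design(asymmetric_speedup, f, n))
```

Under these assumptions, the best symmetric design (r = 8) reaches a speedup of about 13.3, while the best asymmetric design (r = 32) reaches about 24.1: the big core shortens the serial phase, and the small in-order cores still contribute during the parallel phase.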

Advances in process technology have ushered in the era of chip multiprocessors (CMPs), enabling architects to place an increasing number of cores on a single chip. With this abundance of computing resources, a fundamental problem is how to map applications onto the cores, that is, how many cores to assign to each application (from Algorithms and Architectures for Parallel Processing, Springer, Berlin, Heidelberg).
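The core-assignment question can be framed as a marginal-gain allocation. The sketch below assumes, hypothetically, that each application's speedup on k cores follows Amdahl's law with a known parallel fraction f, and that the goal is to maximize the sum of speedups; `amdahl_speedup` and `allocate_cores` are illustrative names, not an API or policy from the cited work.

```python
from __future__ import annotations

import heapq


def amdahl_speedup(f: float, k: int) -> float:
    # Amdahl's law: speedup of a program with parallel fraction f on k identical cores.
    return 1.0 / ((1.0 - f) + f / k)


def allocate_cores(parallel_fractions: list[float], total_cores: int) -> list[int]:
    # Hypothetical policy: give every application one core, then hand out the
    # remaining cores one at a time to whichever application gains the most
    # speedup from one additional core.
    n_apps = len(parallel_fractions)
    if n_apps == 0:
        return []
    if total_cores < n_apps:
        raise ValueError("need at least one core per application")
    alloc = [1] * n_apps

    def marginal_gain(i: int) -> float:
        f, k = parallel_fractions[i], alloc[i]
        return amdahl_speedup(f, k + 1) - amdahl_speedup(f, k)

    # heapq is a min-heap, so store negated gains to pop the largest gain first.
    heap = [(-marginal_gain(i), i) for i in range(n_apps)]
    heapq.heapify(heap)
    for _ in range(total_cores - n_apps):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-marginal_gain(i), i))
    return alloc


if __name__ == "__main__":
    # Three hypothetical applications sharing a 16-core CMP.
    print(allocate_cores([0.5, 0.9, 0.99], 16))
```

Under this model the 16 cores are split as [1, 2, 13]: the nearly fully parallel application keeps benefiting from extra cores, while the others quickly hit their serial bottleneck. Because each Amdahl curve has diminishing returns in k, greedy marginal allocation maximizes this particular objective; other goals, such as fairness or weighted throughput, would need a different policy.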

However, the performance of such a many-core processor is low unless the workload is almost completely parallelized, which, depending on the workload, can be impossible or require significant programmer effort.
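Amdahl's law (a standard model, not stated in the text above) makes this sensitivity concrete. With parallel fraction f and n identical cores, the speedup S and its limit are given below; the 64-core figures are illustrative values chosen for the example.

```latex
\begin{aligned}
S(f, n) &= \frac{1}{(1 - f) + f/n}, \qquad \lim_{n \to \infty} S(f, n) = \frac{1}{1 - f} \\
S(0.90,\, 64) &= \frac{1}{0.10 + 0.90/64} = \frac{1}{0.1140625} \approx 8.77 \\
S(0.99,\, 64) &= \frac{1}{0.01 + 0.99/64} = \frac{1}{0.02546875} \approx 39.26
\end{aligned}
```

Raising the parallel fraction from 90% to 99% multiplies the 64-core speedup by roughly 4.5, and even with unlimited cores the 90% case cannot exceed a speedup of 10.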

This paper argues that the programmer effort required to parallelize an application can be reduced if the underlying architecture promises faster execution of the serial portion of an application.

This approach to polymorphism, in which coarse-grained cores are configured for different types of parallelism, provides better performance across a wide range of application types than one in which many small processors are aggregated to run workloads with irregular parallelism.

Our results show that high performance can be obtained in each of the three modes (instruction-, thread-, and data-level parallelism: ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.

Chip multiprocessors are becoming common as the cost of increasing chip power begins to limit single-core performance.