As technology scales below 32 nm, manufacturers began to integrate both CPU and GPU cores in a single chip, i.e., single-chip heterogeneous processor (SCHP), to improve the throughput of emerging applications. In SCHPs, the CPU and the GPU share the total chip power budget while satisfying their own power constraints, respectively. Consequently, to maximize the overall throughput and/or power efficiency, both power budget and workload should be judiciously allocated to the CPU and the GPU. In this paper, we first demonstrate that optimal allocation of power budget and workload to the CPU and the GPU can provide 13 percent higher throughput than the optimal allocation of workload alone for a single-program workload scenario. Second, we also demonstrate that asymmetric power allocation considering per-program characteristics for a multi-programmed workload scenario can provide 9 percent higher throughput or 24 percent higher power efficiency than the even power allocation per program depending on the optimization objective. Last, we propose effective runtime algorithms that can determine near-optimal or optimal combinations of workload and power budget partitioning for both single- and multi-programmed workload scenarios; the runtime algorithms can achieve 96 and 99 percent of the maximum achievable throughput within 5-8 and 3-5 kernel invocations for single- and multi-programmed workload cases, respectively.
Jaeyoung Jang, Hao Wang, Euijin Kwon
IEEE Transactions on Parallel and Distributed Systems