GPUs are many-core processors with tremendous computational power. However, as automatic parallelization has not been realized yet, developing high-performance parallel code for GPUs is still very challenging. The paper presents a novel translation framework designed for virtual execution environment based on CPU/GPU architecture. It addresses two major challenges of taking advantage of general purpose computation on graphics processing units (GPGPU) to improve performance: no rewriting the existing source code and resolving binary compatibility issues between different GPUs. The translation framework uses semi-automatic parallelization technology to port existing code to explicitly parallel programming models. It not only offers a mapping strategy from X86 platform to CUDA programming model, but also synchronizes the execution between the CPU and the GPUs. The input to our translation framework is parallelizable part of the program within binary code. With an additional information related to the parallelizable part, the translation framework transforms the sequential code into PTX code and execute it on GPUs. Experimental results on several programs from CUDA SDK Code Samples and Parboil Benchmark Suite show that our translation framework could achieve very high performance, even up to several tens of times speedup over the X86 native version.
Guoxing Dong, Kai Chen, Erzhou Zhu
2010 3rd International Symposium on Parallel Architectures, Algorithms and Programming