Abstract
The prospect of GPU kernel fusion is often described in research papers as a standalone command-line tool. Such a tool adopts a usage pattern wherein a user isolates, or annotates, an ordered set of kernels. Given such OpenCL C kernels as input, the tool would output a single kernel, which performs similar calculations, hence minimizing costly runtime intermediate load and store operations. Such a mode of operation is, however, a departure from normality for many developers, and is mainly of academic interest.
Automatic compiler-based kernel fusion could provide a vast improvement to the end-user's development experience. The OpenCL Host API, however, does not provide a means to specify opportunities for kernel fusion to the compiler. Ongoing and rapidly maturing compiler and API research by Codeplay aims to provide a higher-level, single-source, industry-focused C++-based interface to OpenCL. Opportunities for kernel fusion have now also been investigated here; utilizing features from C++11 including lambda functions; variadic templates; and lazy evaluation using std::bind expressions.
While pixel-to-pixel transformations are interesting in this context, insomuch as they demonstrate the expressivity of this new single-source C++ API, we also consider fusing transformations which utilize synchronization within workgroups. Hence convolutions, utilizing halos; and the use of the GPU's local shared memory are also explored.
A perennial problem has therefore been restructured to accommodate a modern C++-based expression of kernel fusion. Kernel fusion thus becomes an integrated component of an extended C++ compiler and API.
Automatic compiler-based kernel fusion could provide a vast improvement to the end-user's development experience. The OpenCL Host API, however, does not provide a means to specify opportunities for kernel fusion to the compiler. Ongoing and rapidly maturing compiler and API research by Codeplay aims to provide a higher-level, single-source, industry-focused C++-based interface to OpenCL. Opportunities for kernel fusion have now also been investigated here; utilizing features from C++11 including lambda functions; variadic templates; and lazy evaluation using std::bind expressions.
While pixel-to-pixel transformations are interesting in this context, insomuch as they demonstrate the expressivity of this new single-source C++ API, we also consider fusing transformations which utilize synchronization within workgroups. Hence convolutions, utilizing halos; and the use of the GPU's local shared memory are also explored.
A perennial problem has therefore been restructured to accommodate a modern C++-based expression of kernel fusion. Kernel fusion thus becomes an integrated component of an extended C++ compiler and API.
Original language | English |
---|---|
Publication status | Published - 18 Nov 2013 |
Externally published | Yes |
Event | Intel Compiler, Architecture and Tools Conference 2013 - Intel Office, Haifa, Israel Duration: 18 Nov 2013 → 19 Nov 2013 https://software.intel.com/en-us/event/compilerconf/2013/sessions |
Conference
Conference | Intel Compiler, Architecture and Tools Conference 2013 |
---|---|
Abbreviated title | CATC 2013 |
Country/Territory | Israel |
City | Haifa |
Period | 18/11/13 → 19/11/13 |
Internet address |