Fusing GPU kernels within a novel single-source C++ API

Ralph Potter, Paul Keir, Jan Lucas, Mauricio Alvarez-Mesa, Ben Juurlink, Andrew Richards

Research output: Contribution to conference › Presentation

Abstract

GPU kernel fusion is often presented in research papers as a standalone command-line tool. Such a tool adopts a usage pattern wherein a user isolates, or annotates, an ordered set of kernels. Given such OpenCL C kernels as input, the tool would output a single kernel which performs similar calculations while minimizing costly intermediate load and store operations at runtime. Such a mode of operation is, however, a departure from the normal workflow of many developers, and is mainly of academic interest.
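The saving from fusion can be illustrated on the host with ordinary C++. The following is a minimal sketch, not OpenCL and not the tool described above: the two stages `scale` and `offset` are hypothetical, and plain loops stand in for kernel launches. The unfused version writes and re-reads an intermediate buffer, while the fused version keeps the intermediate value local to each iteration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-pixel stages, chosen only for illustration.
inline float scale(float x)  { return 2.0f * x; }
inline float offset(float x) { return x + 1.0f; }

// Unfused: two passes ("kernels"), with an intermediate buffer that is
// stored by the first pass and loaded again by the second.
std::vector<float> run_unfused(const std::vector<float>& in) {
    std::vector<float> tmp(in.size());
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = scale(in[i]);
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = offset(tmp[i]);
    return out;
}

// Fused: a single pass; the intermediate value never touches memory.
std::vector<float> run_fused(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = offset(scale(in[i]));
    return out;
}
```

Both functions return the same values; only the memory traffic for the intermediate buffer differs.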

Automatic compiler-based kernel fusion could provide a vast improvement to the end-user's development experience. The OpenCL Host API, however, does not provide a means to specify opportunities for kernel fusion to the compiler. Ongoing and rapidly maturing compiler and API research by Codeplay aims to provide a higher-level, single-source, industry-focused C++-based interface to OpenCL. Opportunities for kernel fusion have now also been investigated here, utilizing features from C++11 including lambda functions, variadic templates, and lazy evaluation using std::bind expressions.
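How the three C++11 features named above might combine can be sketched in standard, host-only C++. This is an illustration under stated assumptions, not Codeplay's API: `Fused`, `fuse`, `run_kernel`, `scale_by` and `demo` are hypothetical names, and a plain loop stands in for a device launch. A variadic template composes the stages into one callable, lambdas and a `std::bind` expression supply the stages, and a second `std::bind` defers the launch until the result is requested.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical fused-kernel type: applies its stages left to right.
template <typename... Fs> struct Fused;

template <typename F>
struct Fused<F> {
    F f;
    float operator()(float x) const { return f(x); }
};

template <typename F, typename... Rest>
struct Fused<F, Rest...> {
    F f;
    Fused<Rest...> rest;
    float operator()(float x) const { return rest(f(x)); }
};

// Variadic factory: fuse(a, b, c)(x) computes c(b(a(x))).
template <typename F>
Fused<F> fuse(F f) { return Fused<F>{f}; }

template <typename F, typename... Rest>
Fused<F, Rest...> fuse(F f, Rest... rest) {
    return Fused<F, Rest...>{f, fuse(rest...)};
}

// Stand-in for a kernel launch: one pass over the data.
template <typename Kernel>
std::vector<float> run_kernel(Kernel k, const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = k(in[i]);
    return out;
}

inline float scale_by(float factor, float x) { return factor * x; }

std::vector<float> demo(const std::vector<float>& in) {
    using std::placeholders::_1;
    auto halve     = std::bind(scale_by, 0.5f, _1);            // bind expression as a stage
    auto invert    = [](float x) { return 1.0f - x; };          // lambda stages
    auto clamp_low = [](float x) { return x < 0.0f ? 0.0f : x; };

    auto kernel = fuse(halve, invert, clamp_low);               // one fused callable

    // Lazy evaluation: nothing is computed until deferred() is called.
    auto deferred = std::bind(&run_kernel<decltype(kernel)>, kernel, std::cref(in));
    return deferred();
}
```

For an input of `{0, 1, 4}` the three stages yield `{1, 0.5, 0}` in a single pass, with no intermediate buffers between stages.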

While pixel-to-pixel transformations are interesting in this context, insofar as they demonstrate the expressivity of this new single-source C++ API, we also consider fusing transformations which utilize synchronization within workgroups. Hence convolutions utilizing halos, and the use of the GPU's local shared memory, are also explored.

A perennial problem has therefore been restructured to accommodate a modern C++-based expression of kernel fusion. Kernel fusion thus becomes an integrated component of an extended C++ compiler and API.
Original language: English
Publication status: Published - 18 Nov 2013
Externally published: Yes
Event: Intel Compiler, Architecture and Tools Conference 2013 - Intel Office, Haifa, Israel
Duration: 18 Nov 2013 to 19 Nov 2013
https://software.intel.com/en-us/event/compilerconf/2013/sessions

Conference

Conference: Intel Compiler, Architecture and Tools Conference 2013
Abbreviated title: CATC 2013
Country: Israel
City: Haifa
Period: 18/11/13 to 19/11/13

Fingerprint

Application programming interfaces (API)
Fusion reactions
Pixels
Convolution
Graphics processing unit
Synchronization
Data storage equipment
Industry

Cite this

Potter, R., Keir, P., Lucas, J., Alvarez-Mesa, M., Juurlink, B., & Richards, A. (2013). Fusing GPU kernels within a novel single-source C++ API. Intel Compiler, Architecture and Tools Conference 2013, Haifa, Israel.
Abstract link: https://software.intel.com/en-us/articles/compiler-architecture-and-tools-conference-2013-abstract#fusing