We describe compiler and run-time optimisations for effective auto-parallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by annotating sieve scopes, which abstract the "read in, compute in parallel, write out" processing paradigm. We show that the semantics of sieve scopes enables data movement optimisations, such as re-organising global memory reads to minimise DMA transfers and streaming reads from uniformly accessed arrays. We also describe run-time optimisations for committing side-effects to main memory. We provide experimental results showing the benefits of our optimisations, and compare the Sieve-Cell system with IBM’s OpenMP implementation for Cell.
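The "read in, compute in parallel, write out" paradigm can be illustrated with a minimal sketch in standard C++. It is not the Sieve C++ syntax or the Sieve-Cell runtime. It assumes that writes made inside the scope are queued and only committed when the scope ends, so every read sees the value the data had on entry. The names `DelayedWrite`, `neighbour_sum_delayed` and `neighbour_sum_immediate` are hypothetical and chosen for illustration only.

```cpp
#include <cstddef>
#include <vector>

// One queued side-effect: "store value at index", applied later.
// (Hypothetical type, not part of any Sieve API.)
struct DelayedWrite {
    std::size_t index;
    int value;
};

// Emulates a sieve-like scope over data[i] = data[i - 1] + data[i].
// Writes are queued during the compute phase, so every read sees the
// value the array had on entry and the iterations do not depend on
// each other.
std::vector<int> neighbour_sum_delayed(std::vector<int> data) {
    std::vector<DelayedWrite> queue;
    queue.reserve(data.size());

    // Compute phase: reads only, no element of data is modified here.
    for (std::size_t i = 1; i < data.size(); ++i) {
        queue.push_back({i, data[i - 1] + data[i]});
    }

    // Commit phase: apply the queued writes in program order.
    for (const DelayedWrite& w : queue) {
        data[w.index] = w.value;
    }
    return data;
}

// The same loop with immediate writes, for contrast: iteration i reads
// the value just written by iteration i - 1, which forces sequential
// execution.
std::vector<int> neighbour_sum_immediate(std::vector<int> data) {
    for (std::size_t i = 1; i < data.size(); ++i) {
        data[i] = data[i - 1] + data[i];
    }
    return data;
}
```

In the delayed version the compute loop carries no dependence between iterations, so its index range could be split across cores. The sketch runs it serially to stay short and deterministic.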
|Title of host publication|Euro-Par 2008|
|Subtitle of host publication|Euro-Par 2008 Workshops - Parallel Processing|
|Editors|Eduardo César, Michael Alexander, Achim Streit, Jesper Larsson Träff, Christophe Cérin, Andreas Knüpfer, Dieter Kranzlmüller, Shantenu Jha|
|Number of pages|11|
|Publication status|Published - 2009|
|Name|Lecture Notes in Computer Science|