4th XcalableMP Workshop
This workshop introduces the new XMP specification,
the implementation of the Omni XMP compiler,
evaluation results of XMP applications, and more.
It also introduces XcalableACC, an extension for accelerated cluster systems.
This workshop is co-hosted by Prof. Boku's team of the JST CREST research area "Development of System Software Technologies for post-Peta Scale High Performance Computing" (Post-peta CREST).
The Post-peta CREST research area aims to develop system software technologies for the post-petascale era.
- Date: Monday, November 7, 2016
- Venue: AKIHABARA UDX 6th floor in Tokyo
- Workshop Fee: Free
- Reception Fee: 3,500 yen (subject to change)
- Hosted by PC Cluster Consortium
- Co-hosted by Post-peta CREST
Registration opens at 9:30.
|10:00 - 10:30||Opening||Mitsuhisa Sato (RIKEN AICS)|
|10:30 - 11:00||Overview of XcalableMP 2.0||Hitoshi Murai (RIKEN AICS)|
|11:00 - 11:30||Dynamic Task Parallelism in XcalableMP 2.0 Proposal||Keisuke Tsugane (University of Tsukuba)|
|11:30 - 12:00||State of the Implementation of Omni XMP Coarray Fortran||Hidetoshi Iwashita (RIKEN AICS)|
|12:00 - 12:30||Implementation and Evaluation of the Fiber miniapps using XcalableMP||Takaya Harayama (RIST)|
|13:40 - 14:20||Invited Talk 1: User-Level Threads and OpenMP||Sangmin Seo (Argonne National Laboratory)|
User-level threads (ULTs) are a promising threading mechanism for handling massive on-node parallelism with lower context-switch overhead. Various programming model runtimes, such as OpenMP, MPI, XcalableMP, Charm++, OmpSs, and Legion, have already adopted or are considering ULTs for their threading interface. In this talk, I will present Argobots, a lightweight low-level threading and tasking framework developed at Argonne National Laboratory, and BOLT, our OpenMP implementation over Argobots. BOLT uses Argobots to overcome shortcomings of conventional OS-level threads (e.g., oversubscription) and is specialized for nested or fine-grained parallelism. Its runtime and compiler are based on the Intel OpenMP runtime and Clang/LLVM, respectively. This talk will present the design and implementation of BOLT as well as preliminary performance results.
|14:20 - 15:00||Invited Talk 2: CLAW: One code to rule them all||Valentin Clément (C2SM, ETH Zurich / MeteoSwiss)|
With the emergence of new hybrid HPC architectures deployed worldwide, scientific applications need to be adapted in order to take advantage of this new computing power. Programming standards like OpenACC have been successfully applied to allow parallelism in existing code to be efficiently offloaded to accelerators with a small footprint on the existing code. However, achieving optimal performance on various architectures is not always possible while keeping a single source. Optimization often requires architecture-specific code restructuring, and this comes at the cost of code maintainability. This situation has been observed while porting the COSMO weather model, as well as the CAM-SE model, to hybrid CPU/GPU architectures.
In our project, we focus on column-based problems (i.e., problems which do not have any dependency in the horizontal spatial dimension), like the physical parameterizations of weather models. With these algorithms, the domain scientist should not have to worry about their parallelization on the horizontal grid.
We propose a directive language as well as a tool named CLAW that allow the domain scientist to focus on a performance-agnostic, single-column algorithm. CLAW is then used to apply the necessary code transformations to generate performance-optimal code for different target architectures, employing accelerator-directive language primitives, from a single Fortran source code. In addition, we also offer specific low-level transformation directives to be applied to existing code.
3rd Session (Post-peta CREST)
|15:20 - 15:50||Accelerated Computational Sciences with Unified Environment for Computation and Communication||Taisuke Boku (University of Tsukuba)|
|15:50 - 16:20||Partial Offloading of Computational Science Applications on FPGA||Norihisa Fujita (University of Tsukuba)|
|16:20 - 16:50||GASNet for GPU - An Implementation over Tightly Coupled Accelerator Communication Platform||Kenta Sato (University of Tsukuba)|
|16:50 - 17:20||Report on Implementation Status of A PGAS Language XcalableACC for Accelerated Clusters||Akihiro Tabuchi (University of Tsukuba)|
- 17:20 - 17:30 : Mitsuhisa Sato (RIKEN AICS)