
4th XcalableMP Workshop

This workshop introduces the new XMP specification, the implementation of the Omni XMP compiler, evaluation results of XMP applications, and related topics. It also introduces XcalableACC, an extension for accelerated cluster systems.

This workshop is co-hosted by Prof. Boku's team of the JST CREST research area "Development of System Software Technologies for post-Peta Scale High Performance Computing" (Post-peta CREST). The Post-peta CREST research area aims to develop system software technologies for the post-petascale era.


  • Date: Monday, November 7, 2016
  • Venue: AKIHABARA UDX 6th floor in Tokyo [Map]
  • Workshop Fee: Free
  • Reception Fee: 3,500 yen (subject to change)
  • Hosted by PC Cluster Consortium
  • Co-hosted by Post-peta CREST


Registration opens at 9:30.

1st Session

10:00 - 10:30 | Opening | Mitsuhisa Sato (RIKEN AICS)
10:30 - 11:00 | Overview of XcalableMP 2.0 | Hitoshi Murai (RIKEN AICS)
11:00 - 11:30 | Dynamic Task Parallelism in XcalableMP 2.0 Proposal | Keisuke Tsugane (University of Tsukuba)
11:30 - 12:00 | State of the Implementation of Omni XMP Coarray Fortran | Hidetoshi Iwashita (RIKEN AICS)
12:00 - 12:30 | Implementation and Evaluation of the Fiber miniapps using XcalableMP | Takaya Harayama (RIST)

2nd Session

13:40 - 14:20 | Invited Talk 1: User-Level Threads and OpenMP

User-level threads (ULTs) are a promising threading mechanism to deal with massive on-node parallelism with lower context-switch overhead. Various programming model runtimes, such as OpenMP, MPI, XcalableMP, Charm++, OmpSs, or Legion, have already adopted or are considering ULTs for their threading interface. In this talk, I will present Argobots, a lightweight low-level threading and tasking framework developed at Argonne National Laboratory, and BOLT, our OpenMP implementation over Argobots. BOLT utilizes Argobots to overcome shortcomings of conventional OS-level threads (e.g., oversubscription) and is specialized for nested or fine-grained parallelism. Its runtime and compiler are based on the Intel OpenMP runtime and Clang/LLVM, respectively. This talk will present the design and implementation of BOLT as well as preliminary performance results.

Sangmin Seo (Argonne National Laboratory)

14:20 - 15:00 | Invited Talk 2: CLAW: One code to rule them all

With the emergence of new hybrid HPC architectures deployed worldwide, scientific applications need to be adapted in order to take advantage of this new computing power. Programming standards like OpenACC have been successfully applied to allow parallelism in existing code to be efficiently offloaded to accelerators with a small footprint on the existing code. Achieving optimal performance on various architectures is not always possible while keeping a single source. Optimization often requires architecture-specific code restructuring, and this comes at the cost of code maintainability. This situation has been observed while porting the COSMO weather model, as well as the CAM-SE model, to hybrid CPU/GPU architectures.
In our project, we focus on column-based problems (i.e., problems which do not have any dependency in the horizontal spatial dimension), like the physical parameterizations of weather models. With these algorithms, the domain scientist should not have to worry about their parallelization on the horizontal grid.
We propose a directive language as well as a tool named CLAW that allow the domain scientist to focus on a performance-agnostic, single-column-based algorithm. CLAW is then used to apply the necessary code transformations to generate performance-optimal code for different target architectures, employing accelerator-directive language primitives, from a single Fortran source code. In addition, we also offer specific low-level transformation directives to be applied to existing code.

Valentin Clément (C2SM, ETH Zurich / MeteoSwiss)

3rd Session (Post-peta CREST)

15:20 - 15:50 | Accelerated Computational Sciences with Unified Environment for Computation and Communication | Taisuke Boku (University of Tsukuba)
15:50 - 16:20 | Partial Offloading of Computational Science Applications on FPGA | Norihisa Fujita (University of Tsukuba)
16:20 - 16:50 | GASNet for GPU - An Implementation over Tightly Coupled Accelerator Communication Platform | Kenta Sato (University of Tsukuba)
16:50 - 17:20 | Report on Implementation Status of A PGAS Language XcalableACC for Accelerated Clusters | Akihiro Tabuchi (University of Tsukuba)


  • 17:20 - 17:30 : Mitsuhisa Sato (RIKEN AICS)