4th XcalableMP Workshop

This workshop introduces the new specification of XcalableMP (XMP), the implementation of the Omni XMP compiler, evaluation results of XMP applications, and related topics. It also introduces XcalableACC, an extension of XMP for accelerated cluster systems.

This workshop is co-hosted by Prof. Boku's team in the JST CREST research area "Development of System Software Technologies for post-Peta Scale High Performance Computing" (Post-peta CREST). This research area aims to develop system software technologies for the post-petascale era.

Information

Date: Monday, November 7, 2016
Venue: AKIHABARA UDX, 6th floor, Tokyo
Workshop Fee: Free
Reception Fee: 3,500 yen (subject to change)
Hosted by PC Cluster Consortium
Co-hosted by Post-peta CREST

Program

Registration opens at 9:30.

1st Session

Time	Title	Speaker
10:00 - 10:30	Opening	Mitsuhisa Sato (RIKEN AICS)
10:30 - 11:00	Overview of XcalableMP 2.0	Hitoshi Murai (RIKEN AICS)
11:00 - 11:30	Dynamic Task Parallelism in XcalableMP 2.0 Proposal	Keisuke Tsugane (University of Tsukuba)
11:30 - 12:00	State of the Implementation of Omni XMP Coarray Fortran	Hidetoshi Iwashita (RIKEN AICS)
12:00 - 12:30	Implementation and Evaluation of the Fiber miniapps using XcalableMP	Takaya Harayama (RIST)

2nd Session

Time	Title	Speaker
13:40 - 14:20	Invited Talk 1: User-Level Threads and OpenMP. Abstract: User-level threads (ULTs) are a promising threading mechanism for handling massive on-node parallelism with lower context-switch overhead. Various programming model runtimes, such as OpenMP, MPI, XcalableMP, Charm++, OmpSs, and Legion, have already adopted or are considering ULTs for their threading interface. In this talk, I will present Argobots, a lightweight, low-level threading and tasking framework developed at Argonne National Laboratory, and BOLT, our OpenMP implementation over Argobots. BOLT uses Argobots to overcome shortcomings of conventional OS-level threads (e.g., oversubscription) and is specialized for nested or fine-grained parallelism. Its runtime and compiler are based on the Intel OpenMP runtime and Clang/LLVM, respectively. This talk will present the design and implementation of BOLT as well as preliminary performance results.	Sangmin Seo (Argonne National Laboratory)
14:20 - 15:00	Invited Talk 2: CLAW: One code to rule them all. Abstract: With the emergence of new hybrid HPC architectures deployed worldwide, scientific applications need to be adapted to take advantage of this new computing power. Programming standards like OpenACC have been successfully applied to efficiently offload parallelism in existing code to accelerators with minimal changes to that code. However, achieving optimal performance on various architectures while keeping a single source is not always possible. Optimization often requires architecture-specific code restructuring, which comes at the cost of code maintainability. This situation has been observed while porting the COSMO weather model, as well as the CAM-SE model, to hybrid CPU/GPU architectures. In our project, we focus on column-based problems (i.e., problems that have no dependency in the horizontal spatial dimension), such as the physical parameterizations of weather models. For these algorithms, domain scientists should not have to worry about parallelization on the horizontal grid. We propose a directive language as well as a tool named CLAW that allow domain scientists to focus on a performance-agnostic, single-column algorithm. CLAW then applies the code transformations needed to generate performance-optimal code for different target architectures, using accelerator directive language primitives, from a single Fortran source code. In addition, we also offer specific low-level transformation directives that can be applied to existing code.	Valentin Clément (C2SM, ETH Zurich / MeteoSwiss)

3rd Session (Post-peta CREST)

Time	Title	Speaker
15:20 - 15:50	Accelerated Computational Sciences with Unified Environment for Computation and Communication	Taisuke Boku (University of Tsukuba)
15:50 - 16:20	Partial Offloading of Computational Science Applications on FPGA	Norihisa Fujita (University of Tsukuba)
16:20 - 16:50	GASNet for GPU - An Implementation over Tightly Coupled Accelerator Communication Platform	Kenta Sato (University of Tsukuba)
16:50 - 17:20	Report on Implementation Status of A PGAS Language XcalableACC for Accelerated Clusters	Akihiro Tabuchi (University of Tsukuba)

Closing

17:20 - 17:30: Mitsuhisa Sato (RIKEN AICS)

Reception

Place: Budouya Wine
Time: 18:00 - 20:00