
The 4th XcalableMP Workshop

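This workshop presents the XMP specification, the implementation techniques of its compiler, the Omni XMP Compiler, and performance results from various benchmarks. It also presents the latest developments in XcalableACC, the accelerator extension of XMP.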

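This workshop is co-hosted with the Boku team of the JST CREST program "Development of System Software Technologies for Post-Peta Scale High Performance Computing" (Post-Peta CREST). The team will present computing environments for the post-petascale era that use XcalableMP and XcalableACC.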





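10:00 - 10:30  Opening, Mitsuhisa Sato (RIKEN)
10:30 - 11:00  Overview of XcalableMP 2.0, Hitoshi Murai (RIKEN)
11:00 - 11:30  Dynamic Task-Parallel Features in the XcalableMP 2.0 Proposal, Keisuke Tsugane (University of Tsukuba)
11:30 - 12:00  Implementation Status of Coarray Fortran in Omni XMP, Hidetoshi Iwashita (RIKEN)
12:00 - 12:30  Implementation and Evaluation of the Fiber Mini-App Suite with XcalableMP, Takuya Harayama (RIST)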


13:40 - 14:20  Invited Talk 1: User-Level Threads and OpenMP

User-level threads (ULTs) are a promising threading mechanism for handling massive on-node parallelism, thanks to their lower context-switch overhead. Various programming model runtimes, such as OpenMP, MPI, XcalableMP, Charm++, OmpSs, and Legion, have already adopted or are considering ULTs for their threading interface. In this talk, I will present Argobots, a lightweight, low-level threading and tasking framework developed at Argonne National Laboratory, and BOLT, our OpenMP implementation over Argobots. BOLT uses Argobots to overcome shortcomings of conventional OS-level threads (e.g., oversubscription) and is specialized for nested or fine-grained parallelism. Its runtime and compiler are based on the Intel OpenMP runtime and Clang/LLVM, respectively. This talk will present the design and implementation of BOLT as well as preliminary performance results.

Sangmin Seo (Argonne National Laboratory)
14:20 - 15:00  Invited Talk 2: CLAW: One code to rule them all

With the emergence of new hybrid HPC architectures deployed worldwide, scientific applications need to be adapted to take advantage of this new computing power. Programming standards such as OpenACC have been successfully applied to offload the parallelism in existing code to accelerators efficiently, with minimal changes to that code. However, achieving optimal performance on various architectures while keeping a single source is not always possible: optimization often requires architecture-specific code restructuring, which comes at the cost of code maintainability. This situation was observed while porting both the COSMO weather model and the CAM-SE model to hybrid CPU/GPU architectures.
In our project, we focus on column-based problems (i.e., problems with no dependency in the horizontal spatial dimensions), such as the physical parameterizations of weather models. For these algorithms, the domain scientist should not have to worry about parallelization over the horizontal grid.
We propose a directive language and a tool named CLAW that allow the domain scientist to focus on a performance-agnostic, single-column algorithm. CLAW then applies the necessary code transformations to generate performance-optimal code for different target architectures, using accelerator directive language primitives, from a single Fortran source. In addition, we offer specific low-level transformation directives that can be applied to existing code.

Valentin Clément (C2SM, ETH Zurich / MeteoSwiss)


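15:20 - 15:50  Accelerating Computational Science with a Unified Computation and Communication Mechanism, Taisuke Boku (University of Tsukuba)
15:50 - 16:20  Partial Offloading of Computational Science Applications to FPGAs, Norihisa Fujita (University of Tsukuba)
16:20 - 16:50  Implementation of GPU-Enabled GASNet on the TCA Mechanism, Kenta Sato (University of Tsukuba)
16:50 - 17:20  Implementation Status of XcalableACC, a PGAS Language for Accelerator Clusters, Akihiro Tabuchi (University of Tsukuba)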


17:20 - 17:30  Mitsuhisa Sato (RIKEN)