4th XcalableMP Workshop

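This workshop presents the XMP specification, the implementation techniques of its compiler (the Omni XMP Compiler), and performance results from a range of benchmarks. It also presents the latest information on XcalableACC, the accelerator extension of XMP.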

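This workshop is co-hosted with the Boku team of the JST Strategic Basic Research Program CREST research area "Development of System Software Technologies for Post-Peta Scale High Performance Computing" (Post-Peta CREST). The team will present computing environments for the post-petascale era built on XcalableMP and XcalableACC.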

Event Information

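Date: Monday, November 7, 2016
Venue: Akihabara UDX, 6F (capacity: 60)
Admission: Free
Reception fee (reception attendees only): 3,500 yen (tentative)
Organizer: PC Cluster Consortium
Co-organizer: JST Post-Peta CREST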

Program

Registration opens at 9:30.

Session 1

Time	Title	Speaker
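10:00 - 10:30	Opening	Mitsuhisa Sato (RIKEN)
10:30 - 11:00	Overview of XcalableMP 2.0	Hitoshi Murai (RIKEN)
11:00 - 11:30	Dynamic Task-Parallel Features in the XcalableMP 2.0 Draft	Keisuke Tsugane (University of Tsukuba)
11:30 - 12:00	Implementation Status of Omni XMP Coarray Fortran	Hidetoshi Iwashita (RIKEN)
12:00 - 12:30	Implementation and Evaluation of the Fiber Mini-App Suite Using XcalableMP	Takuya Harayama (RIST)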

Session 2

Time	Title	Speaker
13:40 - 14:20	Invited Talk 1: User-Level Threads and OpenMP	Sangmin Seo (Argonne National Laboratory)
Abstract: User-level threads (ULTs) are a promising threading mechanism for handling massive on-node parallelism with lower context-switch overhead. Various programming model runtimes, such as OpenMP, MPI, XcalableMP, Charm++, OmpSs, and Legion, have already adopted or are considering ULTs for their threading interface. In this talk, I will present Argobots, a lightweight low-level threading and tasking framework developed at Argonne National Laboratory, and BOLT, our OpenMP implementation over Argobots. BOLT uses Argobots to overcome shortcomings of conventional OS-level threads (e.g., oversubscription) and is specialized for nested or fine-grained parallelism. Its runtime and compiler are based on the Intel OpenMP runtime and Clang/LLVM, respectively. This talk will present the design and implementation of BOLT as well as preliminary performance results.
14:20 - 15:00	Invited Talk 2: CLAW: One code to rule them all	Valentin Clément (C2SM, ETH Zurich / MeteoSwiss)
Abstract: With the emergence of new hybrid HPC architectures deployed worldwide, scientific applications need to be adapted to take advantage of this new computing power. Programming standards like OpenACC have been applied successfully to offload the parallelism in existing code efficiently onto accelerators with a small footprint on that code. However, achieving optimal performance on various architectures is not always possible while keeping a single source. Optimization indeed often requires architecture-specific code restructuring, and this comes at the cost of code maintainability. This situation was observed while porting the COSMO weather model, as well as the CAM-SE model, to hybrid CPU/GPU architectures. In our project, we focus on column-based problems (i.e., problems that have no dependency in the horizontal spatial dimension), such as the physical parameterizations of weather models. With these algorithms, domain scientists should not have to worry about parallelization over the horizontal grid. We propose a directive language, as well as a tool named CLAW, that allows domain scientists to focus on a performance-agnostic, single-column algorithm. CLAW then applies the code transformations needed to generate performance-optimal code for different target architectures from a single Fortran source code, using accelerator directive language primitives. In addition, we also offer specific low-level transformation directives that can be applied to existing code.

Session 3 (Post-Peta CREST)

Time	Title	Speaker
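15:20 - 15:50	Accelerating Computational Science with a Computation-Communication Fusion Mechanism	Taisuke Boku (University of Tsukuba)
15:50 - 16:20	Partial Offloading of Computational Science Applications to FPGAs	Norihisa Fujita (University of Tsukuba)
16:20 - 16:50	Implementation of a GPU-Enabled GASNet on the TCA Mechanism	Kenta Sato (University of Tsukuba)
16:50 - 17:20	Implementation Status Report on XcalableACC, a PGAS Language for Accelerator Clusters	Akihiro Tabuchi (University of Tsukuba)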

Closing

17:20 - 17:30: Mitsuhisa Sato (RIKEN)

Reception

Location: Budoya (葡萄屋)
Time: 18:00 to 20:00 (tentative)