XcalableMP PGAS Programming Language: From Programming Model to Applications
Format: eBook
Language: English
Published: Singapore : Springer Singapore Pte. Limited, 2020
Edition: 1st ed.
Table of Contents:
- Intro
- Preface
- Contents
- XcalableMP Programming Model and Language
- 1 Introduction
- 1.1 Target Hardware
- 1.2 Execution Model
- 1.3 Data Model
- 1.4 Programming Models
- 1.4.1 Partitioned Global Address Space
- 1.4.2 Global-View Programming Model
- 1.4.3 Local-View Programming Model
- 1.4.4 Mixture of Global View and Local View
- 1.5 Base Languages
- 1.5.1 Array Section in XcalableMP C
- 1.5.2 Array Assignment Statement in XcalableMP C
- 1.6 Interoperability
- 2 Data Mapping
- 2.1 nodes Directive
- 2.2 template Directive
- 2.3 distribute Directive
- 2.3.1 Block Distribution
- 2.3.2 Cyclic Distribution
- 2.3.3 Block-Cyclic Distribution
- 2.3.4 Gblock Distribution
- 2.3.5 Distribution of Multi-Dimensional Templates
- 2.4 align Directive
- 2.5 Dynamic Allocation of Distributed Array
- 2.6 template_fix Construct
- 3 Work Mapping
- 3.1 task and tasks Construct
- 3.1.1 task Construct
- 3.1.2 tasks Construct
- 3.2 loop Construct
- 3.2.1 Reduction Computation
- 3.2.2 Parallelizing Nested Loop
- 3.3 array Construct
- 4 Data Communication
- 4.1 shadow Directive and reflect Construct
- 4.1.1 Declaring Shadow
- 4.1.2 Updating Shadow
- 4.2 gmove Construct
- 4.2.1 Collective Mode
- 4.2.2 In Mode
- 4.2.3 Out Mode
- 4.3 barrier Construct
- 4.4 reduction Construct
- 4.5 bcast Construct
- 4.6 wait_async Construct
- 4.7 reduce_shadow Construct
- 5 Local-View Programming
- 5.1 Introduction
- 5.2 Coarray Declaration
- 5.3 Put Communication
- 5.4 Get Communication
- 5.5 Synchronization
- 5.5.1 Sync All
- 5.5.2 Sync Images
- 5.5.3 Sync Memory
- 6 Procedure Interface
- 7 XMPT Tool Interface
- 7.1 Overview
- 7.2 Specification
- 7.2.1 Initialization
- 7.2.2 Events
- References
- Implementation and Performance Evaluation of Omni Compiler
- 1 Overview
- 2 Implementation
- 2.1 Operation Flow
- 2.2 Example of Code Translation
- 2.2.1 Distributed Array
- 2.2.2 Loop Statement
- 2.2.3 Communication
- 3 Installation
- 3.1 Overview
- 3.2 Get Source Code
- 3.2.1 From GitHub
- 3.2.2 From Our Website
- 3.3 Software Dependency
- 3.4 General Installation
- 3.4.1 Build and Install
- 3.4.2 Set PATH
- 3.5 Optional Installation
- 3.5.1 OpenACC
- 3.5.2 XcalableACC
- 3.5.3 One-Sided Library
- 4 Creation of Execution Binary
- 4.1 Compile
- 4.2 Execution
- 4.2.1 XcalableMP and XcalableACC
- 4.2.2 OpenACC
- 4.3 Cooperation with Profiler
- 4.3.1 Scalasca
- 4.3.2 tlog
- 5 Performance Evaluation
- 5.1 Experimental Environment
- 5.2 EP STREAM Triad
- 5.2.1 Design
- 5.2.2 Implementation
- 5.2.3 Evaluation
- 5.3 High-Performance Linpack
- 5.3.1 Design
- 5.3.2 Implementation
- 5.3.3 Evaluation
- 5.4 Global Fast Fourier Transform
- 5.4.1 Design
- 5.4.2 Implementation
- 5.4.3 Evaluation
- 5.5 RandomAccess
- 5.5.1 Design
- 5.5.2 Implementation
- 5.5.3 Evaluation
- 5.6 Discussion
- 6 Conclusion
- References
- Coarrays in the Context of XcalableMP
- 1 Introduction
- 2 Requirements from Language Specifications
- 2.1 Images Mapped to XMP Nodes
- 2.2 Allocation of Coarrays
- 2.3 Communication
- 2.4 Synchronization
- 2.5 Subarrays and Data Contiguity
- 2.6 Coarray C Language Specifications
- 3 Implementation
- 3.1 Omni XMP Compiler Framework
- 3.2 Allocation and Registration
- 3.2.1 Three Methods of Memory Management
- 3.2.2 Initial Allocation for Static Coarrays
- 3.2.3 Runtime Allocation for Allocatable Coarrays
- 3.3 PUT/GET Communication
- 3.3.1 Determining the Possibility of DMA
- 3.3.2 Buffering Communication Methods
- 3.3.3 Non-blocking PUT Communication
- 3.3.4 Optimization of GET Communication
- 3.4 Runtime Libraries
- 3.4.1 Fortran Wrapper
- 3.4.2 Upper-Layer Runtime (ULR) Library
- 3.4.3 Lower-Layer Runtime (LLR) Library
- 3.4.4 Communication Libraries
- 4 Evaluation
- 4.1 Fundamental Performance
- 4.2 Non-blocking Communication
- 4.3 Application Program
- 4.3.1 Coarray Version of the Himeno Benchmark
- 4.3.2 Measurement Result
- 4.3.3 Productivity
- 5 Related Work
- 6 Conclusion
- References
- XcalableACC: An Integration of XcalableMP and OpenACC
- 1 Introduction
- 1.1 Hardware Model
- 1.2 Programming Model
- 1.2.1 XMP Extensions
- 1.2.2 OpenACC Extensions
- 1.3 Execution Model
- 1.4 Data Model
- 2 XcalableACC Language
- 2.1 Data Mapping
- Example
- 2.2 Work Mapping
- Restriction
- Example 1
- Example 2
- 2.3 Data Communication and Synchronization
- Example
- 2.4 Coarrays
- Restriction
- Example
- 2.5 Handling Multiple Accelerators
- 2.5.1 devices Directive
- Example
- 2.5.2 on_device Clause
- 2.5.3 layout Clause
- Example
- 2.5.4 shadow Clause
- Example
- 2.5.5 barrier_device Construct
- Example
- 3 Omni XcalableACC Compiler
- 4 Performance of Lattice QCD Application
- 4.1 Overview of Lattice QCD
- 4.2 Implementation
- 5 Performance Evaluation
- 5.1 Result
- 5.2 Discussion
- 6 Productivity Improvement
- 6.1 Requirement for Productive Parallel Language
- 6.2 Quantitative Evaluation by Delta Source Lines of Code
- 6.3 Discussion
- References
- Mixed-Language Programming with XcalableMP
- 1 Background
- 2 Translation by Omni Compiler
- 3 Functions for Mixed-Language
- 3.1 Function to Call MPI Program from XMP Program
- 3.2 Function to Call XMP Program from MPI Program
- 3.3 Function to Call XMP Program from Python Program
- 3.3.1 From Parallel Python Program
- 3.3.2 From Sequential Python Program
- 4 Application to Order/Degree Problem
- 4.1 What Is the Order/Degree Problem
- 4.2 Implementation
- 4.3 Evaluation
- 5 Conclusion
- References
- Three-Dimensional Fluid Code with XcalableMP
- 1 Introduction
- 2 Global-View Programming Model
- 2.1 Domain Decomposition Methods
- 2.2 Performance on the K Computer
- 2.2.1 Comparison with Hand-Coded MPI Program
- 2.2.2 Optimization for SIMD
- 2.2.3 Optimization for Allocatable Arrays
- 3 Local-View Programming Model
- 3.1 Communications Using Coarray
- 3.2 Performance on the K Computer
- 4 Summary
- References
- Hybrid-View Programming of Nuclear Fusion Simulation Code in XcalableMP
- 1 Introduction
- 2 Nuclear Fusion Simulation Code
- 2.1 Gyrokinetic PIC Simulation
- 2.2 GTC
- 3 Implementation of GTC-P by Hybrid-View Programming
- 3.1 Hybrid-View Programming Model
- 3.2 Implementation Based on the XMP-localview Model
- 3.3 Implementation Based on the XMP-hybridview Model
- 4 Performance Evaluation
- 4.1 Experimental Setting
- 4.2 Results
- 4.3 Productivity and Performance
- 5 Related Research
- 6 Conclusion
- References
- Parallelization of Atomic Image Reconstruction from X-ray Fluorescence Holograms with XcalableMP
- 1 Introduction
- 2 X-ray Fluorescence Holography
- 2.1 Reconstruction of Atomic Images
- 2.2 Analysis Procedure of XFH
- 3 Parallelization
- 3.1 Parallelization of Reconstruction of Two-Dimensional Atomic Images by OpenMP
- 3.2 Parallelization of Reconstruction of Three-Dimensional Atomic Images by XcalableMP
- 4 Performance Evaluation
- 4.1 Performance Results of Reconstruction of Two-Dimensional Atomic Images
- 4.2 Performance Results of Reconstruction of Three-Dimensional Atomic Images
- 4.3 Comparison of Parallelization with MPI
- 5 Conclusion
- References
- Multi-SPMD Programming Model with YML and XcalableMP
- 1 Introduction
- 2 Background: International Collaborations for the Post-Petascale and Exascale Computing
- 3 Multi-SPMD Programming Model
- 3.1 Overview
- 3.2 YML
- 3.3 OmniRPC-MPI
- 4 Application Development in the mSPMD Programming Environment
- 4.1 Task Generator
- 4.2 Workflow Development
- 4.3 Workflow Execution
- 5 Experiments
- 6 Eigen Solver on the mSPMD Programming Model
- 6.1 Implicitly Restarted Arnoldi Method (IRAM), Multiple Implicitly Restarted Arnoldi Method (MIRAM) and Their Implementations for the mSPMD Programming Model
- 6.2 Experiments
- 7 Fault-Tolerance Features in the mSPMD Programming Model
- 7.1 Overview and Implementation
- 7.2 Experiments
- 8 Runtime Correctness Check for the mSPMD Programming Model
- 8.1 Overview and Implementation
- 8.2 Experiments
- 9 Summary
- References
- XcalableMP 2.0 and Future Directions
- 1 Introduction
- 2 XcalableMP on Fugaku
- 2.1 Performance of XcalableMP Global-View Programming
- 2.2 Performance of XcalableMP Local-View Programming
- 3 Global Task Parallel Programming
- 3.1 OpenMP and XMP Tasklet Directive
- 3.2 A Proposal for Global Task Parallel Programming
- 3.3 Prototype Design of Code Transformation
- 3.4 Preliminary Performance
- 3.5 Communication Optimization for Manycore Clusters
- 4 Retrospectives and Challenges for Future PGAS Models
- 4.1 Low-Level Communication Layer for PGAS Model
- 4.2 XcalableMP as a DSL for Stencil Applications
- 4.3 XcalableMP API: Compiler-Free Approach
- 4.4 Global Task Parallel Programming Model for Accelerators
- References