Pattern-Based Programming Abstractions for Heterogeneous Parallel Computing.
Main Author: | |
---|---|
Format: | eBook |
Language: | English |
Published: | Linköping : Linköpings Universitet, 2021. |
Edition: | 1st ed. |
Series: | Linköping Studies in Science and Technology. Dissertations Series |
Subjects: | |
Table of Contents:
- Intro
- Populärvetenskaplig sammanfattning (Popular science summary)
- Abstract
- Acknowledgments
- Contents
- Introduction
- Aims and research questions
- Published work behind this thesis
- Other work behind this thesis
- Structure
- Background and related work
- Motivation
- High-level parallel programming
- Skeleton programming
- Related work
- GrPPI
- Musket
- Kokkos
- SYCL
- MLIR
- StarPU
- C++ AMP, and other industry efforts
- Other related frameworks, libraries, and toolchains
- Independent surveys
- Earlier related work on SkePU
- SkePU overview
- Basic constructs
- Backend architecture
- History
- SkePU 2 design principles
- SkePU 3 design principles
- Skeleton set
- Skeleton set
- Map skeleton
- Freely accessible containers inside user functions
- Variadic type signatures
- Multi-valued return
- Index-dependent computations
- MapPairs skeleton
- MapOverlap skeleton
- Edge handling modes
- Update modes
- Reduce skeleton
- One-dimensional reductions
- Two-dimensional reductions
- Scan skeleton
- MapReduce skeleton
- MapPairsReduce skeleton
- Call skeleton
- User functions
- User functions as lambda expressions
- User types
- User constants
- Strided skeletons
- Strides in Map, MapPairs, and their reduce variants
- Strides in MapOverlap
- Data representation with smart data-containers
- Smart data-containers
- Container indexing
- Container proxies
- MatRow proxy
- MatCol proxy
- Region proxy
- Memory consistency model
- External scope
- Standard library
- Deterministic random number generation
- Complex numbers
- Linear algebra
- Image filtering and visualization
- Benchmark utilities
- High-level consistent input and output
- General utilities
- Implementation
- Implementation overview
- Language embedding and type safety
- Improved type safety from SkePU 1
- Source-to-source compiler
- Backends
- Sequential CPU backend
- Multi-core CPU backend: OpenMP
- GPU backends: OpenCL and CUDA
- C and Fortran language bindings
- Continuous integration and testing
- Dependencies
- Availability
- Hybrid CPU-GPU skeleton execution
- Introduction
- Workload partitioning and implementation
- StarPU backend implementation
- Auto-tuning
- Skeleton programming on large-scale cluster systems
- Background
- StarPU-MPI backend
- GPI backend
- GASPI and GPI
- Implementation
- Design
- Synchronization and state tracking
- Consistency model and double buffering
- Communication pattern
- Data representation
- Data transfers and caching
- Conclusions
- Extending smart data-containers for data locality awareness
- Introduction
- Large-scale data processing with MapReduce and Spark
- MapReduce
- Spark
- Lazily evaluated skeletons with tiling
- Basic approach and benefits
- Backend selection
- Loop optimization
- Evaluation points
- Further application areas
- Implementation
- Lazy tiling for stencil computations
- Applications and comparison to kernel fusion
- Polynomial evaluation using Horner's method
- Exponentiation by repeated squaring
- Heat propagation
- Related work
- High-level skeleton fusion
- Comparison to lineages
- Kernel fusion
- Types of fusions
- Example: N-body simulation
- Future work
- Multi-variant user functions
- Introduction
- Idea and implementation
- Use cases
- Vectorization example
- Generalized multi-variant components with the Call skeleton
- Other use cases
- Related work
- A deterministic portable parallel pseudo-random number generator
- Introduction
- Determinism in heterogeneous parallel computing
- Parallel pseudo-random number generation
- Previous manual parallelization of PRNG in SkePU programs
- Monte Carlo pi calculation: index-based scrambling
- Markov Chain Monte Carlo methods in LQCD: PRNG with explicit state
- Designing a deterministic PRNG for SkePU
- Global synchronization
- Stream splitting
- State forwarding
- Optimizing long or iterated skeleton chains by pre-forwarding
- API extension design
- Related work
- Towards a modernized auto-tuner
- Background
- SkePU variadic tuner design
- Implementation
- Multi-dimensional argument sequences
- Sampler
- Execution plan and persistence
- Future work
- Evaluation results
- SkePU usability evaluation
- SkePU 2 prototype survey
- SkePU 3 survey
- Initial SkePU 2 performance evaluation
- Performance evaluation of lineages
- Sequences of Maps
- Heat propagation
- Hybrid backend
- Single skeleton evaluation
- Generic application evaluation
- Comparison to dynamic hybrid scheduling using StarPU
- Evaluation of multi-variant user functions
- Vectorization
- Median filtering
- Application benchmarks of SkePU 3
- Libsolve ODE solver
- N-body
- Blackscholes and Streamcluster
- Brain simulation
- CO2 capture
- Supercapacitor simulation
- Conjugate gradient
- Experimental evaluation of deterministic PRNG
- Monte-Carlo Pi approximation
- LQCD Mini-Application
- Miller-Rabin primality testing
- Natural noise generation
- Programmability evaluation
- SkePU-GPI cluster backend
- Microbenchmarks of SkePU 3
- OpenMP scheduling modes
- SkePU memory consistency model
- Variadic tuner prototype
- High-level skeleton fusion
- Limitations and future work
- Limitations
- Applicability of data-parallel patterns
- Dynamic data structures
- Limitations of language embedding
- Future work
- Further backend targets: reconfigurable accelerators
- Extending the parallel pattern set: stream parallelization
- Testing, debugging, and visualization
- Higher-level language interface
- Conclusions
- Bibliography
- Additions and changes from the licentiate thesis
- New contributions
- Other changes
- Definitions
- Abbreviations
- Domain-specific terminology
- SkePU-specific terminology
- SkePU-BLAS API
- Application source code samples
- N-body simulation
- Game of life
- Conjugate gradient
- CO2 capture
- Dr-sammanst.