Pattern-Based Programming Abstractions for Heterogeneous Parallel Computing.

Bibliographic Details
Main Author: Ernstsson, August.
Format: eBook
Language: English
Published: Linköping : Linköpings Universitet, 2021.
Edition: 1st ed.
Series: Linköping Studies in Science and Technology. Dissertations Series
Subjects:
Table of Contents:
  • Intro
  • Popular science summary (Populärvetenskaplig sammanfattning)
  • Abstract
  • Acknowledgments
  • Contents
  • Introduction
  • Aims and research questions
  • Published work behind this thesis
  • Other work behind this thesis
  • Structure
  • Background and related work
  • Motivation
  • High-level parallel programming
  • Skeleton programming
  • Related work
  • GrPPI
  • Musket
  • Kokkos
  • SYCL
  • MLIR
  • StarPU
  • C++ AMP, and other industry efforts
  • Other related frameworks, libraries, and toolchains
  • Independent surveys
  • Earlier related work on SkePU
  • SkePU overview
  • Basic constructs
  • Backend architecture
  • History
  • SkePU 2 design principles
  • SkePU 3 design principles
  • Skeleton set
  • Skeleton set
  • Map skeleton
  • Freely accessible containers inside user functions
  • Variadic type signatures
  • Multi-valued return
  • Index-dependent computations
  • MapPairs skeleton
  • MapOverlap skeleton
  • Edge handling modes
  • Update modes
  • Reduce skeleton
  • One-dimensional reductions
  • Two-dimensional reductions
  • Scan skeleton
  • MapReduce skeleton
  • MapPairsReduce skeleton
  • Call skeleton
  • User functions
  • User functions as lambda expressions
  • User types
  • User constants
  • Strided skeletons
  • Strides in Map, MapPairs, and their reduce variants
  • Strides in MapOverlap
  • Data representation with smart data-containers
  • Smart data-containers
  • Container indexing
  • Container proxies
  • MatRow proxy
  • MatCol proxy
  • Region proxy
  • Memory consistency model
  • External scope
  • Standard library
  • Deterministic random number generation
  • Complex numbers
  • Linear algebra
  • Image filtering and visualization
  • Benchmark utilities
  • High-level consistent input and output
  • General utilities
  • Implementation
  • Implementation overview
  • Language embedding and type safety
  • Improved type safety from SkePU 1
  • Source-to-source compiler
  • Backends
  • Sequential CPU backend
  • Multi-core CPU backend: OpenMP
  • GPU backends: OpenCL and CUDA
  • C and Fortran language bindings
  • Continuous integration and testing
  • Dependencies
  • Availability
  • Hybrid CPU-GPU skeleton execution
  • Introduction
  • Workload partitioning and implementation
  • StarPU backend implementation
  • Auto-tuning
  • Skeleton programming on large-scale cluster systems
  • Background
  • StarPU-MPI backend
  • GPI backend
  • GASPI and GPI
  • Implementation
  • Design
  • Synchronization and state tracking
  • Consistency model and double buffering
  • Communication pattern
  • Data representation
  • Data transfers and caching
  • Conclusions
  • Extending smart data-containers for data locality awareness
  • Introduction
  • Large-scale data processing with MapReduce and Spark
  • MapReduce
  • Spark
  • Lazily evaluated skeletons with tiling
  • Basic approach and benefits
  • Backend selection
  • Loop optimization
  • Evaluation points
  • Further application areas
  • Implementation
  • Lazy tiling for stencil computations
  • Applications and comparison to kernel fusion
  • Polynomial evaluation using Horner's method
  • Exponentiation by repeated squaring
  • Heat propagation
  • Related work
  • High-level skeleton fusion
  • Comparison to lineages
  • Kernel fusion
  • Types of fusions
  • Example: N-body simulation
  • Future work
  • Multi-variant user functions
  • Introduction
  • Idea and implementation
  • Use cases
  • Vectorization example
  • Generalized multi-variant components with the Call skeleton
  • Other use cases
  • Related work
  • A deterministic portable parallel pseudo-random number generator
  • Introduction
  • Determinism in heterogeneous parallel computing
  • Parallel pseudo-random number generation
  • Previous manual parallelization of PRNG in SkePU programs
  • Monte Carlo pi calculation: index-based scrambling
  • Markov Chain Monte Carlo methods in LQCD: PRNG with explicit state
  • Designing a deterministic PRNG for SkePU
  • Global synchronization
  • Stream splitting
  • State forwarding
  • Optimizing long or iterated skeleton chains by pre-forwarding
  • API extension design
  • Related work
  • Towards a modernized auto-tuner
  • Background
  • SkePU variadic tuner design
  • Implementation
  • Multi-dimensional argument sequences
  • Sampler
  • Execution plan and persistence
  • Future work
  • Evaluation results
  • SkePU usability evaluation
  • SkePU 2 prototype survey
  • SkePU 3 survey
  • Initial SkePU 2 performance evaluation
  • Performance evaluation of lineages
  • Sequences of Maps
  • Heat propagation
  • Hybrid backend
  • Single skeleton evaluation
  • Generic application evaluation
  • Comparison to dynamic hybrid scheduling using StarPU
  • Evaluation of multi-variant user functions
  • Vectorization
  • Median filtering
  • Application benchmarks of SkePU 3
  • Libsolve ODE solver
  • N-body
  • Blackscholes and Streamcluster
  • Brain simulation
  • CO2 capture
  • Supercapacitor simulation
  • Conjugate gradient
  • Experimental evaluation of deterministic PRNG
  • Monte-Carlo Pi approximation
  • LQCD Mini-Application
  • Miller-Rabin primality testing
  • Natural noise generation
  • Programmability evaluation
  • SkePU-GPI cluster backend
  • Microbenchmarks of SkePU 3
  • OpenMP scheduling modes
  • SkePU memory consistency model
  • Variadic tuner prototype
  • High-level skeleton fusion
  • Limitations and future work
  • Limitations
  • Applicability of data-parallel patterns
  • Dynamic data structures
  • Limitations of language embedding
  • Future work
  • Further backend targets: reconfigurable accelerators
  • Extending the parallel pattern set: stream parallelization
  • Testing, debugging, and visualization
  • Higher-level language interface
  • Conclusions
  • Bibliography
  • Additions and changes from the licentiate thesis
  • New contributions
  • Other changes
  • Definitions
  • Abbreviations
  • Domain-specific terminology
  • SkePU-specific terminology
  • SkePU-BLAS API
  • Application source code samples
  • N-body simulation
  • Game of life
  • Conjugate gradient
  • CO2 capture
  • Dr-sammanst.