Pro TBB : C++ Parallel Programming with Threading Building Blocks.

Bibliographic Details
Main Author: Voss, Michael.
Other Authors: Asenjo, Rafael; Reinders, James.
Format: eBook
Language: English
Published: Berkeley, CA : Apress L. P., 2019.
Edition: 1st ed.
Table of Contents:
  • Intro
  • Table of Contents
  • About the Authors
  • Acknowledgments
  • Preface
  • Part 1
  • Chapter 1: Jumping Right In: "Hello, TBB!"
  • Why Threading Building Blocks?
  • Performance: Small Overhead, Big Benefits for C++
  • Evolving Support for Parallelism in TBB and C++
  • Recent C++ Additions for Parallelism
  • The Threading Building Blocks (TBB) Library
  • Parallel Execution Interfaces
  • Interfaces That Are Independent of the Execution Model
  • Using the Building Blocks in TBB
  • Let's Get Started Already!
  • Getting the Threading Building Blocks (TBB) Library
  • Getting a Copy of the Examples
  • Writing a First "Hello, TBB!" Example
  • Building the Simple Examples
  • Steps to Set Up an Environment
  • Building on Windows Using Microsoft Visual Studio
  • Building on a Linux Platform from a Terminal
  • Using the Intel Compiler
  • tbbvars and pstlvars Scripts
  • Setting Up Variables Manually Without Using the tbbvars Script or the Intel Compiler
  • A More Complete Example
  • Starting with a Serial Implementation
  • Adding a Message-Driven Layer Using a Flow Graph
  • Adding a Fork-Join Layer Using a parallel_for
  • Adding a SIMD Layer Using a Parallel STL Transform
  • Summary
  • Chapter 2: Generic Parallel Algorithms
  • Functional / Task Parallelism
  • A Slightly More Complicated Example: A Parallel Implementation of Quicksort
  • Loops: parallel_for, parallel_reduce, and parallel_scan
  • parallel_for: Applying a Body to Each Element in a Range
  • A Slightly More Complicated Example: Parallel Matrix Multiplication
  • parallel_reduce: Calculating a Single Result Across a Range
  • A Slightly More Complicated Example: Calculating π by Numerical Integration
  • parallel_scan: A Reduction with Intermediate Values
  • How Does This Work?
  • A Slightly More Complicated Example: Line of Sight
  • Cook Until Done: parallel_do and parallel_pipeline
  • parallel_do: Apply a Body Until There Are No More Items Left
  • A Slightly More Complicated Example: Forward Substitution
  • parallel_pipeline: Streaming Items Through a Series of Filters
  • A Slightly More Complicated Example: Creating 3D Stereoscopic Images
  • Summary
  • For More Information
  • Chapter 3: Flow Graphs
  • Why Use Graphs to Express Parallelism?
  • The Basics of the TBB Flow Graph Interface
  • Step 1: Create the Graph Object
  • Step 2: Make the Nodes
  • Step 3: Add Edges
  • Step 4: Start the Graph
  • Step 5: Wait for the Graph to Complete Executing
  • A More Complicated Example of a Data Flow Graph
  • Implementing the Example as a TBB Flow Graph
  • Understanding the Performance of a Data Flow Graph
  • The Special Case of Dependency Graphs
  • Implementing a Dependency Graph
  • Estimating the Scalability of a Dependency Graph
  • Advanced Topics in TBB Flow Graphs
  • Summary
  • Chapter 4: TBB and the Parallel Algorithms of the C++ Standard Template Library
  • Does the C++ STL Library Belong in This Book?
  • A Parallel STL Execution Policy Analogy
  • A Simple Example Using std::for_each
  • What Algorithms Are Provided in a Parallel STL Implementation?
  • How to Get and Use a Copy of Parallel STL That Uses TBB
  • Algorithms in Intel's Parallel STL
  • Capturing More Use Cases with Custom Iterators
  • Highlighting Some of the Most Useful Algorithms
  • std::for_each, std::for_each_n
  • std::transform
  • std::reduce
  • std::transform_reduce
  • A Deeper Dive into the Execution Policies
  • The sequenced_policy
  • The parallel_policy
  • The unsequenced_policy
  • The parallel_unsequenced_policy
  • Which Execution Policy Should We Use?
  • Other Ways to Introduce SIMD Parallelism
  • Summary
  • For More Information
  • Chapter 5: Synchronization: Why and How to Avoid It
  • A Running Example: Histogram of an Image
  • An Unsafe Parallel Implementation
  • A First Safe Parallel Implementation: Coarse-Grained Locking
  • Mutex Flavors
  • A Second Safe Parallel Implementation: Fine-Grained Locking
  • A Third Safe Parallel Implementation: Atomics
  • A Better Parallel Implementation: Privatization and Reduction
  • Thread Local Storage, TLS
  • enumerable_thread_specific, ETS
  • combinable
  • The Easiest Parallel Implementation: Reduction Template
  • Recap of Our Options
  • Summary
  • For More Information
  • Chapter 6: Data Structures for Concurrency
  • Key Data Structures Basics
  • Unordered Associative Containers
  • Map vs. Set
  • Multiple Values
  • Hashing
  • Unordered
  • Concurrent Containers
  • Concurrent Unordered Associative Containers
  • concurrent_hash_map
  • Concurrent Support for map/multimap and set/multiset Interfaces
  • Built-In Locking vs. No Visible Locking
  • Iterating Through These Structures Is Asking for Trouble
  • Concurrent Queues: Regular, Bounded, and Priority
  • Bounding Size
  • Priority Ordering
  • Staying Thread-Safe: Try to Forget About Top, Size, Empty, Front, Back
  • Iterators
  • Why to Use This Concurrent Queue: The A-B-A Problem
  • When to NOT Use Queues: Think Algorithms!
  • Concurrent Vector
  • When to Use tbb::concurrent_vector Instead of std::vector
  • Elements Never Move
  • Concurrent Growth of concurrent_vectors
  • Summary
  • Chapter 7: Scalable Memory Allocation
  • Modern C++ Memory Allocation
  • Scalable Memory Allocation: What
  • Scalable Memory Allocation: Why
  • Avoiding False Sharing with Padding
  • Scalable Memory Allocation Alternatives: Which
  • Compilation Considerations
  • Most Popular Usage (C/C++ Proxy Library): How
  • Linux: malloc/new Proxy Library Usage
  • macOS: malloc/new Proxy Library Usage
  • Windows: malloc/new Proxy Library Usage
  • Testing our Proxy Library Usage
  • C Functions: Scalable Memory Allocators for C
  • C++ Classes: Scalable Memory Allocators for C++
  • Allocators with std::allocator&lt;T&gt; Signature
  • scalable_allocator
  • tbb_allocator
  • zero_allocator
  • cache_aligned_allocator
  • Memory Pool Support: memory_pool_allocator
  • Array Allocation Support: aligned_space
  • Replacing new and delete Selectively
  • Performance Tuning: Some Control Knobs
  • What Are Huge Pages?
  • TBB Support for Huge Pages
  • scalable_allocation_mode(int mode, intptr_t value)
  • TBBMALLOC_USE_HUGE_PAGES
  • TBBMALLOC_SET_SOFT_HEAP_LIMIT
  • int scalable_allocation_command(int cmd, void *param)
  • TBBMALLOC_CLEAN_ALL_BUFFERS
  • TBBMALLOC_CLEAN_THREAD_BUFFERS
  • Summary
  • Chapter 8: Mapping Parallel Patterns to TBB
  • Parallel Patterns vs. Parallel Algorithms
  • Patterns Categorize Algorithms, Designs, etc.
  • Patterns That Work
  • Data Parallelism Wins
  • Nesting Pattern
  • Map Pattern
  • Workpile Pattern
  • Reduction Patterns (Reduce and Scan)
  • Fork-Join Pattern
  • Divide-and-Conquer Pattern
  • Branch-and-Bound Pattern
  • Pipeline Pattern
  • Event-Based Coordination Pattern (Reactive Streams)
  • Summary
  • For More Information
  • Part 2
  • Chapter 9: The Pillars of Composability
  • What Is Composability?
  • Nested Composition
  • Concurrent Composition
  • Serial Composition
  • The Features That Make TBB a Composable Library
  • The TBB Thread Pool (the Market) and Task Arenas
  • The TBB Task Dispatcher: Work Stealing and More
  • Putting It All Together
  • Looking Forward
  • Controlling the Number of Threads
  • Work Isolation
  • Task-to-Thread and Thread-to-Core Affinity
  • Task Priorities
  • Summary
  • For More Information
  • Chapter 10: Using Tasks to Create Your Own Algorithms
  • A Running Example: The Sequence
  • The High-Level Approach: parallel_invoke
  • The Highest Among the Lower: task_group
  • The Low-Level Task Interface: Part One - Task Blocking
  • The Low-Level Task Interface: Part Two - Task Continuation
  • Bypassing the Scheduler
  • The Low-Level Task Interface: Part Three - Task Recycling
  • Task Interface Checklist
  • One More Thing: FIFO (aka Fire-and-Forget) Tasks
  • Putting These Low-Level Features to Work
  • Summary
  • For More Information
  • Chapter 11: Controlling the Number of Threads Used for Execution
  • A Brief Recap of the TBB Scheduler Architecture
  • Interfaces for Controlling the Number of Threads
  • Controlling Thread Count with task_scheduler_init
  • Controlling Thread Count with task_arena
  • Controlling Thread Count with global_control
  • Summary of Concepts and Classes
  • The Best Approaches for Setting the Number of Threads
  • Using a Single task_scheduler_init Object for a Simple Application
  • Using More Than One task_scheduler_init Object in a Simple Application
  • Using Multiple Arenas with Different Numbers of Slots to Influence Where TBB Places Its Worker Threads
  • Using global_control to Control How Many Threads Are Available to Fill Arena Slots
  • Using global_control to Temporarily Restrict the Number of Available Threads
  • When NOT to Control the Number of Threads
  • Figuring Out What's Gone Wrong
  • Summary
  • Chapter 12: Using Work Isolation for Correctness and Performance
  • Work Isolation for Correctness
  • Creating an Isolated Region with this_task_arena::isolate
  • Oh No! Work Isolation Can Cause Its Own Correctness Issues!
  • Even When It Is Safe, Work Isolation Is Not Free
  • Using Task Arenas for Isolation: A Double-Edged Sword
  • Don't Be Tempted to Use task_arenas to Create Work Isolation for Correctness
  • Summary
  • For More Information
  • Chapter 13: Creating Thread-to-Core and Task-to-Thread Affinity
  • Creating Thread-to-Core Affinity
  • Creating Task-to-Thread Affinity
  • When and How Should We Use the TBB Affinity Features?
  • Summary
  • For More Information