Pro TBB : C++ Parallel Programming with Threading Building Blocks.

Bibliographic Details
Main Author: Voss, Michael.
Other Authors: Asenjo, Rafael; Reinders, James.
Format: eBook
Language: English
Published: Berkeley, CA : Apress L. P., 2019.
Edition: 1st ed.
Table of Contents:
  • Intro
  • Table of Contents
  • About the Authors
  • Acknowledgments
  • Preface
  • Part 1
  • Chapter 1: Jumping Right In: "Hello, TBB!"
  • Why Threading Building Blocks?
  • Performance: Small Overhead, Big Benefits for C++
  • Evolving Support for Parallelism in TBB and C++
  • Recent C++ Additions for Parallelism
  • The Threading Building Blocks (TBB) Library
  • Parallel Execution Interfaces
  • Interfaces That Are Independent of the Execution Model
  • Using the Building Blocks in TBB
  • Let's Get Started Already!
  • Getting the Threading Building Blocks (TBB) Library
  • Getting a Copy of the Examples
  • Writing a First "Hello, TBB!" Example
  • Building the Simple Examples
  • Steps to Set Up an Environment
  • Building on Windows Using Microsoft Visual Studio
  • Building on a Linux Platform from a Terminal
  • Using the Intel Compiler
  • tbbvars and pstlvars Scripts
  • Setting Up Variables Manually Without Using the tbbvars Script or the Intel Compiler
  • A More Complete Example
  • Starting with a Serial Implementation
  • Adding a Message-Driven Layer Using a Flow Graph
  • Adding a Fork-Join Layer Using a parallel_for
  • Adding a SIMD Layer Using a Parallel STL Transform
  • Summary
  • Chapter 2: Generic Parallel Algorithms
  • Functional / Task Parallelism
  • A Slightly More Complicated Example: A Parallel Implementation of Quicksort
  • Loops: parallel_for, parallel_reduce, and parallel_scan
  • parallel_for: Applying a Body to Each Element in a Range
  • A Slightly More Complicated Example: Parallel Matrix Multiplication
  • parallel_reduce: Calculating a Single Result Across a Range
  • A Slightly More Complicated Example: Calculating π by Numerical Integration
  • parallel_scan: A Reduction with Intermediate Values
  • How Does This Work?
  • A Slightly More Complicated Example: Line of Sight
  • Cook Until Done: parallel_do and parallel_pipeline
  • parallel_do: Apply a Body Until There Are No More Items Left
  • A Slightly More Complicated Example: Forward Substitution
  • parallel_pipeline: Streaming Items Through a Series of Filters
  • A Slightly More Complicated Example: Creating 3D Stereoscopic Images
  • Summary
  • For More Information
  • Chapter 3: Flow Graphs
  • Why Use Graphs to Express Parallelism?
  • The Basics of the TBB Flow Graph Interface
  • Step 1: Create the Graph Object
  • Step 2: Make the Nodes
  • Step 3: Add Edges
  • Step 4: Start the Graph
  • Step 5: Wait for the Graph to Complete Executing
  • A More Complicated Example of a Data Flow Graph
  • Implementing the Example as a TBB Flow Graph
  • Understanding the Performance of a Data Flow Graph
  • The Special Case of Dependency Graphs
  • Implementing a Dependency Graph
  • Estimating the Scalability of a Dependency Graph
  • Advanced Topics in TBB Flow Graphs
  • Summary
  • Chapter 4: TBB and the Parallel Algorithms of the C++ Standard Template Library
  • Does the C++ STL Library Belong in This Book?
  • A Parallel STL Execution Policy Analogy
  • A Simple Example Using std::for_each
  • What Algorithms Are Provided in a Parallel STL Implementation?
  • How to Get and Use a Copy of Parallel STL That Uses TBB
  • Algorithms in Intel's Parallel STL
  • Capturing More Use Cases with Custom Iterators
  • Highlighting Some of the Most Useful Algorithms
  • std::for_each, std::for_each_n
  • std::transform
  • std::reduce
  • std::transform_reduce
  • A Deeper Dive into the Execution Policies
  • The sequenced_policy
  • The parallel_policy
  • The unsequenced_policy
  • The parallel_unsequenced_policy
  • Which Execution Policy Should We Use?
  • Other Ways to Introduce SIMD Parallelism
  • Summary
  • For More Information
  • Chapter 5: Synchronization: Why and How to Avoid It
  • A Running Example: Histogram of an Image
  • An Unsafe Parallel Implementation
  • A First Safe Parallel Implementation: Coarse-Grained Locking
  • Mutex Flavors
  • A Second Safe Parallel Implementation: Fine-Grained Locking
  • A Third Safe Parallel Implementation: Atomics
  • A Better Parallel Implementation: Privatization and Reduction
  • Thread Local Storage, TLS
  • enumerable_thread_specific, ETS
  • combinable
  • The Easiest Parallel Implementation: Reduction Template
  • Recap of Our Options
  • Summary
  • For More Information
  • Chapter 6: Data Structures for Concurrency
  • Key Data Structures Basics
  • Unordered Associative Containers
  • Map vs. Set
  • Multiple Values
  • Hashing
  • Unordered
  • Concurrent Containers
  • Concurrent Unordered Associative Containers
  • concurrent_hash_map
  • Concurrent Support for map/multimap and set/multiset Interfaces
  • Built-In Locking vs. No Visible Locking
  • Iterating Through These Structures Is Asking for Trouble
  • Concurrent Queues: Regular, Bounded, and Priority
  • Bounding Size
  • Priority Ordering
  • Staying Thread-Safe: Try to Forget About Top, Size, Empty, Front, Back
  • Iterators
  • Why to Use This Concurrent Queue: The A-B-A Problem
  • When to NOT Use Queues: Think Algorithms!
  • Concurrent Vector
  • When to Use tbb::concurrent_vector Instead of std::vector
  • Elements Never Move
  • Concurrent Growth of concurrent_vectors
  • Summary
  • Chapter 7: Scalable Memory Allocation
  • Modern C++ Memory Allocation
  • Scalable Memory Allocation: What
  • Scalable Memory Allocation: Why
  • Avoiding False Sharing with Padding
  • Scalable Memory Allocation Alternatives: Which
  • Compilation Considerations
  • Most Popular Usage (C/C++ Proxy Library): How
  • Linux: malloc/new Proxy Library Usage
  • macOS: malloc/new Proxy Library Usage
  • Windows: malloc/new Proxy Library Usage
  • Testing our Proxy Library Usage
  • C Functions: Scalable Memory Allocators for C
  • C++ Classes: Scalable Memory Allocators for C++
  • Allocators with std::allocator&lt;T&gt; Signature
  • scalable_allocator
  • tbb_allocator
  • zero_allocator
  • cache_aligned_allocator
  • Memory Pool Support: memory_pool_allocator
  • Array Allocation Support: aligned_space
  • Replacing new and delete Selectively
  • Performance Tuning: Some Control Knobs
  • What Are Huge Pages?
  • TBB Support for Huge Pages
  • scalable_allocation_mode(int mode, intptr_t value)
  • TBBMALLOC_USE_HUGE_PAGES
  • TBBMALLOC_SET_SOFT_HEAP_LIMIT
  • int scalable_allocation_command(int cmd, void *param)
  • TBBMALLOC_CLEAN_ALL_BUFFERS
  • TBBMALLOC_CLEAN_THREAD_BUFFERS
  • Summary
  • Chapter 8: Mapping Parallel Patterns to TBB
  • Parallel Patterns vs. Parallel Algorithms
  • Patterns Categorize Algorithms, Designs, etc.
  • Patterns That Work
  • Data Parallelism Wins
  • Nesting Pattern
  • Map Pattern
  • Workpile Pattern
  • Reduction Patterns (Reduce and Scan)
  • Fork-Join Pattern
  • Divide-and-Conquer Pattern
  • Branch-and-Bound Pattern
  • Pipeline Pattern
  • Event-Based Coordination Pattern (Reactive Streams)
  • Summary
  • For More Information
  • Part 2
  • Chapter 9: The Pillars of Composability
  • What Is Composability?
  • Nested Composition
  • Concurrent Composition
  • Serial Composition
  • The Features That Make TBB a Composable Library
  • The TBB Thread Pool (the Market) and Task Arenas
  • The TBB Task Dispatcher: Work Stealing and More
  • Putting It All Together
  • Looking Forward
  • Controlling the Number of Threads
  • Work Isolation
  • Task-to-Thread and Thread-to-Core Affinity
  • Task Priorities
  • Summary
  • For More Information
  • Chapter 10: Using Tasks to Create Your Own Algorithms
  • A Running Example: The Sequence
  • The High-Level Approach: parallel_invoke
  • The Highest Among the Lower: task_group
  • The Low-Level Task Interface: Part One - Task Blocking
  • The Low-Level Task Interface: Part Two - Task Continuation
  • Bypassing the Scheduler
  • The Low-Level Task Interface: Part Three - Task Recycling
  • Task Interface Checklist
  • One More Thing: FIFO (aka Fire-and-Forget) Tasks
  • Putting These Low-Level Features to Work
  • Summary
  • For More Information
  • Chapter 11: Controlling the Number of Threads Used for Execution
  • A Brief Recap of the TBB Scheduler Architecture
  • Interfaces for Controlling the Number of Threads
  • Controlling Thread Count with task_scheduler_init
  • Controlling Thread Count with task_arena
  • Controlling Thread Count with global_control
  • Summary of Concepts and Classes
  • The Best Approaches for Setting the Number of Threads
  • Using a Single task_scheduler_init Object for a Simple Application
  • Using More Than One task_scheduler_init Object in a Simple Application
  • Using Multiple Arenas with Different Numbers of Slots to Influence Where TBB Places Its Worker Threads
  • Using global_control to Control How Many Threads Are Available to Fill Arena Slots
  • Using global_control to Temporarily Restrict the Number of Available Threads
  • When NOT to Control the Number of Threads
  • Figuring Out What's Gone Wrong
  • Summary
  • Chapter 12: Using Work Isolation for Correctness and Performance
  • Work Isolation for Correctness
  • Creating an Isolated Region with this_task_arena::isolate
  • Oh No! Work Isolation Can Cause Its Own Correctness Issues!
  • Even When It Is Safe, Work Isolation Is Not Free
  • Using Task Arenas for Isolation: A Double-Edged Sword
  • Don't Be Tempted to Use task_arenas to Create Work Isolation for Correctness
  • Summary
  • For More Information
  • Chapter 13: Creating Thread-to-Core and Task-to-Thread Affinity
  • Creating Thread-to-Core Affinity
  • Creating Task-to-Thread Affinity
  • When and How Should We Use the TBB Affinity Features?
  • Summary
  • For More Information