Figures xv
Tables xxi
Listings xxv
Foreword xxix
Preface xxxiii
Acknowledgments xli
About the Authors xliii
Part I: The OpenCL 1.1 Language and API 1
Chapter 1: An Introduction to OpenCL 3
What Is OpenCL, or . . . Why You Need This Book 3
Our Many-Core Future: Heterogeneous Platforms 4
Software in a Many-Core World 7
Conceptual Foundations of OpenCL 11
OpenCL and Graphics 29
The Contents of OpenCL 30
The Embedded Profile 35
Learning OpenCL 36
Chapter 2: HelloWorld: An OpenCL Example 39
Building the Examples 40
HelloWorld Example 45
Checking for Errors in OpenCL 57
Chapter 3: Platforms, Contexts, and Devices 63
OpenCL Platforms 63
OpenCL Devices 68
OpenCL Contexts 83
Chapter 4: Programming with OpenCL C 97
Writing a Data-Parallel Kernel Using OpenCL C 97
Scalar Data Types 99
Vector Data Types 102
Other Data Types 108
Derived Types 109
Implicit Type Conversions 110
Explicit Casts 116
Explicit Conversions 117
Reinterpreting Data as Another Type 121
Vector Operators 123
Qualifiers 133
Keywords 141
Preprocessor Directives and Macros 141
Restrictions 146
Chapter 5: OpenCL C Built-In Functions 149
Work-Item Functions 150
Math Functions 153
Integer Functions 168
Common Functions 172
Geometric Functions 175
Relational Functions 175
Vector Data Load and Store Functions 181
Synchronization Functions 190
Async Copy and Prefetch Functions 191
Atomic Functions 195
Miscellaneous Vector Functions 199
Image Read and Write Functions 201
Chapter 6: Programs and Kernels 217
Program and Kernel Object Overview 217
Program Objects 218
Kernel Objects 237
Chapter 7: Buffers and Sub-Buffers 247
Memory Objects, Buffers, and Sub-Buffers Overview 247
Creating Buffers and Sub-Buffers 249
Querying Buffers and Sub-Buffers 257
Reading, Writing, and Copying Buffers and Sub-Buffers 259
Mapping Buffers and Sub-Buffers 276
Chapter 8: Images and Samplers 281
Image and Sampler Object Overview 281
Creating Image Objects 283
Creating Sampler Objects 292
OpenCL C Functions for Working with Images 295
Transferring Image Objects 299
Chapter 9: Events 309
Commands, Queues, and Events Overview 309
Events and Command-Queues 311
Event Objects 317
Generating Events on the Host 321
Events Impacting Execution on the Host 322
Using Events for Profiling 327
Events Inside Kernels 332
Events from Outside OpenCL 333
Chapter 10: Interoperability with OpenGL 335
OpenCL/OpenGL Sharing Overview 335
Querying for the OpenGL Sharing Extension 336
Initializing an OpenCL Context for OpenGL Interoperability 338
Creating OpenCL Buffers from OpenGL Buffers 339
Creating OpenCL Image Objects from OpenGL Textures 344
Querying Information about OpenGL Objects 347
Synchronization between OpenGL and OpenCL 348
Chapter 11: Interoperability with Direct3D 353
Direct3D/OpenCL Sharing Overview 353
Initializing an OpenCL Context for Direct3D Interoperability 354
Creating OpenCL Memory Objects from Direct3D Buffers and Textures 357
Acquiring and Releasing Direct3D Objects in OpenCL 361
Processing a Direct3D Texture in OpenCL 363
Processing D3D Vertex Data in OpenCL 366
Chapter 12: C++ Wrapper API 369
C++ Wrapper API Overview 369
C++ Wrapper API Exceptions 371
Vector Add Example Using the C++ Wrapper API 374
Chapter 13: OpenCL Embedded Profile 383
OpenCL Profile Overview 383
64-Bit Integers 385
Images 386
Built-In Atomic Functions 387
Mandated Minimum Single-Precision Floating-Point Capabilities 387
Determining the Profile Supported by a Device in an OpenCL C Program 390
Part II: OpenCL 1.1 Case Studies 391
Chapter 14: Image Histogram 393
Computing an Image Histogram 393
Parallelizing the Image Histogram 395
Additional Optimizations to the Parallel Image Histogram 400
Computing Histograms with Half-Float or Float Values for Each Channel 403
Chapter 15: Sobel Edge Detection Filter 407
What Is a Sobel Edge Detection Filter? 407
Implementing the Sobel Filter as an OpenCL Kernel 407
Chapter 16: Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm 411
Graph Data Structures 412
Kernels 414
Leveraging Multiple Compute Devices 417
Chapter 17: Cloth Simulation in the Bullet Physics SDK 425
An Introduction to Cloth Simulation 425
Simulating the Soft Body 429
Executing the Simulation on the CPU 431
Changes Necessary for Basic GPU Execution 432
Two-Layered Batching 438
Optimizing for SIMD Computation and Local Memory 441
Adding OpenGL Interoperation 446
Chapter 18: Simulating the Ocean with Fast Fourier Transform 449
An Overview of the Ocean Application 450
Phillips Spectrum Generation 453
An OpenCL Discrete Fourier Transform 457
A Closer Look at the FFT Kernel 463
A Closer Look at the Transpose Kernel 467
Chapter 19: Optical Flow 469
Optical Flow Problem Overview 469
Sub-Pixel Accuracy with Hardware Linear Interpolation 480
Application of the Texture Cache 480
Using Local Memory 481
Early Exit and Hardware Scheduling 483
Efficient Visualization with OpenGL Interop 483
Performance 484
Chapter 20: Using OpenCL with PyOpenCL 487
Introducing PyOpenCL 487
Running the PyImageFilter2D Example 488
PyImageFilter2D Code 488
Context and Command-Queue Creation 492
Loading to an Image Object 493
Creating and Building a Program 494
Setting Kernel Arguments and Executing a Kernel 495
Reading the Results 496
Chapter 21: Matrix Multiplication with OpenCL 499
The Basic Matrix Multiplication Algorithm 499
A Direct Translation into OpenCL 501
Increasing the Amount of Work per Kernel 506
Optimizing Memory Movement: Local Memory 509
Performance Results and Optimizing the Original CPU Code 511
Chapter 22: Sparse Matrix-Vector Multiplication 515
Sparse Matrix-Vector Multiplication (SpMV) Algorithm 515
Description of This Implementation 518
Tiled and Packetized Sparse Matrix Representation 519
Header Structure 522
Tiled and Packetized Sparse Matrix Design Considerations 523
Optional Team Information 524
Tested Hardware Devices and Results 524
Additional Areas of Optimization 538
Appendix: Summary of OpenCL 1.1 541
The OpenCL Platform Layer 541
The OpenCL Runtime 543
Buffer Objects 544
Program Objects 546
Kernel and Event Objects 547
Supported Data Types 550
Vector Component Addressing 552
Preprocessor Directives and Macros 555
Specify Type Attributes 555
Math Constants 556
Work-Item Built-In Functions 557
Integer Built-In Functions 557
Common Built-In Functions 559
Math Built-In Functions 560
Geometric Built-In Functions 563
Relational Built-In Functions 564
Vector Data Load/Store Functions 567
Atomic Functions 568
Async Copies and Prefetch Functions 570
Synchronization, Explicit Memory Fence 570
Miscellaneous Vector Built-In Functions 571
Image Read and Write Built-In Functions 572
Image Objects 573
Image Formats 576
Access Qualifiers 576
Sampler Objects 576
Sampler Declaration Fields 577
OpenCL Device Architecture Diagram 577
OpenCL/OpenGL Sharing APIs 577
OpenCL/Direct3D 10 Sharing APIs 579
Index 581