Synopses & Reviews
Write High-Performance, Highly Scalable Multicore Applications for Leading Platforms
Multicore Application Programming is a comprehensive, practical guide to high-performance multicore programming that any experienced developer can use.
Author Darryl Gove covers the leading approaches to parallelization on Windows, Linux, and Oracle Solaris. Through practical examples, he illuminates the challenges involved in writing applications that fully utilize multicore processors, helping you produce applications that are functionally correct, offer superior performance, and scale well to eight cores, sixteen cores, and beyond.
The book reveals how specific hardware implementations impact application performance and shows how to avoid common pitfalls. Step by step, you’ll write applications that can handle large numbers of parallel threads, and you’ll master advanced parallelization techniques. You’ll learn how to:
- Identify your best opportunities to use parallelism
- Share data safely between multiple threads
- Write applications using POSIX or Windows threads
- Hand-code synchronization and sharing
- Take advantage of automatic parallelization and OpenMP
- Overcome common obstacles to scaling
- Apply new approaches to writing correct, fast, scalable parallel code
Multicore Application Programming isn’t wedded to a single approach or platform: It is for every experienced C programmer working with any contemporary multicore processor in any leading operating system environment.
About the Author
Darryl Gove is a senior principal software engineer in the Oracle Solaris Studio compiler team. He works on the analysis, parallelization, and optimization of both applications and benchmarks. Darryl has a master’s degree and a doctorate in operational research from the University of Southampton, UK. He is the author of the books Solaris Application Programming (Prentice Hall, 2008) and The Developer’s Edge (Sun Microsystems, 2009). He writes regularly about optimization and coding and maintains a blog at www.darrylgove.com.
Table of Contents
Preface xv
Acknowledgments xix
About the Author xxi
Chapter 1: Hardware, Processes, and Threads 1
Examining the Insides of a Computer 1
The Motivation for Multicore Processors 3
The Characteristics of Multiprocessor Systems 18
The Translation of Source Code to Assembly Language 21
Summary 29
Chapter 2: Coding for Performance 31
Defining Performance 31
Understanding Algorithmic Complexity 33
How Structure Impacts Performance 39
The Role of the Compiler 60
Identifying Where Time Is Spent Using Profiling 74
How Not to Optimize 80
Performance by Design 82
Summary 83
Chapter 3: Identifying Opportunities for Parallelism 85
Using Multiple Processes to Improve System Productivity 85
Multiple Users Utilizing a Single System 87
Improving Machine Efficiency Through Consolidation 88
Using Parallelism to Improve the Performance of a Single Task 92
Parallelization Patterns 100
How Dependencies Influence the Ability to Run Code in Parallel 110
Identifying Parallelization Opportunities 118
Summary 119
Chapter 4: Synchronization and Data Sharing 121
Data Races 121
Synchronization Primitives 126
Deadlocks and Livelocks 132
Communication Between Threads and Processes 133
Storing Thread-Private Data 141
Summary 142
Chapter 5: Using POSIX Threads 143
Creating Threads 143
Compiling Multithreaded Code 151
Process Termination 153
Sharing Data Between Threads 154
Variables and Memory 175
Multiprocess Programming 179
Sockets 193
Reentrant Code and Compiler Flags 197
Summary 198
Chapter 6: Windows Threading 199
Creating Native Windows Threads 199
Methods of Synchronization and Resource Sharing 208
Wide String Handling in Windows 221
Creating Processes 222
Atomic Updates of Variables 238
Allocating Thread-Local Storage 240
Setting Thread Priority 242
Summary 244
Chapter 7: Using Automatic Parallelization and OpenMP 245
Using Automatic Parallelization to Produce a Parallel Application 245
Using OpenMP to Produce a Parallel Application 256
Ensuring That Code in a Parallel Region Is Executed in Order 285
Collapsing Loops to Improve Workload Balance 286
Enforcing Memory Consistency 287
An Example of Parallelization 288
Summary 293
Chapter 8: Hand-Coded Synchronization and Sharing 295
Atomic Operations 295
Operating System–Provided Atomics 309
Lockless Algorithms 312
Summary 332
Chapter 9: Scaling with Multicore Processors 333
Constraints to Application Scaling 333
Hardware Constraints to Scaling 352
Operating System Constraints to Scaling 369
Multicore Processors and Scaling 380
Summary 381
Chapter 10: Other Parallelization Technologies 383
GPU-Based Computing 383
Language Extensions 386
Alternative Languages 399
Clustering Technologies 402
Transactional Memory 407
Vectorization 408
Summary 409
Chapter 11: Concluding Remarks 411
Writing Parallel Applications 411
Parallel Code on Multicore Processors 414
The Future 416
Bibliography 417
Index 419