https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/contents/
- Parallel programming
HW:
- grid computing
- Cluster server
- Symmetric multiprocessor system (SMP) - multiple identical processors sharing a single main memory
- Multi-core processor - multiple cores on one chip; the cores may be identical or heterogeneous (e.g., the Cell Broadband Engine)
Flynn's Taxonomy
- SISD (single instruction, single data)
- SIMD (single instruction, multiple data)
- MISD (multiple instruction, single data)
- MIMD (multiple instruction, multiple data)
- Distributed memory - each node has its own local memory; data is exchanged via message passing
- Shared memory - all processors access a single address space
- SMP system (uniform memory access)
- NUMA system (non-uniform memory access)
Accelerator
- GPU
- DSP
- AI engine
Parallel Computing (Software)
- Data parallel
- Vectorization
- Task parallel
- Multi-Threading
- Pipeline Optimization
Process of parallelization
1. Analyze the dependencies within the data structures or within the processes, etc., in order to decide which sections can be executed in parallel.
2. Decide on the best algorithm to execute the code over multiple processors.
3. Rewrite the code using frameworks such as Message Passing Interface (MPI), OpenMP, or OpenCL.
Parallelism Levels
1. Write parallel code using the operating system's functions.
2. Use a parallelization framework for program porting.
3. Use an automatic-parallelization compiler.
Theory
- With y the serial (non-parallelizable) fraction of the run time and x = 1 - y the parallelizable fraction, the speedup on N processors is S = 1/(x/N + y). Taking the limit as N goes to infinity, the most speedup that can be achieved is S = 1/y. This law is known as Amdahl's Law.
From <https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/parallel-computing-software/>
- Gustafson observes that as the problem size grows, the parallelizable time Tp grows in proportion to it while the serial time Ts stays roughly constant, i.e., Tp grows faster than Ts, so larger problems benefit more from added processors (Gustafson's Law).
From <https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/parallel-computing-software/>
- Programming Concepts
Creating an OpenCL program requires writing code for the host side (CPU) as well as for the device side (GPU). The device is programmed in OpenCL C, as shown in List 3.3 hello.cl. The host is programmed in C/C++ using the OpenCL runtime API (running on the CPU).
OpenCL is an open, royalty-free standard (managed by the Khronos Group) for programming heterogeneous devices; it is similar in concept to CUDA but vendor-neutral, not an open-source version of CUDA.
We will now take a look at software development in a heterogeneous environment.
First, we will take a look at CUDA, which is a way to write generic code to be run on the GPU. Since no OS is running on the GPU, the CPU must perform tasks such as code execution, file system management, and the user interface, so that the data parallel computations can be performed on the GPU.
In CUDA, the control management side (CPU) is called the "host", and the data parallel side (GPU) is called the "device". The CPU side program is called the host program, and the GPU side program is called the kernel. The main difference between CUDA and the normal development process is that the kernel must be written in the CUDA language, which is an extension of the C language.
From <https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/historical-background/>
OpenCL Model and Terminology
https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/applicable-platforms/
So what exactly is meant by "using OpenCL"? When developing software using OpenCL, the following two tools are required:
- OpenCL Compiler
- OpenCL Runtime Library
From <https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/an-overview-of-opencl/>
OpenCL provides APIs for the following programming models.
- Data parallel programming model
- Task parallel programming model
From <https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/applicable-platforms/>
The basic difference between the two methods is as follows:
- Offline: Kernel binary is read in by the host code
- Online: Kernel source file is read in by the host code
From <https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/online-offline-compilation/>
- Programming flows:
The programming model is similar to OpenGL programming: the host-side and device-side (kernel) programs are written separately, and the host program loads, compiles, launches, and manages the kernel. A difference from OpenGL is that OpenCL provides a bi-directional data path, making it easy to read results back from the GPU to the CPU.
https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/basic-program-flow/
- Get a list of available platforms
- Select device
- Create Context
- Create command queue
- Create memory objects
- Read kernel file
- Create program object
- Compile kernel
- Create kernel object
- Set kernel arguments
- Execute kernel (Enqueue task) ← hello() kernel function is called here
- Read memory object
- Free objects