CUDA (Compute Unified Device Architecture) is an extension of C programming — effectively a superset of the language — and an API model for parallel computing created by Nvidia. Programs written using CUDA harness the power of the GPU, thus increasing computing performance.
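To give a first taste of what such a program looks like, here is a minimal, illustrative CUDA sketch (a vector-add kernel of my own; it is not part of this tutorial's examples and needs `nvcc` and a CUDA-capable GPU to run):

```cuda
#include <stdio.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; cudaMalloc/cudaMemcpy also work.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }

    // Launch enough 256-thread blocks to cover all n elements.
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[10] = %f\n", c[10]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The key idea is that the loop over array elements disappears: every element is handled by its own lightweight GPU thread, which is exactly the kind of massive parallelism the rest of this tutorial builds toward.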
Parallelism in the CPU
Gordon Moore of Intel famously observed that the number of transistors on a chip doubles roughly every two years, a trend that for decades also translated into ever-higher clock frequencies. That scaling held true until recent years. Now that single-core clock frequencies have hit a saturation point (commodity CPUs still top out around the same few gigahertz they did years ago), the paradigm has shifted to multi-core and many-core processors.
In this chapter, we will study how parallelism is achieved in CPUs. This chapter is an essential foundation for studying GPUs, as it helps in understanding the key differences between GPUs and CPUs.
Following are the five essential steps required for an instruction to finish:
Instruction fetch (IF)
Instruction decode (ID)
Instruction execute (Ex)
Memory access (Mem)
Register write-back (WB)
This is a basic five-stage RISC architecture. Pipelining lets the stages of consecutive instructions overlap: while one instruction is being decoded, the next can already be fetched, so in the ideal case one instruction completes every cycle.