http://http.download.nvidia.com/developer/cuda/podcasts/Introduction_to_GPU_Computing.m4v
In this module, we provide an introduction to using the GPU, or graphics processing unit, for non-graphics computing. We briefly discuss the history of parallel computing and how the GPU fits into the parallel computing paradigm. Following that discussion, we introduce CUDA as a scalable programming environment for GPU computing. We close this module with some examples where GPU computing has delivered substantial speedups in applications.
Parallel computing went through what many consider a golden age in the late 1980s and early 1990s. During this time, there was huge interest in parallel computing, or more specifically, data-parallel computing, where the same computation is performed on different data elements with a fine granularity of parallelism, generally individual elements of arrays. In this period, all kinds of architectures were being explored and built, including the Cray X-MP and Y-MP, the CM-1, CM-2, and CM-5 Connection Machines, and the MasPar MP-1 and MP-2. These machines were true supercomputers in every sense of the word. They were exotic, powerful, and expensive. Only a handful of privileged users around the world had access to them, mostly at national labs and major universities. Despite the limited availability of such machines, there was a great deal of excitement about parallel computing, resulting in a wealth of papers and articles on the subject published at that time. Languages, programming models, and data-parallel algorithms were developed that solved a wide variety of problems, alongside the design of new computer architectures.
The impact of all this data-parallel research was in some sense limited. The number of machines sold was relatively small, as was the number of people who programmed them. What happened was that these large, expensive machines were displaced by clusters of cheap PCs, such as Beowulf clusters. The reason for this shift is that microprocessors had economies of scale on their side, which resulted in their performance improving faster than the supercomputer companies could compete with. As a result, parallel computing transitioned from massively parallel processors to large clusters of machines built from microprocessors. These distributed-memory clusters have a granularity of parallelism at the other end of the spectrum from the fine granularity of massively parallel processors. Data is partitioned into large chunks and distributed among the nodes of the cluster. In distributed computing, single program, multiple data, or SPMD, techniques replaced the single instruction, multiple data, or SIMD, techniques of massively parallel processors, and message passing is required to communicate between nodes. In the early 2000s, the rate of increase of microprocessor performance slowed dramatically. As a result, building more powerful clusters meant building larger clusters, which negatively affects scalability, space, and power requirements. This is where the GPU enters the picture.
The GPU, or graphics processing unit, is the computer chip on a computer's video card and in game consoles. The GPU is a massively parallel device. It is designed to handle billions of pixels per second and millions of polygons per second, and it does this by embracing an incredible level of parallelism. GPUs are many-core chips, meaning they contain hundreds of processor cores, as opposed to multicore chips, which contain the two, four, or eight cores found in CPUs. GPUs are able to run tens of thousands of threads concurrently, and have a peak performance of up to one teraflop. The granularity of parallelism on the GPU is fine. In a sense, this represents a return to the golden age of parallel computing: the GPU can in fact leverage many of the parallel algorithms developed for those exotic supercomputers. Users across many disciplines in science and engineering are achieving manyfold speedups on their code using GPUs.
While GPUs have contained the computational horsepower to accelerate non-graphics applications for some time, achieving significant speedups in such applications remained elusive due to the difficulty of programming these devices. One essentially had to program non-graphics applications through graphics APIs, and therefore had to deal with the many constraints imposed by those APIs. This proved to be a substantial barrier for most non-graphics code. This all changed when CUDA was released in 2007. CUDA, which stands for Compute Unified Device Architecture, is a parallel programming model that has been designed for scalability. It is also a software environment that instantiates this parallel computing model. It consists of a small set of extensions to the C programming language, which results in a low learning curve. CUDA is a heterogeneous computing model, where the CPU and GPU are each used for the portions of the application where they are strongest. Serial portions of the application run on the CPU, while parallel portions of the application are offloaded to the GPU. This allows incremental modification of CPU code to utilize the GPU. These features of CUDA can result in significant application speedups with relatively little effort. NVIDIA Tesla is hardware that accelerates CUDA: while CUDA programs can run on all NVIDIA GeForce and many Quadro products, the Tesla cards and their recommended servers are dedicated to exposing the computational horsepower of NVIDIA GPUs for GPU computing. While CUDA is designed for many-core architectures, it also maps well to multicore CPUs. CUDA expresses parallelism well and has the ability to target either the GPU or the CPU.
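The heterogeneous model described above can be sketched with a minimal CUDA C example (illustrative, not from the podcast: the kernel name `add` and problem size are assumptions). Serial setup runs on the CPU, while a function marked `__global__` is offloaded to the GPU and executed by many threads concurrently:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// GPU kernel: each thread handles one array element (fine-grained parallelism).
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Serial portion: allocate and initialize data on the CPU (host).
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Copy inputs to the GPU (device), launch the parallel portion, copy back.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    add<<<(n + 255) / 256, 256>>>(da, db, dc, n);  // ~one million threads
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", c[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c);
    return 0;
}
```

Note how the CPU code is modified only incrementally: the loop body becomes the kernel, and the `<<<blocks, threads>>>` launch syntax (one of CUDA's small extensions to C) replaces the serial loop.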
Together, CUDA and the GPU represent the democratization of parallel computing, by bringing data-parallel computing to the masses. Today the installed base of CUDA-capable GPUs is over eighty million; NVIDIA sells one million GPUs a week, which translates to roughly one hundred GPUs a minute. GPUs are also inexpensive: you can get a graphics card capable of over five hundred gigaflops for under two hundred dollars. The impact is that data-parallel computers are everywhere. Computing on a massively parallel processor has gone from the province of a handful of researchers at a handful of national labs and universities to something that is truly ubiquitous, and CUDA makes this power accessible.
This slide shows some examples of the speedups obtained by porting applications to CUDA. These applications span many disciplines, including medical imaging, computational fluid dynamics (an isotropic turbulence problem performed in MATLAB), an astrophysics n-body problem, a molecular dynamics simulation, computational finance, and genomics. The following modules will provide the viewer with the skills and tools necessary to achieve such speedups using the CUDA programming model.