CUDA Handbook Supplement: CPU and GPU Hardware Architecture (1)

        The CPU connects to main memory through a memory controller. When physical memory runs low, the operating system automatically sets aside a region of the hard disk as virtual memory, which is far slower to read and write than RAM.
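As a quick illustration (not from the original post), here is a minimal sketch that reports the host's physical RAM via sysconf and the GPU's memory via cudaMemGetInfo, just to make the CPU-RAM / GPU-memory distinction concrete. It assumes a Linux host with the CUDA runtime installed; the file name memquery.cu is only a placeholder.

```cpp
// memquery.cu : compare host physical RAM with the current GPU's memory.
// Minimal sketch; assumes a Linux host and the CUDA runtime (nvcc memquery.cu).
#include <cstdio>
#include <unistd.h>          // sysconf
#include <cuda_runtime.h>

int main() {
    // Host side: physical pages * page size = installed RAM (swap not counted).
    long pages     = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGE_SIZE);
    printf("Host physical RAM : %.2f GiB (page size %ld bytes)\n",
           pages * (double)page_size / (1 << 30), page_size);

    // Device side: free and total memory on the active GPU.
    size_t free_b = 0, total_b = 0;
    if (cudaMemGetInfo(&free_b, &total_b) == cudaSuccess) {
        printf("GPU memory        : %.2f GiB total, %.2f GiB free\n",
               total_b / (double)(1 << 30), free_b / (double)(1 << 30));
    } else {
        printf("No CUDA device available.\n");
    }
    return 0;
}
```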

       The north bridge is one of the two chips in an Intel motherboard chipset; the other is the south bridge, which connects peripheral devices to the CPU. Integrated GPUs used to live in the north bridge and were later moved into the CPU itself. Today most GPUs are discrete cards sitting in the so-called south-bridge region, and the PCIe bus has replaced the old north-bridge graphics design. The earlier PCI bus was too slow to keep up with the required data-exchange rates, so the newer PCIe standard took its place. PCIe slots come in several lane configurations:

                      [Figure: PCIe slot/lane configurations; for convenience, the image is a screenshot from Baidu Baike]

        At a glance, x16 is clearly faster than x8, and x8 faster than x4. Today the x16 slot is reserved for the graphics card, since it needs the highest bandwidth (higher bandwidth simply means more data transferred in the same amount of time). The older AGP (Accelerated Graphics Port), which was designed into the north-bridge region, has since been retired: its clock rate was low and it could not match PCIe's speed.
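To make the bandwidth point concrete, the sketch below times repeated host-to-device copies with CUDA events to estimate effective PCIe throughput; the 256 MiB buffer size and 20 iterations are arbitrary choices for illustration, not values from the post.

```cpp
// h2d_bandwidth.cu : rough estimate of host-to-device PCIe bandwidth.
// Minimal sketch; the real lane width (x8/x16), pinned vs. pageable memory,
// and driver overhead all affect the number you will actually see.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256u << 20;   // 256 MiB test buffer (arbitrary size)
    const int    iters = 20;

    void *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost(&h_buf, bytes);     // pinned host memory for best PCIe throughput
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gib = (double)bytes * iters / (1 << 30);
    printf("Host->Device: %.2f GiB/s\n", gib / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```

With pinned memory, a PCIe 3.0 x16 link typically measures somewhere around 12 GB/s, roughly double what an x8 link can deliver, which is exactly the x16-versus-x8 gap described above.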

        To recap: in the older layout, the south bridge handled peripherals such as the audio chip, keyboard, and mouse, while the north bridge connected the main devices, i.e. the CPU and GPU. Later the GPU moved out onto the high-speed PCIe bus, and the north bridge itself was absorbed into the CPU. The north bridge's job was to communicate with the south bridge (for example, relaying incoming mouse signals) and with PCIe devices such as the graphics card. Of course, a GPU can also be integrated inside the CPU.
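Whether a particular GPU is a discrete card on the PCIe bus or integrated with the CPU can be checked from the CUDA device properties. The sketch below only uses the standard cudaDeviceProp fields (name, integrated, pciBusID, pciDeviceID); it is an illustrative query, not code from the post.

```cpp
// gpu_location.cu : report whether each CUDA device is integrated or discrete,
// and where it sits on the PCI(e) bus. Minimal sketch using cudaDeviceProp.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        printf("  %s GPU, PCI bus %d, device %d\n",
               prop.integrated ? "integrated" : "discrete",
               prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}
```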

                

The CUDA Handbook begins where CUDA by Example (Addison-Wesley, 2011) leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes commands and how the driver checks progress; more experienced CUDA developers will appreciate the expert coverage of topics such as the driver API and context migration, as well as the guidance on how best to structure CPU/GPU data interchange and synchronization. The accompanying open source code (more than 25,000 lines of it, freely available at www.cudahandbook.com) is specifically intended to be reused and repurposed by developers.

Designed to be both a comprehensive reference and a practical cookbook, the text is divided into the following three parts:

Part I, Overview, gives high-level descriptions of the hardware and software that make CUDA possible.

Part II, Details, provides thorough descriptions of every aspect of CUDA, including:

* Memory
* Streams and events
* Models of execution, including the dynamic parallelism feature, new with CUDA 5.0 and SM 3.5
* The streaming multiprocessors, including descriptions of all features through SM 3.5
* Programming multiple GPUs
* Texturing

The source code accompanying Part II is presented as reusable microbenchmarks and microdemos, designed to expose specific hardware characteristics or highlight specific use cases.

Part III, Select Applications, details specific families of CUDA applications and key parallel algorithms, including:

* Streaming workloads
* Reduction
* Parallel prefix sum (Scan)
* N-body
* Image processing

These algorithms cover the full range of potential CUDA applications.