Chapter 1: Introducing CUDA and Getting Started with CUDA

Parallel processing

  The main reason clock speeds have stopped rising is the high power dissipation that comes with high clock rates: small transistors packed into a small area and switching at high speed dissipate a great deal of power, making it very difficult to keep the processor cool.
  As clock speed has saturated, we need a new computing paradigm to increase processor performance.
  A GPU has many small, simple processors that can get work done in parallel.

Introducing GPU architecture and CUDA

  GPUs used for applications other than rendering graphics are called General-Purpose GPUs (GPGPUs).
  A GPU has simple control hardware and devotes more hardware to data computation. This structure makes it more power-efficient; the disadvantage is a more restrictive programming model.
  CUDA provides an easy and efficient way of interacting with GPUs.
  The performance of any hardware architecture is measured in terms of latency and throughput:
    1. Latency is the time taken to complete a given task;
    2. Throughput is the amount of work completed in a given time.

  CPUs are designed to execute each instruction stream in the minimum time (low latency);
  GPUs are designed to execute as many instructions as possible in a given time (high throughput).
  We don't mind a delay in the processing of a single pixel; what we want is for more pixels to be processed in the same time.

CUDA architecture

  The CUDA architecture includes the unified shader pipeline, which allows all arithmetic logic units (ALUs) on a GPU chip to be marshaled by a single CUDA program. The instruction set is tailored to general-purpose computation rather than being specific to pixel computations, and it allows arbitrary read and write access to memory.
  All GPUs have many parallel processing units called cores.
  On the hardware side, these cores are grouped into streaming processors and streaming multiprocessors (SMs).
  On the software side, a CUDA program executes as many threads running in parallel, with each thread running on a separate core. Threads are organized into blocks, and each block is bound to a streaming multiprocessor on the GPU.
  Threads within the same block can communicate with one another. The GPU has a hierarchical memory structure, with memory shared inside a single block and memory shared across multiple blocks.
  We call a CPU and its memory the host, and a GPU and its memory the device;
  the host code is compiled for the CPU by a normal C or C++ compiler;
  the device code is compiled for the GPU by a GPU compiler.
  Before launching threads, the host copies data from host memory to device memory. The threads work on data in device memory and store results there. Finally, this data is copied back to host memory for further processing.

  The steps to develop a CUDA C program are as follows:
    1. Allocate memory for data in the host and device memory;
    2. Copy data from the host memory to the device;
    3. Launch a kernel, specifying the degree of parallelism (the number of parallel threads the kernel call will run with);
    4. After all the threads are finished, copy the data back from the device memory to the host memory;
    5. Free up all memory used on the host and the device.
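
  The five steps above can be sketched as a minimal vector-addition program. This is an illustrative example, not a listing from the book; names such as vecAdd and the launch configuration are arbitrary choices:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 256

// Kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        c[i] = a[i] + b[i];
}

int main(void)
{
    float h_a[N], h_b[N], h_c[N];            // host memory
    for (int i = 0; i < N; i++) { h_a[i] = i; h_b[i] = 2 * i; }

    // Step 1: allocate device memory.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, N * sizeof(float));
    cudaMalloc(&d_b, N * sizeof(float));
    cudaMalloc(&d_c, N * sizeof(float));

    // Step 2: copy input data from host memory to device memory.
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, N * sizeof(float), cudaMemcpyHostToDevice);

    // Step 3: launch the kernel with 2 blocks of 128 threads each.
    vecAdd<<<2, 128>>>(d_a, d_b, d_c);

    // Step 4: copy the result back (cudaMemcpy waits for the kernel).
    cudaMemcpy(h_c, d_c, N * sizeof(float), cudaMemcpyDeviceToHost);

    // Step 5: free device memory.
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);

    printf("h_c[10] = %.1f\n", h_c[10]);     // 10 + 20 = 30.0
    return 0;
}
```

  Compile with nvcc (e.g. `nvcc vec_add.cu -o vec_add`); host code goes to the C++ compiler, device code to the GPU compiler, as described above.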

CUDA applications

Computer vision applications

  With CUDA acceleration of these algorithms, applications such as image segmentation, object detection, and classification can achieve real-time performance of more than 30 frames per second. The medical imaging field is also seeing widespread use of GPUs and CUDA in the reconstruction and processing of MRI and computed tomography (CT) images.

DeviceQuery program

The deviceQuery program prints the properties of each CUDA device present in the system (the original output screenshots are omitted here).
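
  Since the output screenshots are not reproduced, the core of such a query can be sketched with the CUDA runtime API; the exact fields printed here are a small, illustrative subset of what the full deviceQuery sample reports:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Detected %d CUDA-capable device(s)\n", count);

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability:    %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:         %zu MB\n",
               (size_t)(prop.totalGlobalMem >> 20));
        printf("  Multiprocessors (SMs): %d\n", prop.multiProcessorCount);
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
```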

A basic program in CUDA C

  `__global__` is a qualifier added by CUDA C to standard C. It tells the compiler that the function definition that follows should be compiled to run on the device rather than the host.
  `kernel<<<1, 1>>>` is a kernel call. The values inside the angle brackets are arguments passed from the host to the device: they specify the number of blocks and the number of threads per block that will run in parallel on the device.
  `kernel<<<1, 1>>>` therefore means the kernel function will run in one block containing one thread on the device.
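
  Putting these pieces together, a basic program of the kind described above looks roughly like this (a minimal sketch, not the book's exact listing):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// __global__ marks a function that runs on the device
// but is called from the host.
__global__ void kernel(void)
{
    // An empty kernel: it does nothing, but it does run on the GPU.
}

int main(void)
{
    // <<<1, 1>>>: launch one block containing one thread.
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();   // wait for the device to finish
    printf("Hello, CUDA!\n");
    return 0;
}
```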
