Commonly used deep learning packages and libraries: what they do and how they relate

New hires often ask what these libraries and packages do and how they relate to one another, so here are some notes:

The following works from the lowest layer upward.

CUDA / OpenCL

These talk directly to the hardware, i.e., NVIDIA GPUs. As I recall, CUDA exposes two kinds of interfaces:

one is the runtime API, which is simple but offers limited control;

the other is the low-level CUDA driver API, which provides finer-grained control; code is naturally harder to write, and you need a clearer grasp of the low-level details.

The main differences:

Complexity vs. control

The runtime API eases device code management by providing implicit initialization, context management, and module management. This leads to simpler code, but it also lacks the level of control that the driver API has.

In comparison, the driver API offers more fine-grained control, especially over contexts and module loading. Kernel launches are much more complex to implement, as the execution configuration and kernel parameters must be specified with explicit function calls. However, unlike the runtime, where all the kernels are automatically loaded during initialization and stay loaded for as long as the program runs, with the driver API it is possible to only keep the modules that are currently needed loaded, or even dynamically reload modules. The driver API is also language-independent as it only deals with cubin objects.
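The contrast between the two launch styles can be sketched as follows. This is an illustrative sketch only (the kernel, names, and launch configuration are my own, and most error checking is omitted); with the driver API, the kernel would first have to be loaded explicitly from a cubin/PTX module via cuModuleLoad and cuModuleGetFunction.

```cuda
#include <cuda.h>          // driver API (cu* functions)
#include <cuda_runtime.h>  // runtime API (cuda* functions)

// A trivial kernel used only for illustration.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

void launch_with_runtime(float *d_x, float a, int n) {
    // Runtime API: implicit context, <<<...>>> syntax; all kernels
    // in the binary are loaded automatically at initialization.
    scale<<<(n + 255) / 256, 256>>>(d_x, a, n);
    cudaDeviceSynchronize();
}

void launch_with_driver(CUfunction fn, CUdeviceptr d_x, float a, int n) {
    // Driver API: fn was obtained from an explicitly loaded module;
    // the execution configuration and the kernel parameters are
    // passed as explicit function-call arguments.
    void *args[] = { &d_x, &a, &n };
    cuLaunchKernel(fn,
                   (n + 255) / 256, 1, 1,  // grid dimensions
                   256, 1, 1,              // block dimensions
                   0, NULL,                // shared memory, stream
                   args, NULL);
    cuCtxSynchronize();
}
```

The verbosity of cuLaunchKernel is the price paid for the flexibility described above: modules can be loaded, unloaded, or reloaded on demand, and the cubin objects involved are language-independent.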

Context management

Context management can be done through the driver API, but is not exposed in the runtime API. Instead, the runtime API decides itself which context to use for a thread: if a context has been made current to the calling thread through the driver API, the runtime will use that, but if there is no such context, it uses a "primary context." Primary contexts are created as needed, one per device per process, are reference-counted, and are then destroyed when there are no more references to them. Within one process, all users of the runtime API will share the primary context, unless a context has been made current to each thread. The context that the runtime uses, i.e., either the current context or primary context, can be synchronized with cudaDeviceSynchronize(), and destroyed with cudaDeviceReset().

Using the runtime API with primary contexts has its tradeoffs, however. It can cause trouble for users writing plug-ins for larger software packages, for example, because if all plug-ins run in the same process, they will all share a context but will likely have no way to communicate with each other. So, if one of them calls cudaDeviceReset() after finishing all its CUDA work, the other plug-ins will fail because the context they were using was destroyed without their knowledge. To avoid this issue, CUDA clients can use the driver API to create and set the current context, and then use the runtime API to work with it. However, contexts may consume significant resources, such as device memory, extra host threads, and performance costs of context switching on the device. This runtime-driver context sharing is important when using the driver API in conjunction with libraries built on the runtime API, such as cuBLAS or cuFFT.
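The workaround described above (a plug-in creating its own context via the driver API, then working with it through the runtime API) might look like this. This is a hedged sketch, not a complete plug-in: the function name is hypothetical and error handling is omitted.

```cuda
#include <cuda.h>          // driver API: context creation
#include <cuda_runtime.h>  // runtime API: used inside that context

void plugin_do_cuda_work(void) {
    CUdevice dev;
    CUcontext ctx;

    // Create a context this plug-in owns, and make it current.
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // Runtime API calls now use the driver-created current context,
    // not the process-wide primary context, so another plug-in's
    // cudaDeviceReset() cannot destroy it out from under us.
    float *d_buf;
    cudaMalloc(&d_buf, 1024 * sizeof(float));
    /* ... runtime API work, cuBLAS/cuFFT calls, etc. ... */
    cudaFree(d_buf);

    // Tear down explicitly via the driver API; no cudaDeviceReset().
    cuCtxDestroy(ctx);
}
```

The trade-off mentioned above still applies: each extra context costs device memory and context-switching overhead, so isolation per plug-in is not free.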



Read more at: http://docs.nvidia.com/cuda/cuda-driver-api/index.html#ixzz4SDvcrQ7y

CUDA is NVIDIA-proprietary and currently supports only NVIDIA's own GPUs.

CUDA uses NVCC, a compiler for a dialect of C extended with GPU constructs.

OpenCL is an open standard supported by many GPUs and even FPGAs, but so far the results for deep learning have been underwhelming.


cuDNN

NVIDIA's deep-neural-network library, built on top of CUDA.


TensorFlow/Torch/Theano --- all of these are 'computation engines'

TensorFlow is developed in C++ with a Python interface; support for many other languages is planned.

There are CPU and GPU versions; it supports CUDA/cuDNN and OpenCL, and can run across multiple GPUs and multiple machines.

Torch's core is implemented in C and can only be driven from Lua; it supports CUDA/cuDNN and OpenCL, but only across multiple GPUs within a single machine. Many top papers seem to be implemented in Torch.

Theano is written in Python; under the hood there is a 'compilation' step that translates the computation into C code, which makes debugging inconvenient.



Keras --- written in Python, it wraps TensorFlow and Theano; at runtime you can choose either one as the backend. Its functionality is rather bare-bones.
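The backend choice mentioned above is made through Keras's configuration file, ~/.keras/keras.json (or the KERAS_BACKEND environment variable, which overrides it). A sketch of the config fragment, switching the backend to Theano:

```json
{
    "backend": "theano",
    "floatx": "float32",
    "epsilon": 1e-07
}
```

With "backend" set to "tensorflow" instead, the same Keras model code runs on TensorFlow without changes; that is exactly the portability the wrapper buys at the cost of the limited feature set noted above.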




