Commonly used deep learning packages and libraries: what they do and how they relate

New hires often ask what these libraries and packages do and how they relate to one another, so here are some notes:

The following works from the lowest layer upward.

CUDA / OpenCL

These talk directly to the hardware, i.e., NVIDIA GPUs. As I recall, CUDA exposes two kinds of interfaces:

one is the runtime API, which is simple but offers limited control;

the other is the low-level CUDA driver API, which provides finer-grained control; code is naturally harder to write, and you need a clearer grasp of the low-level details.

The main differences:

Complexity vs. control

The runtime API eases device code management by providing implicit initialization, context management, and module management. This leads to simpler code, but it also lacks the level of control that the driver API has.

In comparison, the driver API offers more fine-grained control, especially over contexts and module loading. Kernel launches are much more complex to implement, as the execution configuration and kernel parameters must be specified with explicit function calls. However, unlike the runtime, where all the kernels are automatically loaded during initialization and stay loaded for as long as the program runs, with the driver API it is possible to only keep the modules that are currently needed loaded, or even dynamically reload modules. The driver API is also language-independent as it only deals with cubin objects.
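The contrast between the two launch styles can be sketched as follows. This is an illustrative sketch only (the kernel, names, and launch configuration are my own, and most error checking is omitted); with the driver API, the kernel would first have to be loaded explicitly from a cubin/PTX module via cuModuleLoad and cuModuleGetFunction.

```cuda
#include <cuda.h>          // driver API (cu* functions)
#include <cuda_runtime.h>  // runtime API (cuda* functions)

// A trivial kernel used only for illustration.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

void launch_with_runtime(float *d_x, float a, int n) {
    // Runtime API: implicit context, <<<...>>> syntax; all kernels
    // in the binary are loaded automatically at initialization.
    scale<<<(n + 255) / 256, 256>>>(d_x, a, n);
    cudaDeviceSynchronize();
}

void launch_with_driver(CUfunction fn, CUdeviceptr d_x, float a, int n) {
    // Driver API: fn was obtained from an explicitly loaded module;
    // the execution configuration and the kernel parameters are
    // passed as explicit function-call arguments.
    void *args[] = { &d_x, &a, &n };
    cuLaunchKernel(fn,
                   (n + 255) / 256, 1, 1,  // grid dimensions
                   256, 1, 1,              // block dimensions
                   0, NULL,                // shared memory, stream
                   args, NULL);
    cuCtxSynchronize();
}
```

The verbosity of cuLaunchKernel is the price paid for the flexibility described above: modules can be loaded, unloaded, or reloaded on demand, and the cubin objects involved are language-independent.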

Context management

Context management can be done through the driver API, but is not exposed in the runtime API. Instead, the runtime API decides itself which context to use for a thread: if a context has been made current to the calling thread through the driver API, the runtime will use that, but if there is no such context, it uses a "primary context." Primary contexts are created as needed, one per device per process, are reference-counted, and are then destroyed when there are no more references to them. Within one process, all users of the runtime API will share the primary context, unless a context has been made current to each thread. The context that the runtime uses, i.e., either the current context or primary context, can be synchronized with cudaDeviceSynchronize(), and destroyed with cudaDeviceReset().

Using the runtime API with primary contexts has its tradeoffs, however. It can cause trouble for users writing plug-ins for larger software packages, for example, because if all plug-ins run in the same process, they will all share a context but will likely have no way to communicate with each other. So, if one of them calls cudaDeviceReset() after finishing all its CUDA work, the other plug-ins will fail because the context they were using was destroyed without their knowledge. To avoid this issue, CUDA clients can use the driver API to create and set the current context, and then use the runtime API to work with it. However, contexts may consume significant resources, such as device memory, extra host threads, and performance costs of context switching on the device. This runtime-driver context sharing is important when using the driver API in conjunction with libraries built on the runtime API, such as cuBLAS or cuFFT.
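The workaround described above (a plug-in creating its own context via the driver API, then working with it through the runtime API) might look like this. This is a hedged sketch, not a complete plug-in: the function name is hypothetical and error handling is omitted.

```cuda
#include <cuda.h>          // driver API: context creation
#include <cuda_runtime.h>  // runtime API: used inside that context

void plugin_do_cuda_work(void) {
    CUdevice dev;
    CUcontext ctx;

    // Create a context this plug-in owns, and make it current.
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // Runtime API calls now use the driver-created current context,
    // not the process-wide primary context, so another plug-in's
    // cudaDeviceReset() cannot destroy it out from under us.
    float *d_buf;
    cudaMalloc(&d_buf, 1024 * sizeof(float));
    /* ... runtime API work, cuBLAS/cuFFT calls, etc. ... */
    cudaFree(d_buf);

    // Tear down explicitly via the driver API; no cudaDeviceReset().
    cuCtxDestroy(ctx);
}
```

The trade-off mentioned above still applies: each extra context costs device memory and context-switching overhead, so isolation per plug-in is not free.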



Read more at: http://docs.nvidia.com/cuda/cuda-driver-api/index.html#ixzz4SDvcrQ7y

CUDA is NVIDIA-proprietary and currently supports only NVIDIA's own GPUs.

CUDA uses NVCC, a compiler for a dialect of C extended with GPU constructs.

OpenCL is an open standard supported by many GPUs and even FPGAs, but so far the results for deep learning have been underwhelming.


cuDNN

NVIDIA's deep-neural-network library, built on top of CUDA.


TensorFlow/Torch/Theano --- all of these are 'computation engines'

TensorFlow is developed in C++ with a Python interface; support for many other languages is planned.

There are CPU and GPU versions; it supports CUDA/cuDNN and OpenCL, and can run across multiple GPUs and multiple machines.

Torch's core is implemented in C and can only be driven from Lua; it supports CUDA/cuDNN and OpenCL, but only across multiple GPUs within a single machine. Many top papers seem to be implemented in Torch.

Theano is written in Python; under the hood there is a 'compilation' step that translates the computation into C code, which makes debugging inconvenient.



Keras --- written in Python, it wraps TensorFlow and Theano; at runtime you can choose either one as the backend. Its functionality is rather bare-bones.
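The backend choice mentioned above is made through Keras's configuration file, ~/.keras/keras.json (or the KERAS_BACKEND environment variable, which overrides it). A sketch of the config fragment, switching the backend to Theano:

```json
{
    "backend": "theano",
    "floatx": "float32",
    "epsilon": 1e-07
}
```

With "backend" set to "tensorflow" instead, the same Keras model code runs on TensorFlow without changes; that is exactly the portability the wrapper buys at the cost of the limited feature set noted above.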




