NVIDIA Deep Learning SDK (NVIDIA's deep learning libraries)

smartcat2010

Category: GPU

Original link: https://developer.nvidia.com/deep-learning-software


Mixed precision in AI frameworks (Automatic Mixed Precision): mixed-precision computation that uses Tensor Cores for up to a 3x speedup. (Get up to 3X speedup running on Tensor Cores with just a few lines of code added to your existing training script.)
Deep Learning Primitives (cuDNN): the standard library for GPU-accelerated deep learning. (High-performance building blocks for deep neural network applications including convolutions, activation functions, and tensor transformations.)
Input Data Processing (DALI): a highly parallel data loading and data augmentation library, aimed mainly at images and video. (An open source data loading and augmentation library that is fast, portable and flexible.)
Multi-GPU Communication (NCCL): the go-to library for collective communication, with a double-tree implementation. (Collective communication routines, such as all-gather, reduce, and broadcast, that accelerate multi-GPU deep learning training.)
Deep Learning Inference Engine (TensorRT): the go-to inference engine. (High-performance deep learning inference runtime for production deployment.) (TensorFlow-to-ONNX-to-TensorRT example.)
Deep Learning for Video Analytics (DeepStream SDK): High-level C++ API and runtime for GPU-accelerated transcoding and deep learning inference
Optical Flow for Video Inference (Optical Flow SDK): Set of high-level APIs that expose the latest hardware capability of Turing GPUs dedicated for computing the optical flow of pixels between images. Also useful for calculating stereo disparity and depth estimation.
High level SDK for tuning domain specific DNNs (Transfer Learning Toolkit): transfer learning. (Enabling end-to-end deep learning workflows for industries.)
AI enabled Annotation for Medical Imaging (AI-Assisted Annotation SDK): I did not have permission to open this page. (AI-assisted annotation for medical imaging related data labeling.)
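Deep Learning GPU Training System (DIGITS): a web-based visualization tool for datasets, models, and training (much like TensorBoard); it is just a visualization layer wrapped around core components such as the compute framework. (Rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation and object detection tasks.)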
Linear Algebra (cuBLAS): the standard library for GPU matrix computation. (GPU-accelerated BLAS functionality that delivers 6x to 17x faster performance than CPU-only BLAS libraries.)
Sparse Matrix Operations (cuSPARSE): the standard library for sparse matrix computation (it really does get used in model weight pruning). (GPU-accelerated linear algebra subroutines for sparse matrices that deliver up to 8x faster performance than CPU BLAS (MKL), ideal for applications such as natural language processing.)
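
To make the NCCL entry more concrete, here is a minimal pure-Python sketch of what the collectives named above (broadcast, reduce, all-gather) compute. This is not the NCCL API and it says nothing about the double-tree algorithm or performance. Each "rank" is just a Python list, and the function names (`broadcast`, `reduce_sum`, `all_gather`, `all_reduce_sum`) are hypothetical, chosen only for illustration.

```python
# Semantics-only sketch of collective operations (NOT the NCCL API).
# buffers[r] plays the role of the data held by rank (GPU) r.

def broadcast(buffers, root):
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]

def reduce_sum(buffers, root):
    """Only the root receives the elementwise sum; other ranks keep their data."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [total if rank == root else list(buf)
            for rank, buf in enumerate(buffers)]

def all_gather(buffers):
    """Every rank ends up with the concatenation of all ranks' buffers."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]

def all_reduce_sum(buffers):
    """Equivalent in result to a reduce to rank 0 followed by a broadcast."""
    return broadcast(reduce_sum(buffers, root=0), root=0)

# Three ranks, each holding its own local gradient vector (made-up values).
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
summed = all_reduce_sum(grads)   # every rank now holds [9.0, 12.0]
```

In data-parallel training, the last call is the typical pattern: each GPU computes gradients on its own mini-batch, and a sum across ranks gives every GPU the same aggregated gradient.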
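
As a rough illustration of why a sparse library matters for pruned weights (the cuSPARSE entry above), the sketch below stores a mostly-zero matrix in compressed sparse row (CSR) form and multiplies it by a vector while touching only the nonzeros. It is plain Python, not the cuSPARSE API. The helper names `dense_to_csr` and `csr_matvec` and the example matrix are assumptions made up for this example.

```python
# CSR storage and sparse matrix-vector product, pure Python (NOT cuSPARSE).

def dense_to_csr(matrix):
    """Convert a dense row-major matrix to (values, col_indices, row_ptr)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)       # nonzero value
                col_indices.append(j)  # its column
        row_ptr.append(len(values))    # where the next row starts
    return values, col_indices, row_ptr

def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A @ x using only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y

# A hypothetical pruned weight matrix: most entries were set to zero.
W = [[0, 2, 0],
     [1, 0, 0],
     [0, 0, 3]]
vals, cols, ptr = dense_to_csr(W)
y = csr_matvec(vals, cols, ptr, [1, 2, 3])   # [4.0, 1.0, 9.0]
```

The work done is proportional to the number of nonzeros rather than to the full matrix size, which is the property that makes pruning pay off when the kernel exploits it.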
