NVIDIA A100 Customer Deck.pdf
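NVIDIA A100 product materials and technical specifications
A100: the largest die, the highest performance
How big is the improvement? Remember the Tesla V100, the Volta-architecture chip launched three years ago that still leads the industry? The V100 delivers 7.8 TFLOPS of inference compute at 300 W with 21 billion transistors, yet the A100's compute is a full 20 times that.
"The A100 is the largest 7-nanometer chip ever made," said Jensen Huang. Built on TSMC's 7-nanometer process, currently the most advanced available, the A100 has 54 billion transistors. It is a 3D-stacked chip with a die area of 826 mm^2, and the GPU's maximum power reaches 400 W.
The GPU carries 40 GB of Samsung HBM2 memory (much faster than DDR5, though very expensive) and third-generation Tensor Cores. Its multi-GPU scaling efficiency has also improved greatly: the new NVLink provides 600 GB/s of bandwidth, nearly 10 times the speed of a PCIe interconnect.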
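The ratios behind these figures can be checked with a few lines of arithmetic. The sketch below is only illustrative: the transistor counts, power limits, and NVLink bandwidth come from the text, while the PCIe baseline is an assumption (PCIe Gen4 x16 at 16 GT/s per lane with 128b/130b encoding, counted in both directions), since the text does not say which PCIe generation it compares against.

```python
# Figures quoted in the description above.
A100_TRANSISTORS_BN = 54.0   # billions of transistors
V100_TRANSISTORS_BN = 21.0
A100_POWER_W = 400
V100_POWER_W = 300
NVLINK_GB_PER_S = 600.0      # new NVLink bandwidth


def pcie_x16_bidirectional_gb_per_s(gt_per_s=16.0, lanes=16):
    """Theoretical PCIe x16 bandwidth in GB/s, both directions combined.

    Assumption (not stated in the text): PCIe Gen4, i.e. 16 GT/s per lane
    with 128b/130b line encoding.
    """
    per_direction = gt_per_s * lanes * (128 / 130) / 8  # Gbit/s -> GB/s
    return 2 * per_direction


def summarize():
    """Return the headline ratios as a dict of floats."""
    pcie = pcie_x16_bidirectional_gb_per_s()
    return {
        "transistor_ratio": A100_TRANSISTORS_BN / V100_TRANSISTORS_BN,
        "power_ratio": A100_POWER_W / V100_POWER_W,
        "pcie_gb_per_s": pcie,
        "nvlink_vs_pcie": NVLINK_GB_PER_S / pcie,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under that assumed baseline, PCIe comes out at about 63 GB/s and NVLink at roughly 9.5 times that, which is consistent with the "nearly 10 times" wording.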
Fundamentals of CUDA Code Performance Optimization on NVIDIA GPUs
Fundamental Optimizations in CUDA
Optimization Overview
GPU architecture
Kernel optimization
— Memory optimization
— Latency optimization
— Instruction optimization
CPU-GPU interaction optimization
— Overlapped execution using streams
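To see why overlapped execution with streams helps, a rough timing model is enough. The sketch below is not CUDA code and is not taken from the slides. It is a small Python simulation that assumes the work splits evenly into chunks, that host-to-device copy, kernel, and device-to-host copy can each run concurrently with the other two, and that chunks pass through the stages in order. The function name and the example timings are hypothetical.

```python
def overlapped_time(h2d, kernel, d2h, chunks):
    """Estimate total time when work is split into `chunks` pieces and the
    three stages (copy in, compute, copy out) are pipelined.

    Idealized assumptions: even split, one engine per stage, no launch
    overhead. With chunks=1 this reduces to fully serial execution.
    """
    stage_times = [h2d / chunks, kernel / chunks, d2h / chunks]
    # finish[s] holds the time at which stage s finished its latest chunk.
    finish = [0.0] * len(stage_times)
    for _ in range(chunks):
        ready = 0.0  # time at which this chunk left the previous stage
        for s, t in enumerate(stage_times):
            # A stage starts once it is free and the chunk has arrived.
            start = max(finish[s], ready)
            finish[s] = start + t
            ready = finish[s]
    return finish[-1]


if __name__ == "__main__":
    # Hypothetical timings in milliseconds.
    serial = overlapped_time(4.0, 6.0, 2.0, chunks=1)
    piped = overlapped_time(4.0, 6.0, 2.0, chunks=4)
    print(f"serial: {serial} ms, 4 chunks overlapped: {piped} ms")
```

In this model the total time approaches that of the slowest stage as the chunk count grows, which is the effect the streams technique is after. Real hardware adds per-launch overhead, so very small chunks stop paying off.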
NVIDIA 2020 Ampere Architecture GPU Feature Overview
NVIDIA A100 Tensor Core GPU
Architecture
UNPRECEDENTED ACCELERATION AT EVERY SCALE
Introduction
The diversity of compute-intensive applications running in modern cloud data centers has driven the explosion of NVIDIA GPU-accelerated cloud computing. Such intensive applications include AI deep learning training and inference, data analytics, scientific computing, genomics, edge video analytics and 5G services, graphics rendering, cloud gaming, and many more. From scaling-up AI training and scientific computing, to scaling-out inference applications, to enabling real-time conversational AI, NVIDIA GPUs provide the necessary horsepower to accelerate numerous complex and unpredictable workloads running in today’s cloud data centers.
NVIDIA® GPUs are the leading computational engines powering the AI revolution, providing tremendous speedups for AI training and inference workloads. In addition, NVIDIA GPUs accelerate many types of HPC and data analytics applications and systems, allowing customers to effectively analyze, visualize, and turn data into insights. NVIDIA’s accelerated computing platforms are central to many of the world’s most important and fastest-growing industries.
HPC has grown beyond supercomputers running computationally-intensive applications such as weather forecasting, oil & gas exploration, and financial modeling. Today, millions of NVIDIA GPUs are accelerating many types of HPC applications running in cloud data centers, servers, systems at the edge, and even deskside workstations, servicing hundreds of industries and scientific domains.
AI networks continue to grow in size, complexity, and diversity, and the usage of AI-based applications and services is rapidly expanding. NVIDIA GPUs accelerate numerous AI systems and applications including: deep learning recommendation systems, autonomous machines (self-driving cars, factory robots, etc.), natural language processing (conversational AI, real-time language translation, etc.), smart city video analytics, software-defined 5G networks (that can deliver AI-based services at the Edge), molecular simulations, drone control, medical image analysis, and more.
Multi-GPU Training with NCCL
Multi-GPU deep learning training with NCCL, covering both multi-node multi-GPU and single-node multi-GPU setups.
Optimized inter-GPU communication for DL and HPC. Optimized for all NVIDIA platforms, most OEMs and Cloud. Scales to 100s of GPUs, targeting 10,000s in the near future.
Aims at covering all communication needs for multi-GPU computing. Only relies on CUDA. No dependency on MPI or any parallel environment.
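The central collective in multi-GPU training is all-reduce, which sums gradients across devices so that every device ends up with the same result. The sketch below simulates a ring all-reduce in plain Python to show the data movement. It is an assumption chosen for illustration: the text does not say which algorithm NCCL uses, this is not NCCL's API, and the function name `ring_allreduce` is hypothetical. Each "rank" is a list with one value per chunk, and the number of chunks equals the number of ranks.

```python
def ring_allreduce(buffers):
    """Simulate a ring all-reduce (sum) over len(buffers) ranks.

    buffers[r] is rank r's data, split into exactly n chunks (one number
    per chunk here, for simplicity). Returns the per-rank data afterwards;
    every rank holds the element-wise sum.
    """
    n = len(buffers)
    data = [list(b) for b in buffers]  # copy so the input is not modified
    if any(len(b) != n for b in data):
        raise ValueError("each rank needs exactly one chunk per rank")

    # Phase 1, reduce-scatter: after n-1 steps, rank r holds the complete
    # sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so the sends happen "simultaneously".
        msgs = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][c] for r, c in msgs]
        for (r, c), value in zip(msgs, payloads):
            data[(r + 1) % n][c] += value  # neighbour accumulates

    # Phase 2, all-gather: completed chunks travel around the ring.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][c] for r, c in msgs]
        for (r, c), value in zip(msgs, payloads):
            data[(r + 1) % n][c] = value  # neighbour overwrites

    return data


if __name__ == "__main__":
    grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_allreduce(grads))
```

Each rank sends and receives only one chunk per step, so the traffic per rank stays roughly constant as ranks are added. That property is why ring-style schemes are often described for bandwidth-bound gradient exchange.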
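Caltech Python OpenGL Tutorial
A Python OpenGL tutorial from Caltech, clear and simple, suitable for beginners. For any academic or technical questions, feel free to contact me at whitelok@163.com.
Python OpenGL Programming, 2009 Microsoft Edition
The 2009 Microsoft English edition of Python OpenGL Programming, with a large number of development examples covering all development scenarios.
Operating System Principles (Linux Edition, Xu Demin)
A condensed Linux-edition guide to operating system principles, suited to preparing for written tests and interviews.
Face Recognition Implemented in Python
Documentation for face recognition implemented with OpenCV for Python. It details the main face recognition techniques and the Python code that implements them.
For any technical questions, feel free to email me at whitelok@163.com.
Python OpenGL Programming
A Python implementation of OpenGL. It explains in detail how to use OpenGL from Python and includes plenty of code to make implementation easier.
For any questions about the book, feel free to email whitelok@163.com.
pcapy: Packet Capture Library for Python
A packet capture library for Python: WinPcap and libpcap ported cleanly to Python.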