Key parameters of the A100 GPU

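Global view

Start with the top-level diagram.
The GA100 is the full, defect-free die: every unit on it is functional, as shown below.
![Full GA100 block diagram](https://i-blog.csdnimg.cn/direct/18df1f1b7dbd43a8b2b73499f502cd25.png)

The A100 has 108 SMs. Each SM can hold up to 2048 resident threads, and a single block can hold up to 1024 threads.
Put simply, the A100 is a GA100 die with the defective units disabled: the full GA100 has 128 SMs in 8 GPCs, while the A100 ships with 108 SMs in 7 GPCs, leaving out the GPC that cannot be used.
[Figure: A100 block diagram]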

SM view

Below is the view of a single SM:
[Figures: block diagram of a single SM]

Compute performance

Peak compute performance:
[Figure: A100 peak compute performance table]

Process node

The chip is manufactured on a 7 nm process:
[Figure: process node details]

Key parameters

[Figures: A100 key parameter tables]

Measured parameters

The following values were queried from the device:

device properties : 
	name : NVIDIA A100-PCIE-40GB
	totalGlobalMem : 42298834944
	sharedMemPerBlock : 49152
	regsPerBlock : 65536
	warpSize : 32
	memPitch : 2147483647
	maxThreadsPerBlock : 1024
	maxThreadsDim[0] : 1024
	maxThreadsDim[1] : 1024
	maxThreadsDim[2] : 64
	maxGridSize[0] : 2147483647
	maxGridSize[1] : 65535
	maxGridSize[2] : 65535
	clockRate : 1410000
	totalConstMem : 65536
	major : 8
	minor : 0
	textureAlignment : 512
	texturePitchAlignment : 32
	deviceOverlap : 1
	multiProcessorCount : 108
	kernelExecTimeoutEnabled : 0
	integrated : 0
	canMapHostMemory : 1
	computeMode : 0
	concurrentKernels : 1
	ECCEnabled : 1
	pciBusID : 64
	pciDeviceID : 0
	pciDomainID : 0
	tccDriver : 0
	asyncEngineCount : 3
	unifiedAddressing : 1
	memoryClockRate : 1215000
	memoryBusWidth : 5120
	l2CacheSize : 41943040
	persistingL2CacheMaxSize : 31457280
	maxThreadsPerMultiProcessor : 2048
	streamPrioritiesSupported : 1
	globalL1CacheSupported : 1
	localL1CacheSupported : 1
	sharedMemPerMultiprocessor : 167936
	regsPerMultiprocessor : 65536
	managedMemory : 1
	isMultiGpuBoard : 0
	multiGpuBoardGroupID : 0
	singleToDoublePrecisionPerfRatio : 2
	pageableMemoryAccess : 0
	concurrentManagedAccess : 1
	computePreemptionSupported : 1
	canUseHostPointerForRegisteredMem : 1
	cooperativeLaunch : 1
	cooperativeMultiDeviceLaunch : 1
	pageableMemoryAccessUsesHostPageTables : 0
	directManagedMemAccessFromHost : 0
	accessPolicyMaxWindowSize : 134213632

device limit : 
	deviceLimitStackSize : 1024
	deviceLimitPrintfFifoSize : 7077888
	deviceLimitMallocHeapSize : 8388608
	deviceLimitDevRuntimeSyncDepth : 2
	deviceLimitDevRuntimePendingLaunchCount : 2048
	deviceLimitMaxL2FetchGranularity : 64
	deviceLimitPersistingL2CacheSize : 7864320

summary : 
	register total size : 6.75 Mi 32-bit registers (27.00 MiB)
	shared memory size per sm : 164.00 KiB
	shared memory total size : 17.30 MiB
	constant memory total size : 64.00 KiB
	level 2 cache total size : 40.00 MiB
	device memory total size : 39.39 GiB
	device memory bandwidth : 1.56 TB/s
	stack memory total size : 216.00 MiB

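Block distribution across SMs

  • Work is assigned to SMs in units of blocks.
  • Even-numbered SMs are filled first, then odd-numbered SMs.
  • A launch of <<<108,1024>>> occupies every SM.
    [Figure: block distribution across SMs]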
Threads resident on each SM (sm0 to sm107) for each launch configuration. Every block_dim tested (1, 32, 64, 128, 256, 512, 1024) produced the same placement, so each row covers all seven block sizes: an occupied SM holds exactly block_dim threads (one block) and every other SM holds 0.

| grid_dim | block_dim | SMs holding one block | SMs holding 0 threads |
| 1 | 1, 32, 64, 128, 256, 512, 1024 | sm0 | sm1 to sm107 |
| 16 | 1, 32, 64, 128, 256, 512, 1024 | even-numbered sm0, sm2, ..., sm30 | all others |
| 32 | 1, 32, 64, 128, 256, 512, 1024 | even-numbered sm0, sm2, ..., sm62 | all others |
| 64 | 1, 32, 64, 128, 256, 512, 1024 | all even-numbered SMs (sm0 to sm106) plus odd-numbered sm1, sm3, ..., sm19 | odd-numbered sm21 to sm107 |
| 108 | 1, 32, 64, 128, 256, 512, 1024 | sm0 to sm107 | none |

SM/TPC/GPC grouping on the A100

Note that all SM indices here are logical indices.
[Figure: SM/TPC/GPC grouping]
The procedure is as follows:

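//__cooperative__
// Each thread records the SM it runs on and a clock reading taken on that SM.
// KERNEL_NAME, __TEST_CASE_NAME__, SYNC_LOOP, rt::Clock() and __mysmid() are
// provided by the surrounding test framework.
__global__ void KERNEL_NAME(__TEST_CASE_NAME__)(uint32_t *id, uint64_t *clocks) {
  // Flatten the 3D thread and block indices into one global thread index.
  int threadInBlock = threadIdx.x + threadIdx.y * blockDim.x + threadIdx.z * blockDim.x * blockDim.y;
  int blockInGrid = blockIdx.x + blockIdx.y * gridDim.x + blockIdx.z * gridDim.x * gridDim.y;
  int oneBlockSize = blockDim.x * blockDim.y * blockDim.z;
  int tidx = threadInBlock + oneBlockSize * blockInGrid;

  // Run SYNC_LOOP block-level barriers before reading the clock.
#pragma unroll
  for (int i = 0; i < SYNC_LOOP; i++) {
      __syncthreads();
  }

  uint64_t start = rt::Clock();

  id[tidx] = __mysmid();        // SM id seen by this thread
  clocks[__mysmid()] = start;   // one slot per SM; a single writer per SM when block_dim is 1
}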

// Launches the kernel with grid_dim blocks of block_dim threads and copies the
// per-thread SM ids and the per-SM clock readings back to the host.
static void gpc_test_kernel(int grid_dim, int block_dim, uint32_t *h_id, uint64_t *h_clocks) {
  rt::Error_t err;
  rt::Stream_t stream;
  const size_t thread_count = (size_t)grid_dim * block_dim;

  // One SM id per thread.
  uint32_t *d_id;
  CHECK_ERROR(rt::Malloc((void **)&d_id, sizeof(uint32_t) * thread_count));
  CHECK_ERROR(rt::Memset(d_id, 0, sizeof(uint32_t) * thread_count));

  // The kernel indexes this buffer by SM id; thread_count entries are enough
  // as long as the grid has at least one thread per SM.
  uint64_t *d_clocks;
  CHECK_ERROR(rt::Malloc((void **)&d_clocks, sizeof(uint64_t) * thread_count));
  CHECK_ERROR(rt::Memset(d_clocks, 0, sizeof(uint64_t) * thread_count));

  CHECK_ERROR(rt::StreamCreate(&stream));

  // kernel function
  void *args[] = {(void *)&d_id, (void *)&d_clocks};
  err = rt::LaunchCooperativeKernel((const void *)(KERNEL_NAME(__TEST_CASE_NAME__)), grid_dim, block_dim, args, 0,
                                    stream);
  CHECK_ERROR(err);
  CHECK_ERROR(rt::GetLastError());

  // Wait for the kernel to finish before reading the results.
  CHECK_ERROR(rt::StreamSynchronize(stream));
  CHECK_ERROR(rt::DeviceSynchronize());

  CHECK_ERROR(rt::Memcpy(h_id, d_id, sizeof(uint32_t) * thread_count, rt::MemcpyDeviceToHost));
  CHECK_ERROR(rt::Memcpy(h_clocks, d_clocks, sizeof(uint64_t) * thread_count, rt::MemcpyDeviceToHost));

  CHECK_ERROR(rt::StreamDestroy(stream));
  CHECK_ERROR(rt::Free(d_id));
  CHECK_ERROR(rt::Free(d_clocks));
}

int main() {
......
  rt::Error_t err;
  std::ofstream file2(std::string(test_name) + std::string("_gpc_sm_layout.csv"));

  err = rt::SetDevice(0);
  CHECK_ERROR(err);
  rt::DeviceProp device_prop;
  err = rt::GetDeviceProperties(&device_prop, 0);
  CHECK_ERROR(err);

  // One block of one thread per SM, so every SM reports exactly once.
  grid_dim = device_prop.multiProcessorCount;
  block_dim = 1;

  gpc_test_kernel(grid_dim, block_dim, id, h_clocks);
...
}
Output for sm0 to sm107. The first data row is 1 for all 108 SMs. The second data row holds the clock readings; they take only seven distinct values, each shared by pairs of consecutive SMs and repeating every 14 SMs:

| clock reading | logical SM ids reporting it |
| 3.25863E+14 | sm0-sm1, sm14-sm15, sm28-sm29, sm42-sm43, sm56-sm57, sm70-sm71, sm84-sm85, sm98-sm99 |
| 3.25447E+14 | sm2-sm3, sm16-sm17, sm30-sm31, sm44-sm45, sm58-sm59, sm72-sm73, sm86-sm87, sm100-sm101 |
| 3.25654E+14 | sm4-sm5, sm18-sm19, sm32-sm33, sm46-sm47, sm60-sm61, sm74-sm75, sm88-sm89, sm102-sm103 |
| 3.26144E+14 | sm6-sm7, sm20-sm21, sm34-sm35, sm48-sm49, sm62-sm63, sm76-sm77, sm90-sm91, sm104-sm105 |
| 3.2569E+14 | sm8-sm9, sm22-sm23, sm36-sm37, sm50-sm51, sm64-sm65, sm78-sm79, sm92-sm93, sm106-sm107 |
| 3.25881E+14 | sm10-sm11, sm24-sm25, sm38-sm39, sm52-sm53, sm66-sm67, sm80-sm81, sm94-sm95 |
| 3.25386E+14 | sm12-sm13, sm26-sm27, sm40-sm41, sm54-sm55, sm68-sm69, sm82-sm83, sm96-sm97 |

[Figure: SM-to-GPC layout]
From this figure you can work out how the SMs are partitioned into GPCs.

Compute capability

[Figure: compute capability table]

MIG

A single A100 GPU can be partitioned into up to 7 independent instances.
[Figures: MIG partitioning diagrams]

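Parameter list

On the A100, one block can hold at most 1024 threads.
One SM can hold at most 2048 resident threads.
With 108 SMs, the whole GPU can therefore hold at most <216,1024> at once, that is 216 blocks of 1024 threads (221,184 threads).
[Figures: A100 parameter tables]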
