CUDA local memory usage: amount of local memory per CUDA thread

I read in the NVIDIA documentation (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications, table #12) that the amount of local memory per thread is 512 KB for my GPU (GTX 580, compute capability 2.0).

I tried unsuccessfully to check this limit on Linux with CUDA 6.5.

Here is the code I used (its only purpose is to test the local memory limit; it doesn't do any useful computation):

#include <stdio.h>
#include <stdlib.h>

#define MEMSIZE 65000 // 65000 -> out of memory, 60000 -> ok

// Report a CUDA error with its source location; exit only if abort is true.
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=false)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if( abort )
            exit(code);
    }
}

// Check for both launch errors and errors raised while the kernel runs.
inline void gpuCheckKernelExecutionError( const char *file, int line)
{
    gpuAssert( cudaPeekAtLastError(), file, line);
    gpuAssert( cudaDeviceSynchronize(), file, line);
}

__global__ void kernel_test_private(char *output)
{
    int c = blockIdx.x*blockDim.x + threadIdx.x; // absolute col
    int r = blockIdx.y*blockDim.y + threadIdx.y; // absolute row

    char tmp[MEMSIZE]; // per-thread array of MEMSIZE bytes
    for( int i = 0; i < MEMSIZE; i++)
        tmp[i] = 4*r + c; // dummy computation in local mem
    for( int i = 0; i < MEMSIZE; i++)
        output[i] = tmp[i];
}

int main( void)
{
    printf( "MEMSIZE=%d bytes.\n", MEMSIZE);

    // allocate memory
    char output[MEMSIZE];
    char *gpuOutput;
    cudaMalloc( (void**) &gpuOutput, MEMSIZE);

    // run kernel (a single block containing a single thread)
    dim3 dimBlock( 1, 1);
    dim3 dimGrid( 1, 1);
    kernel_test_private<<<dimGrid, dimBlock>>>(gpuOutput);
    gpuCheckKernelExecutionError( __FILE__, __LINE__);

    // transfer data from GPU memory to CPU memory
    cudaMemcpy( output, gpuOutput, MEMSIZE, cudaMemcpyDeviceToHost);

    // release resources
    cudaFree(gpuOutput);
    cudaDeviceReset();

    return 0;
}

And the compilation command line:

nvcc -o cuda_test_private_memory -Xptxas -v -O2 --compiler-options -Wall cuda_test_private_memory.cu

The compilation succeeds and reports:

ptxas info : 0 bytes gmem
ptxas info : Compiling entry function '_Z19kernel_test_privatePc' for 'sm_20'
ptxas info : Function properties for _Z19kernel_test_privatePc
    65000 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 21 registers, 40 bytes cmem[0]

I got an "out of memory" error at runtime on the GTX 580 when I reached 65000 bytes per thread. Here is the exact output of the program in the console:

MEMSIZE=65000 bytes.
GPUassert: out of memory cuda_test_private_memory.cu 48

I also ran a test with a GTX 770 GPU (on Linux with CUDA 6.5). It ran without error for MEMSIZE=200000, but the "out of memory" error occurred at runtime for MEMSIZE=250000.

How can this behavior be explained? Am I doing something wrong?

Solution

It seems you are running into a stack size limitation rather than a local memory limitation:

ptxas info : Function properties for _Z19kernel_test_privatePc
    65000 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads

The variable that you had intended to be local is on the (GPU thread) stack, in this case.

Based on the information provided by @njuffa here, the available stack size limit is the lesser of:

The maximum local memory size (512KB for cc2.x and higher)

GPU memory / (# of SMs) / (max threads per SM)

Clearly, the first limit is not the issue. I assume you have a "standard" GTX580, which has 1.5GB memory and 16 SMs. A cc2.x device has a maximum of 1536 resident threads per multiprocessor. This means we have 1536MB/16/1536 = 1MB/16 = 65536 bytes stack. There is some overhead and other memory usage that subtracts from the total available memory, so the stack size limit is some amount below 65536, somewhere between 60000 and 65000 in your case, apparently.

I suspect a similar calculation on your GTX770 would yield a similar result, i.e. a maximum stack size between 200000 and 250000.
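To run the same calculation on whichever GPU is installed, the figures can be read from the device at runtime. A minimal sketch, assuming the GPU of interest is device 0; it uses `cudaGetDeviceProperties` and `cudaDeviceGetLimit` from the CUDA runtime API. The printed estimate is only the upper bound described above, not a guaranteed limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int device = 0; // assumption: the GPU of interest is device 0

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // GPU memory / (# of SMs) / (max threads per SM)
    const size_t perThreadShare = prop.totalGlobalMem
        / (size_t)prop.multiProcessorCount
        / (size_t)prop.maxThreadsPerMultiProcessor;

    // Maximum local memory size per thread (512 KB for cc 2.x and higher)
    const size_t maxLocalMem = 512 * 1024;
    const size_t estimate = perThreadShare < maxLocalMem ? perThreadShare : maxLocalMem;

    // Stack size currently configured for the device, shown for comparison
    size_t currentStack = 0;
    err = cudaDeviceGetLimit(&currentStack, cudaLimitStackSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%s: %d SMs, %d max threads per SM, %zu bytes of global memory\n",
           prop.name, prop.multiProcessorCount,
           prop.maxThreadsPerMultiProcessor, prop.totalGlobalMem);
    printf("estimated upper bound on per-thread stack: %zu bytes\n", estimate);
    printf("currently configured stack size: %zu bytes\n", currentStack);
    return 0;
}
```

Comparing the estimate with the stack frame size reported by `ptxas -v` gives a rough idea of whether a kernel is likely to launch on a given card.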
