1、NVCC Identification Macro
These predefined macros identify the compiler and report the nvcc version number.
nvcc predefines the following macros:
__NVCC__
Defined when compiling C/C++/CUDA source files.
__CUDACC__
Defined when compiling CUDA source files.
__CUDACC_RDC__
Defined when compiling CUDA source files in relocatable device code mode (see NVCC Options for Separate Compilation).
__CUDACC_VER_MAJOR__
Defined with the major version number of nvcc.
__CUDACC_VER_MINOR__
Defined with the minor version number of nvcc.
__CUDACC_VER_BUILD__
Defined with the build version number of nvcc.
__CUDACC_VER__
Defined with the full version number of nvcc, represented as __CUDACC_VER_MAJOR__ * 10000 + __CUDACC_VER_MINOR__ * 100 + __CUDACC_VER_BUILD__.
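#include <iostream>

int main() {
#ifdef __NVCC__           // compiled by nvcc
#ifdef __CUDACC__         // compiled as a CUDA source file
#ifndef __CUDACC_RDC__    // not in relocatable device code mode
    std::cout << "__CUDACC_VER_MAJOR__ " << __CUDACC_VER_MAJOR__ << std::endl;
    std::cout << "__CUDACC_VER_MINOR__ " << __CUDACC_VER_MINOR__ << std::endl;
    std::cout << "__CUDACC_VER_BUILD__ " << __CUDACC_VER_BUILD__ << std::endl;
    // Note: CUDA 9.0 and later no longer support __CUDACC_VER__.
    std::cout << "__CUDACC_VER__ " << __CUDACC_VER__ << std::endl;
#endif
#endif
#endif
    return 0;
}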
Running this on my machine prints:
__CUDACC_VER_MAJOR__ 7
__CUDACC_VER_MINOR__ 5
__CUDACC_VER_BUILD__ 17
__CUDACC_VER__ 70517
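The formula for __CUDACC_VER__ can be checked against the output above. The following is a minimal sketch, not part of CUDA: `make_nvcc_version` and `nvcc_version_or_zero` are hypothetical helper names. The second helper returns 0 when the file is not compiled as a CUDA source file, an assumption made so that the same code also builds with a plain host compiler.

```cpp
// Hypothetical helper (not part of CUDA): combines the three version
// components using the formula given for __CUDACC_VER__.
inline int make_nvcc_version(int major, int minor, int build) {
    return major * 10000 + minor * 100 + build;
}

// Hypothetical helper: the nvcc version of the current compilation,
// or 0 when this file is not being compiled as a CUDA source file.
inline int nvcc_version_or_zero() {
#if defined(__CUDACC__)
    return make_nvcc_version(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__,
                             __CUDACC_VER_BUILD__);
#else
    return 0;
#endif
}
```

With the values printed above, make_nvcc_version(7, 5, 17) gives 7 * 10000 + 5 * 100 + 17 = 70517, which matches the reported __CUDACC_VER__.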
2、Command Option Types and Notation
Each nvcc option has a long name and a short name, which are interchangeable. Long names are written with two hyphens and short names with one; for example, --device-c and -dc name the same option.
nvcc has three types of command options: boolean options, single value options, and list options.
- Boolean options take no argument. They are either specified on the command line or not.
- Single value options take one argument and may be specified at most once.
- List options may be specified more than once.
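To make the long/short notation concrete, here is a small sketch that rewrites a command-line token from its long form to its short form. The table covers only the four options used in the makefile in section 3, and `short_names` and `to_short_form` are hypothetical names chosen for illustration; this is not how nvcc itself parses options.

```cpp
#include <map>
#include <string>

// Long-to-short name pairs for the four nvcc options used in this
// article's makefile. Illustrative only; nvcc has many more options.
inline const std::map<std::string, std::string>& short_names() {
    static const std::map<std::string, std::string> table = {
        {"--gpu-architecture", "-arch"},
        {"--device-c", "-dc"},
        {"--device-link", "-dlink"},
        {"--output-file", "-o"},
    };
    return table;
}

// Hypothetical helper: rewrites one token from long form to short form.
// Tokens not found in the table (file names, short options) are
// returned unchanged. A trailing "=value" part is preserved.
inline std::string to_short_form(const std::string& token) {
    const std::string::size_type eq = token.find('=');
    const std::string name = token.substr(0, eq);  // whole token if no '='
    const std::map<std::string, std::string>::const_iterator it =
        short_names().find(name);
    if (it == short_names().end()) {
        return token;
    }
    const std::string value =
        (eq == std::string::npos) ? std::string() : token.substr(eq);
    return it->second + value;
}
```

Applied token by token, `--gpu-architecture=sm_20 --device-c b.cu` becomes `-arch=sm_20 -dc b.cu`, which nvcc treats as the same command line.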
3、NVCC Options for Separate Compilation
Example (device code in b.cu and d.cu, host-only code in c.cpp and e.cpp):
b.h
#define N 8
extern __device__ int g[N];
extern __device__ void bar(void);
b.cu
#include "b.h"
__device__ int g[N];
__device__ void bar(void)
{
    g[threadIdx.x]++;
}
c.h
void print_c();
c.cpp
#include "c.h"
#include <iostream>
void print_c() {
    std::cout << "temp.app" << std::endl;
}
d.h
int temp ();
d.cu
#include <stdio.h>
#include "b.h"
#include "c.h"
#include "d.h"
// Each thread stores the reversed index in g, then bar() increments it,
// so g[i] ends up as N - i.
__global__ void foo(void) {
    __shared__ int a[N];
    a[threadIdx.x] = threadIdx.x;
    __syncthreads();
    g[threadIdx.x] = a[blockDim.x - threadIdx.x - 1];
    bar();
}
int temp(void) {
    print_c();
    unsigned int i;
    int *dg, hg[N];
    int sum = 0;
    foo<<<1, N>>>();
    // Look up the device address of the global array g defined in b.cu.
    if (cudaGetSymbolAddress((void**)&dg, g)) {
        printf("couldn't get the symbol addr\n");
        return 1;
    }
    if (cudaMemcpy(hg, dg, N * sizeof(int), cudaMemcpyDeviceToHost)) {
        printf("couldn't memcpy\n");
        return 1;
    }
    for (i = 0; i < N; i++) {
        sum += hg[i];
    }
    // Expected sum for N = 8: 8 + 7 + ... + 1 = 36.
    if (sum == 36) {
        printf("PASSED\n");
    } else {
        printf("FAILED (%d)\n", sum);
    }
    return 0;
}
e.cpp
#include "d.h"
int main() {
    temp();
}
makefile
all:
	nvcc --gpu-architecture=sm_20 --device-c b.cu d.cu
	nvcc --gpu-architecture=sm_20 --device-link b.o d.o --output-file link.o
	g++ -c c.cpp -o c.o
	g++ -c e.cpp -o e.o
	g++ c.o e.o b.o link.o d.o -L/usr/local/cuda/lib64 -lcudart