CUDA: cudaMalloc vs cudaMallocHost

While studying some code I ran into cudaMalloc and cudaMallocHost being used side by side, so here are my notes on how the two differ.

Reference 1: cudaMallocHost function explained in detail

Reference 2: How to Optimize Data Transfers in CUDA C/C++ (a Chinese translation is also available)

 

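Host memory comes in two kinds: pageable memory and pinned memory.

pageable memory: memory allocated through the ordinary operating system and language facilities (malloc(), new);

pinned memory: always resident in physical memory and never paged out to slow swap space, so it can be transferred to and from the device by DMA;

               pinned memory is allocated with cudaHostAlloc() and released with cudaFreeHost().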

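Memory allocated with malloc is always pageable (it can be swapped out). The other mode is pinned (page-locked): the system is forced to keep the allocation in physical memory and it takes no part in page swapping, which makes transfers more efficient. Pinned memory is allocated with cudaHostAlloc and released with cudaFreeHost (memory obtained from cudaMallocHost is released the same way).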

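By default, both pageable and pinned (page-locked) memory are "write-back" cacheable. Pinned memory can also be allocated as write-combined memory by passing the cudaHostAllocWriteCombined flag to cudaHostAlloc. For such memory, current x86/x64 CPUs use a dedicated internal buffer that merges writes and, once a full 64 B (one cache line) has accumulated, writes it out in a single burst that bypasses all cache levels. Reads also go straight to memory and ignore the caches, which makes reading this memory from the host slow.

The main use of write-combined memory is to prepare input data for CUDA, because transfers across PCI-E may be somewhat faster (there is no need to check whether the data sits in the CPU caches).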
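As a minimal sketch of the write-combined case, the helper below fills a write-combined pinned buffer and uploads it to the device. The function name uploadWriteCombined and the fill pattern are made up for illustration; only cudaHostAlloc, cudaHostAllocWriteCombined, cudaMemcpy and cudaFreeHost come from the CUDA runtime API.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): fills a write-combined
// pinned buffer on the host and copies it into d_out, which must hold at
// least n floats. Returns cudaSuccess or the first error encountered.
cudaError_t uploadWriteCombined(float *d_out, size_t n)
{
  float *h_in = nullptr;
  const size_t bytes = n * sizeof(float);

  // Pinned allocation with the write-combined flag.
  cudaError_t status =
      cudaHostAlloc((void**)&h_in, bytes, cudaHostAllocWriteCombined);
  if (status != cudaSuccess) return status;

  // The host only writes this buffer, sequentially, and never reads it back:
  // host reads from write-combined memory are slow.
  for (size_t i = 0; i < n; ++i) h_in[i] = (float)i;

  status = cudaMemcpy(d_out, h_in, bytes, cudaMemcpyHostToDevice);

  // Release with cudaFreeHost, as for any other pinned allocation.
  cudaError_t freeStatus = cudaFreeHost(h_in);
  return (status != cudaSuccess) ? status : freeStatus;
}
```

Whether this is actually faster than a plain pinned buffer depends on the platform, so it is worth measuring before relying on it.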

 

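Advantages of pinned memory: higher host-to-device transfer bandwidth; on some devices it can also be mapped into the device address space (zero-copy) and accessed directly from the GPU, which removes the explicit copy between host memory and device memory.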
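The zero-copy case can be sketched roughly as follows. The kernel scaleInPlace and the function zeroCopyDemo are hypothetical names used only for this illustration, and the sketch assumes device 0 and n > 0. On older CUDA versions without unified addressing, cudaSetDeviceFlags(cudaDeviceMapHost) may also have to be called before the first allocation; that call is left out here.

```cuda
#include <cuda_runtime.h>

// Doubles every element. The pointer refers to mapped host memory, so the
// kernel reads and writes host RAM directly, without any cudaMemcpy.
__global__ void scaleInPlace(float *data, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

// Hypothetical demo: returns true if the kernel updated the host buffer.
bool zeroCopyDemo(int n)
{
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.canMapHostMemory)
    return false;  // this device cannot map host memory

  float *h_data = nullptr;   // host view of the buffer
  float *d_alias = nullptr;  // device view of the same buffer
  if (cudaHostAlloc((void**)&h_data, n * sizeof(float),
                    cudaHostAllocMapped) != cudaSuccess)
    return false;
  for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

  // Ask the runtime for the device-side address of the pinned buffer.
  bool ok = cudaHostGetDevicePointer((void**)&d_alias, h_data, 0) == cudaSuccess;
  if (ok) {
    scaleInPlace<<<(n + 255) / 256, 256>>>(d_alias, n);
    ok = cudaGetLastError() == cudaSuccess &&
         cudaDeviceSynchronize() == cudaSuccess;
  }

  // After synchronizing, the results are visible through the host pointer.
  for (int i = 0; ok && i < n; ++i) ok = (h_data[i] == 2.0f);

  cudaFreeHost(h_data);
  return ok;
}
```

Every access from the kernel crosses the bus, so this tends to suit data that is touched only once rather than data that is reused many times on the GPU.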

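Disadvantages of pinned memory: do not allocate too much of it, because that leaves the operating system with less physical memory for paging and degrades overall system performance. In addition, in early CUDA versions a pinned buffer was by default treated as pinned only for the CPU thread (context) that allocated it, unless it was allocated with the cudaHostAllocPortable flag.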

 

*************************************************************************************************************************************************

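Host (CPU) data allocations are pageable by default. The GPU cannot access pageable host memory directly, so when data is transferred from pageable host memory to device memory, the CUDA driver must first allocate a temporary page-locked (pinned) host array, copy the host data into that array, and then transfer the data from the pinned array to device memory, as shown in the figure below: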

[Figure: pageable data transfer vs. pinned data transfer]

 

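As you can see in the figure, pinned memory is used as a staging area for data transfers. We can avoid this overhead by allocating our host arrays directly in pinned memory. In CUDA C/C++, pinned memory is allocated with cudaMallocHost() or cudaHostAlloc() and released with cudaFreeHost(). Pinned memory allocation can fail, so you should always check for errors. The following code snippet shows how to allocate pinned memory with error checking.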

cudaError_t status = cudaMallocHost((void**)&h_aPinned, bytes);
if (status != cudaSuccess)
  printf("Error allocating pinned host memory\n");

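Data transfers from pinned memory use the same cudaMemcpy() syntax as transfers from pageable memory. We can use the following "bandwidthtest" program (also available on GitHub) to compare the transfer rates of pageable and pinned memory.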

#include <stdio.h>
#include <assert.h>

// Convenience function for checking CUDA runtime API results
// can be wrapped around any runtime API call. No-op in release builds.
inline
cudaError_t checkCuda(cudaError_t result)
{
#if defined(DEBUG) || defined(_DEBUG)
  if (result != cudaSuccess) {
    fprintf(stderr, "CUDA Runtime Error: %s\n",
            cudaGetErrorString(result));
    assert(result == cudaSuccess);
  }
#endif
  return result;
}

void profileCopies(float        *h_a,
                   float        *h_b,
                   float        *d,
                   unsigned int  n,
                   const char   *desc)
{
  printf("\n%s transfers\n", desc);
  // ... (the rest of the listing is not reproduced here)
}
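Because pinned and pageable transfers go through the same cudaMemcpy() call, a self-contained comparison is easy to sketch. The code below is my own minimal version, not the original bandwidthtest program: the names h2dBandwidthGBs and comparePinnedVsPageable are hypothetical, it measures only the host-to-device direction, and it times a single copy per buffer.

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Times one host-to-device cudaMemcpy of `bytes` bytes with CUDA events.
// Returns the bandwidth in GB/s, or a negative value on error.
float h2dBandwidthGBs(const void *h_src, void *d_dst, size_t bytes)
{
  cudaEvent_t start, stop;
  if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
  if (cudaEventCreate(&stop) != cudaSuccess) {
    cudaEventDestroy(start);
    return -1.0f;
  }

  cudaEventRecord(start, 0);
  cudaError_t status = cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice);
  cudaEventRecord(stop, 0);
  cudaEventSynchronize(stop);

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);

  if (status != cudaSuccess || ms <= 0.0f) return -1.0f;
  return (float)(bytes * 1e-6 / ms);  // bytes per millisecond * 1e-6 = GB/s
}

// Measures the same copy from a pageable and from a pinned host buffer.
// Returns false if any allocation or transfer failed.
bool comparePinnedVsPageable(size_t bytes, float *pageableGBs, float *pinnedGBs)
{
  void *h_pageable = malloc(bytes);  // ordinary pageable memory
  void *h_pinned = nullptr;          // page-locked memory
  void *d = nullptr;                 // device buffer

  bool ok = h_pageable != nullptr &&
            cudaMallocHost(&h_pinned, bytes) == cudaSuccess &&
            cudaMalloc(&d, bytes) == cudaSuccess;
  if (ok) {
    memset(h_pageable, 1, bytes);  // touch the pages before timing
    memset(h_pinned, 1, bytes);
    *pageableGBs = h2dBandwidthGBs(h_pageable, d, bytes);
    *pinnedGBs = h2dBandwidthGBs(h_pinned, d, bytes);
    ok = *pageableGBs > 0.0f && *pinnedGBs > 0.0f;
  }

  if (d) cudaFree(d);
  if (h_pinned) cudaFreeHost(h_pinned);
  free(h_pageable);
  return ok;
}
```

A caller would pass a buffer size such as 16 * 1024 * 1024 bytes and print the two results. The actual numbers depend on the GPU, the bus and the host system, and a single timed copy can be noisy, so averaging several runs is advisable.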