We were able to optimize the stencil calculation by putting the entire application in the high-bandwidth memory. This was only possible because we needed less than 16 gigabytes of RAM. If you need more than 16 gigabytes, then you might want to look into the memkind library, which allows you to selectively put some buffers into high-bandwidth memory.

So for our experiment, we are going to roll back the changes that we made to the Makefile, and instead go to image.cc. This is where we allocate the memory buffers to store the image data. You can see right here, in the first constructor, that we are using _mm_malloc to allocate a pointer-based array, pixel. Instead of this line, let's use the high-bandwidth allocator, hbw_posix_memalign. Its syntax asks that the first argument is the memory address of the pointer that you're trying to allocate. The second argument is the alignment value, 64 bytes, and the third argument is the size of the buffer. If you search carefully for _mm, you will see that there is another constructor that also uses an allocator, and we are going to do the same replacement in that second constructor. And if we search for _mm again, we will find a deallocator. We have to use the corresponding high-bandwidth memory deallocator here, hbw_free.

So let's see if it looks correct. Let's try to compile. When we try to compile, we get a compilation error: identifier hbw_posix_memalign is undefined. Well, that is because we failed to include the header file that has the signature of that function. Including hbwmalloc.h should resolve this problem. Does it work now? Still an error, but a different one: undefined reference to hbw_posix_memalign. This is because at link time we are not telling the linker to search for the memkind library. To fix this, we have to edit the Makefile again. In the link line, in addition to linking the png library, you will link memkind. That's that. Now compilation and linking are successful, so let's submit the job for execution.
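The replacement described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of image.cc: the class name `ImageSketch`, its members, and the `USE_HBW` macro are hypothetical. Only `hbw_posix_memalign`, `hbw_free`, and the header `hbwmalloc.h` come from memkind. When `USE_HBW` is not defined, the sketch falls back to the ordinary `posix_memalign`, so it still builds on a machine without memkind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>      // posix_memalign and free, used only on the fallback path

#ifdef USE_HBW          // hypothetical macro: build with -DUSE_HBW and link with -lmemkind
#include <hbwmalloc.h>  // declares hbw_posix_memalign and hbw_free
#endif

// Hypothetical stand-in for the image class defined in image.cc.
class ImageSketch {
public:
  int width, height;
  float* pixel;

  ImageSketch(int w, int h) : width(w), height(h), pixel(nullptr) {
    assert(w > 0 && h > 0);
    const std::size_t bytes =
        sizeof(float) * static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
    void* buf = nullptr;
#ifdef USE_HBW
    // Arguments: address of the pointer, alignment in bytes, size of the buffer.
    const int err = hbw_posix_memalign(&buf, 64, bytes);
#else
    const int err = posix_memalign(&buf, 64, bytes);
#endif
    // Both calls return 0 on success; leave pixel null otherwise.
    pixel = (err == 0) ? static_cast<float*>(buf) : nullptr;
  }

  ~ImageSketch() {
    if (pixel == nullptr) return;
#ifdef USE_HBW
    hbw_free(pixel);    // memory from hbw_posix_memalign must go back through hbw_free
#else
    std::free(pixel);
#endif
  }

  // The buffer is owned by this object, so copying is disabled in this sketch.
  ImageSketch(const ImageSketch&) = delete;
  ImageSketch& operator=(const ImageSketch&) = delete;
};
```

With `USE_HBW` defined, the two errors from the walkthrough map onto this sketch directly: leaving out the `#include <hbwmalloc.h>` line gives the undefined-identifier error at compile time, and leaving `-lmemkind` off the link line gives the undefined-reference error at link time.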
So what do the results look like now? Now we are again observing nearly 220 GB/s of bandwidth, which means that our buffers successfully went to the high-bandwidth memory. But we didn't have to put the entire application into high-bandwidth memory. We only allocated the buffers that contained bandwidth-critical data there.
Checking the memkind version_Demo: Stencil with Memkind