Caffe: synchronization and asynchrony

这题我会啊

Original link: https://blog.csdn.net/Love_wanling/article/details/79914057

These are two problems I ran into, written down here for reference.

cpu_data and gpu_data

First, look at the private members of SyncedMemory: two pointers, cpu_ptr_ and gpu_ptr_, plus the synchronization functions to_cpu() and to_gpu().

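cpu_data and gpu_data are not kept in sync at all times; they are synchronized only when one of the synchronization functions is called. Searching the whole file (syncedmem.cpp), the only call sites are the pointer accessors: cpu_data() and mutable_cpu_data() call to_cpu(), and gpu_data() and mutable_gpu_data() call to_gpu().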

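In other words, the values are refreshed only at the moment a pointer is fetched. So a function that fetches all its pointers up front, hands gpu_data to a kernel, and then keeps working through the cpu_data pointer it fetched earlier will not get correct results: the kernel's output exists only on the GPU, and nothing has copied it back. Call cpu_data() (or mutable_cpu_data()) again after the kernel so that to_cpu() runs. The kernel must also write through mutable_gpu_data(), which marks the GPU copy as the newest; otherwise to_cpu() sees no reason to copy.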
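The lazy synchronization can be illustrated with a host-only mock. The C++ sketch below is hypothetical: MockSyncedMemory and stale_pointer_demo are made-up names, not Caffe classes, and a second std::vector stands in for device memory. It does follow the same head_ states (UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED) and the same rule as Caffe's SyncedMemory: each accessor calls to_cpu() or to_gpu(), which copies only when the other side holds newer data, and the mutable_* accessors then mark their own side as the head.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only mock of the lazy-sync logic in Caffe's SyncedMemory.
// gpu_ is an ordinary host buffer standing in for device memory.
class MockSyncedMemory {
 public:
  enum SyncedHead { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };

  explicit MockSyncedMemory(std::size_t n) : cpu_(n, 0.f), gpu_(n, 0.f) {}

  const float* cpu_data() { to_cpu(); return cpu_.data(); }
  const float* gpu_data() { to_gpu(); return gpu_.data(); }
  float* mutable_cpu_data() { to_cpu(); head_ = HEAD_AT_CPU; return cpu_.data(); }
  float* mutable_gpu_data() { to_gpu(); head_ = HEAD_AT_GPU; return gpu_.data(); }
  SyncedHead head() const { return head_; }

 private:
  // Copy "device" -> host only when the device side holds newer data.
  void to_cpu() {
    if (head_ == UNINITIALIZED) {
      head_ = HEAD_AT_CPU;
    } else if (head_ == HEAD_AT_GPU) {
      std::copy(gpu_.begin(), gpu_.end(), cpu_.begin());
      head_ = SYNCED;
    }
  }
  // Copy host -> "device" only when the host side holds newer data.
  void to_gpu() {
    if (head_ == UNINITIALIZED) {
      head_ = HEAD_AT_GPU;
    } else if (head_ == HEAD_AT_CPU) {
      std::copy(cpu_.begin(), cpu_.end(), gpu_.begin());
      head_ = SYNCED;
    }
  }

  std::vector<float> cpu_, gpu_;
  SyncedHead head_ = UNINITIALIZED;
};

struct DemoResult { float stale; float fresh; };

// Fetch the host pointer first, let a "kernel" write through the device pointer,
// then read the host side before and after calling cpu_data() again.
DemoResult stale_pointer_demo() {
  MockSyncedMemory mem(4);
  float* host = mem.mutable_cpu_data();        // head_: HEAD_AT_CPU
  for (int i = 0; i < 4; ++i) host[i] = 1.f;
  float* dev = mem.mutable_gpu_data();         // copies host -> device, head_: HEAD_AT_GPU
  for (int i = 0; i < 4; ++i) dev[i] *= 10.f;  // stands in for a kernel launch
  assert(mem.head() == MockSyncedMemory::HEAD_AT_GPU);
  const float stale = host[0];                 // nothing has called to_cpu() yet
  const float fresh = mem.cpu_data()[0];       // to_cpu() copies device -> host
  return {stale, fresh};
}
```

Calling stale_pointer_demo() returns stale == 1 and fresh == 10: the host pointer fetched at the top still shows the old value until cpu_data() is called again after the "kernel".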

Synchronization inside a kernel

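Scenario: compute the sum of squares of each feature map (one height × width plane per channel). A first attempt might look like this (index = bottom index, nthreads = bottom count):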

template <typename Dtype>
__global__ void Square(const int nthreads, const Dtype* bottom_data,
    const int width, const int height, Dtype* top_data) {
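  CUDA_KERNEL_LOOP(index, nthreads) {
    Dtype sq = bottom_data[index]*bottom_data[index];
    // index / (height*width) is the (n, c) plane this element belongs to.
    // BUG: many threads do this read-modify-write on the same top_data element
    // concurrently and without atomics, so most of the updates are lost.
    top_data[index/(height*width)]=top_data[index/(height*width)] + sq;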
  }
}

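It looks simple, but the result is wrong: you only end up with the square of a single element. My guess was that this comes from the threads executing simultaneously; more precisely, it is a data race. Many threads perform the read-modify-write on the same top_data element at the same time, so they read the same old value and overwrite each other's results. The fix is to give each thread one output element instead (index = top index, nthreads = num × channels) and let it loop over the whole plane: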

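template <typename Dtype>
__global__ void Square(const int nthreads, const Dtype* bottom_data,
    const int width, const int height, Dtype* top_data) {
  // One thread per output element: index runs over num * channels planes.
  CUDA_KERNEL_LOOP(index, nthreads) {
    const Dtype* plane = bottom_data + index * height * width;
    // Accumulate locally: top_data may still hold values from a previous pass.
    Dtype sum = 0;
    for (int h = 0; h < height; ++h) {
      for (int w = 0; w < width; ++w) {
        const Dtype v = plane[h * width + w];  // row-major (h, w) offset
        sum += v * v;
      }
    }
    top_data[index] = sum;
  }
}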
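To check a kernel like this, it helps to have a CPU reference. The C++ sketch below is host-only and hypothetical: plane_sq_sum_reference, plane_sq_sum_atomic and atomic_add are made-up names, not Caffe APIs. The reference computes the per-plane sum of squares in NCHW layout with the same row-major h * width + w offset as the corrected kernel. The second function shows the other way to repair the first kernel: keep one thread per bottom element, but make every accumulation atomic. On the GPU that job is done by CUDA's atomicAdd; here a std::atomic<float> compare-exchange loop stands in for it, and std::thread workers stand in for CUDA threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Serial reference: out[p] = sum of squares over plane p (p = n * channels + c).
// bottom is laid out as [planes][height][width], i.e. Caffe's NCHW order.
std::vector<float> plane_sq_sum_reference(const std::vector<float>& bottom,
                                          int planes, int height, int width) {
  assert(bottom.size() == static_cast<std::size_t>(planes) * height * width);
  std::vector<float> out(planes, 0.f);
  for (int p = 0; p < planes; ++p) {
    float sum = 0.f;
    for (int h = 0; h < height; ++h) {
      for (int w = 0; w < width; ++w) {
        const float v = bottom[(static_cast<std::size_t>(p) * height + h) * width + w];
        sum += v * v;
      }
    }
    out[p] = sum;
  }
  return out;
}

// Atomic float add via a compare-exchange loop (std::atomic<float>::fetch_add
// only exists since C++20). Plays the role of CUDA's atomicAdd.
void atomic_add(std::atomic<float>& target, float value) {
  float old = target.load();
  while (!target.compare_exchange_weak(old, old + value)) {
    // On failure, old now holds the current value; retry with it.
  }
}

// Host analogue of the first kernel: every worker walks bottom elements with a
// stride (like CUDA_KERNEL_LOOP) and adds into a shared per-plane accumulator.
// This is safe only because each update is atomic.
std::vector<float> plane_sq_sum_atomic(const std::vector<float>& bottom,
                                       int planes, int height, int width,
                                       int num_threads) {
  std::vector<std::atomic<float>> acc(planes);
  for (auto& a : acc) a.store(0.f);
  const int count = planes * height * width;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      for (int index = t; index < count; index += num_threads) {
        const float v = bottom[index];
        atomic_add(acc[index / (height * width)], v * v);
      }
    });
  }
  for (auto& w : workers) w.join();
  std::vector<float> out(planes);
  for (int p = 0; p < planes; ++p) out[p] = acc[p].load();
  return out;
}
```

The one-thread-per-output kernel needs no atomics but gives each thread a full height × width loop; the atomic version keeps more threads busy but serializes updates to the same output. With general floating-point data the atomic result can also differ from the serial one in the last bits, because the order of additions is not fixed. Note that atomicAdd on double requires compute capability 6.0 or higher, which matters for the Dtype = double instantiation.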