Creating a custom GPU op node in TensorFlow

银联蛋蛋

Tags: tensorflow op

Article link: https://blog.csdn.net/qq_27637315/article/details/79114633

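This article explains how to create a custom GPU operator in TensorFlow. The loss function the author needed was not in the standard library, so the author wrote a CUDA kernel and compiled it into an op. The example op adds one to every value in the input tensor. The steps covered are writing the kernel code, compiling it, testing it, and using it.

(Summary generated automatically by CSDN.)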

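A while ago my advisor gave me a task that required modifying a loss function. The loss function I needed does not exist in TensorFlow's built-in library, and I tried many approaches without finding a solution. TensorFlow separates building the computation graph from running it on data. Because of this, a plain Python function cannot operate on the tensors inside the graph. The computation has to be defined as a node (an op) in the graph and invoked through it.

So the natural first step was to get an example GPU kernel running. There are plenty of CPU op tutorials online, but far fewer for GPU ops. The official tutorial covers the topic, and Jikexueyuan (极客学院) also presents it. I found it too complicated, though, and did not understand it after one read. Enough background; on to the main content. This example implements an op that adds one to every number in the input tensor and outputs the result.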
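Before writing any TensorFlow code, it may help to pin down the op's contract as a plain CPU reference. The contract is output = input + 1, applied element by element, and the GPU result can later be compared against this reference. The sketch below is standalone C++ and does not use TensorFlow. The helper names AddOneReference and MatchesReference are hypothetical and are not part of the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the AddOne op: output[i] = input[i] + 1.
// Not part of the TensorFlow op itself; only meant for checking results.
std::vector<std::int32_t> AddOneReference(const std::vector<std::int32_t>& input) {
  std::vector<std::int32_t> output(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    // Assumes no element equals INT32_MAX (signed overflow is undefined).
    output[i] = input[i] + 1;
  }
  return output;
}

// Hypothetical helper: true if `actual` (e.g. a GPU result copied back to
// the host) equals the reference output for `input`.
bool MatchesReference(const std::vector<std::int32_t>& input,
                      const std::vector<std::int32_t>& actual) {
  return AddOneReference(input) == actual;
}
```

For example, assert(MatchesReference({1, 2, 3}, {2, 3, 4})) holds, so the same check can be applied to whatever the custom op returns.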

Step 1: Write the kernel

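File name: cuda_op_kernel.cu.cc. It contains the CUDA kernel and a host-side launcher function. Everything is wrapped in #if GOOGLE_CUDA, so the file compiles to nothing unless GOOGLE_CUDA is defined at compile time. The code is as follows: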

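#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

// CUDA kernel: out[i] = in[i] + 1 for every element.
// Grid-stride loop: each thread starts at its global index and advances by
// the total number of threads in the grid, so a fixed launch size still
// covers all N elements.
__global__ void AddOneKernel(const int* in, const int N, int* out) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
       i += blockDim.x * gridDim.x) {
    out[i] = in[i] + 1;
  }
}

// Host-side launcher, called from the op's Compute() in cuda_op_kernel.cc.
// `in` and `out` must point to device memory. Launches 32 blocks of
// 256 threads on the default stream.
void AddOneKernelLauncher(const int* in, const int N, int* out) {
  AddOneKernel<<<32, 256>>>(in, N, out);
}

#endif  // GOOGLE_CUDA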
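AddOneKernel is always launched with 32 blocks of 256 threads, i.e. 8192 threads, whatever the value of N. The grid-stride loop is what makes this work: thread t handles elements t, t + 8192, t + 2 * 8192, and so on. The sketch below is plain C++ and needs no CUDA. It replays the kernel's index arithmetic sequentially on the CPU and counts how often each element would be written. CountVisits and EachIndexVisitedOnce are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// CPU simulation of the grid-stride loop in AddOneKernel.
// Each (block, thread) pair stands in for one CUDA thread; here the loops run
// sequentially, whereas on the GPU the threads run concurrently.
// grid_dim/block_dim default to the <<<32, 256>>> launch used above.
std::vector<int> CountVisits(int N, int grid_dim = 32, int block_dim = 256) {
  assert(N >= 0 && grid_dim > 0 && block_dim > 0);
  std::vector<int> visits(N, 0);  // visits[i] = number of threads writing out[i]
  for (int block = 0; block < grid_dim; ++block) {
    for (int thread = 0; thread < block_dim; ++thread) {
      // Same index arithmetic as the kernel: start at the global thread id,
      // then stride by the total number of threads in the grid.
      for (int i = block * block_dim + thread; i < N;
           i += block_dim * grid_dim) {
        ++visits[i];
      }
    }
  }
  return visits;
}

// True if every element would be written by exactly one thread.
bool EachIndexVisitedOnce(int N, int grid_dim = 32, int block_dim = 256) {
  for (int v : CountVisits(N, grid_dim, block_dim)) {
    if (v != 1) return false;
  }
  return true;
}
```

For example, EachIndexVisitedOnce(10000) is true. Elements 8192 to 9999 are picked up on the second pass of threads 0 to 1807. On a real GPU the threads do not run in this nested-loop order. Because each element has exactly one writer, however, the order does not affect the result.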

Step 2: Write the C++ op

File name: cuda_op_kernel.cc. The code is as follows:

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
 
using namespace tensorflow;
 
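// Register the op's interface: one int32 input, one int32 output.
REGISTER_OP("AddOne")
    .Input("input: int32")
    .Output("output: int32")
    .Doc(R"doc(
Adds 1 to all elements of the tensor.
output: A Tensor.
  output = input + 1
)doc");

// Declared here; defined in cuda_op_kernel.cu.cc, which is compiled separately.
void AddOneKernelLauncher(const int* in, const int N, int* out);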
 
class AddOneOp : public OpKernel {
 public:
  explicit AddOneOp(OpKernelConstruction* context) : OpKernel(context) {}
 
  void Compute(OpKernelContext* context) override {
    // Grab the input tensor
    const Tensor& input_tensor = context->input(0);
    auto input = input_tensor.flat<int32>();
 
    // Create an output tensor
    Tensor* output_tensor = NULL;
    OP_