PyTorch 1.1.0 Source Code Walkthrough: Runtime Mechanics, Part 1


weixin_39890332


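Ah... after a whole year, this column is finally being updated again. A lot happened this year, and I've had a bit more free time recently because I'm looking for an internship. PyTorch 1.1.0 has also been out for a while now. Since PyTorch 1.0 the internals have essentially been rebuilt: Caffe2 has been brought in as a backend and the code is organized more sensibly. So, after reading through the source once more, here is this article. I'll start with something fairly light: how PyTorch runs, and what happens step by step.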

Starting from MNIST

As usual, we'll start with MNIST.

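import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

model = Net()
optimizer = optim.SGD(model.parameters(), lr=1e-6, momentum=0.5)

# Stand-in for one batch from an MNIST DataLoader: 64 random 1x28x28 images and their labels
data = torch.randn(64, 1, 28, 28)
target = torch.randint(0, 10, (64,))

model.train()
optimizer.zero_grad()
output = model(data)               # forward pass, entered through nn.Module.__call__
loss = F.nll_loss(output, target)
loss.backward()                    # backward pass, discussed below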
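The `4*4*50` passed to `fc1` is the flattened size of conv2's pooled output for a 28x28 MNIST image. A quick check in plain Python, using the output-size formula given in the PyTorch docs for `Conv2d` and `MaxPool2d` (with the default `ceil_mode=False`); `out_size` is just an illustrative helper:

```python
def out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a conv or pooling layer (same formula for both)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

h = 28                    # MNIST images are 1x28x28
h = out_size(h, 5, 1)     # conv1: 5x5, stride 1 -> 24
h = out_size(h, 2, 2)     # max_pool2d(2, 2)     -> 12
h = out_size(h, 5, 1)     # conv2: 5x5, stride 1 -> 8
h = out_size(h, 2, 2)     # max_pool2d(2, 2)     -> 4
flat = h * h * 50         # 50 output channels from conv2
print(h, flat)            # 4 800
```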

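You are probably more than familiar with the code above. So let's look at what actually happens when it runs, and how PyTorch runs forward and then backward while keeping track of the chain of calls.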

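Initialization (`nn.Module.__init__()`)

nn.Module.__init__() (github.com)

I won't paste the code here; see the link above. `__init__` mainly sets up the module's internal state, such as the dictionaries for parameters, buffers, sub-modules and hooks; I'll explain what these are for later. Following the code of our `Net` class, each layer is then constructed in turn. The conv and fc layers follow largely the same logic and differ only in a few layer-specific attributes.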
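To make the `__init__` step concrete, here is a heavily simplified model of `nn.Module` in plain Python. It is not PyTorch's code: `Module`, `Parameter` and `Linear` below are toy stand-ins, although the registry names mirror the real attributes (`_parameters`, `_buffers`, `_modules` and the hook dictionaries). The key idea is that assigning `self.conv1 = ...` goes through `__setattr__`, which files sub-modules and parameters into these registries:

```python
from collections import OrderedDict

class Parameter:
    """Toy stand-in for nn.Parameter; stores only a shape to keep things small."""
    def __init__(self, shape):
        self.shape = shape

class Module:
    """Toy model of nn.Module's bookkeeping (not PyTorch's implementation)."""
    def __init__(self):
        # object.__setattr__ bypasses our own __setattr__ for the registries themselves.
        for name in ("_parameters", "_buffers", "_modules",
                     "_forward_pre_hooks", "_forward_hooks", "_backward_hooks"):
            object.__setattr__(self, name, OrderedDict())
        object.__setattr__(self, "training", True)

    def __setattr__(self, name, value):
        # Route parameters and sub-modules into their registries.
        if isinstance(value, Parameter):
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only called when normal lookup fails: fall back to the registries.
        for registry in ("_parameters", "_buffers", "_modules"):
            if name in self.__dict__[registry]:
                return self.__dict__[registry][name]
        raise AttributeError(name)

    def named_parameters(self, prefix=""):
        for name, p in self._parameters.items():
            yield prefix + name, p
        for mname, m in self._modules.items():
            yield from m.named_parameters(prefix + mname + ".")

class Linear(Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.weight = Parameter((n_out, n_in))
        self.bias = Parameter((n_out,))

class Net(Module):
    def __init__(self):
        super().__init__()
        self.fc1 = Linear(800, 500)
        self.fc2 = Linear(500, 10)

net = Net()
print(list(net._modules))                       # ['fc1', 'fc2']
print([n for n, _ in net.named_parameters()])   # ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```

This registration is why `model.parameters()` can find every layer's weights without `Net` listing them explicitly.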

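The real entry point of computation (`nn.Module.__call__()`)

No forward computation happens until `output = model(data)` executes. Calling the model invokes `nn.Module.__call__()`, which starts with: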

for hook in self._forward_pre_hooks.values():
    hook(self, input)

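This runs before `forward` and handles any registered pre-hooks. Hooks will be covered in detail later; briefly, if you want to inspect or modify intermediate values inside the network, pass your own function to one of the `register_*_hook()` methods (`register_forward_pre_hook`, `register_forward_hook`, `register_backward_hook`). This loop calls every registered forward pre-hook with the module and its input.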
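A toy version of the pre-hook loop, assuming a stripped-down `Module` (not PyTorch's class). The real `register_forward_pre_hook` returns a removable handle; here an integer id stands in for it:

```python
from collections import OrderedDict

class Module:
    """Toy module: only the pre-hook part of __call__ is modelled."""
    def __init__(self):
        self._forward_pre_hooks = OrderedDict()
        self._next_hook_id = 0

    def register_forward_pre_hook(self, hook):
        # Hooks run in registration order; the int id is a stand-in for PyTorch's handle.
        hook_id = self._next_hook_id
        self._forward_pre_hooks[hook_id] = hook
        self._next_hook_id += 1
        return hook_id

    def __call__(self, *input):
        # Pre-hooks see (module, input tuple) before forward runs.
        for hook in self._forward_pre_hooks.values():
            hook(self, input)
        return self.forward(*input)

class Double(Module):
    def forward(self, x):
        return 2 * x

seen = []
m = Double()
m.register_forward_pre_hook(lambda module, inp: seen.append(("pre", inp)))
out = m(21)
print(seen, out)   # [('pre', (21,))] 42
```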

if torch._C._get_tracing_state():
    result = self._slow_forward(*input, **kwargs)
else:
    result = self.forward(*input, **kwargs)

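This branch checks whether the JIT tracer is active: `torch._C._get_tracing_state()` returns a tracing state while code runs under `torch.jit.trace`. If it is, `_slow_forward` is used, which runs the same `forward` but also pushes a scope for this module onto the tracer, so the recorded graph knows which sub-module each operation came from. Otherwise `forward` is called directly.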

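From here `forward` is called: first `Net.forward`, which in turn calls the `__call__()` of `Conv2d` and the other layers. Take `Conv2d` (the variable `conv1`) as an example.

If you look at the call stack at this point, the code being executed is once again the logic of `nn.Module.__call__()`. This is the elegant part of PyTorch's design, and where the "atomic" nature of its layers comes from: every layer goes through the same base method, and each layer only has to implement its own specific logic in `forward`. Following the ordinary rules of the call stack, the computation proceeds layer by layer.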
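The recursion through one shared `__call__` can be reproduced with a toy `Module` whose `__call__` logs entry and exit (the class names are illustrative, not PyTorch's):

```python
call_log = []

class Module:
    def __call__(self, *input):
        # Every layer goes through this one method; only forward() differs per layer.
        call_log.append("enter " + type(self).__name__)
        result = self.forward(*input)
        call_log.append("exit " + type(self).__name__)
        return result

class Scale(Module):
    def __init__(self, k):
        self.k = k

    def forward(self, x):
        return self.k * x

class Net(Module):
    def __init__(self):
        self.conv1 = Scale(2)   # toy stand-ins for real layers
        self.fc1 = Scale(10)

    def forward(self, x):
        x = self.conv1(x)       # -> Module.__call__ -> Scale.forward
        return self.fc1(x)

print(Net()(1))   # 20
print(call_log)
# ['enter Net', 'enter Scale', 'exit Scale', 'enter Scale', 'exit Scale', 'exit Net']
```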

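Now we have reached Conv2d's forward method. On the Python side, `nn.Conv2d.forward` simply calls `F.conv2d`, which is bound to the C++ operator `torch.conv2d` exposed through `torch._C`. The C++ code below is `Conv2dImpl::forward` from the C++ frontend, the C++ counterpart of the Python module; it likewise just forwards to `torch::conv2d` (or `torch::conv_transpose2d` for transposed convolutions):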

Tensor Conv2dImpl::forward(const Tensor& input) {
  if (options.transposed_) {
    return torch::conv_transpose2d(
        input,
        weight,
        bias,
        options.stride_,
        options.padding_,
        options.output_padding_,
        options.groups_,
        options.dilation_);
  }
  return torch::conv2d(
      input,
      weight,
      bias,
      options.stride_,
      options.padding_,
      options.dilation_,
      options.groups_);
}

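This is still only a wrapper; the actual implementation is defined in ATen/c10 (for what these two libraries do, see the column article by Gemfield, "PyTorch ATen代码的动态生成" (dynamic generation of PyTorch ATen code), zhuanlan.zhihu.com).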
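The two branches of `Conv2dImpl::forward` produce different output sizes from the same `stride_`, `padding_` and `dilation_`. Using the output-size formulas documented for `Conv2d` and `ConvTranspose2d` (one spatial dimension; the helper names are illustrative), a transposed convolution maps the size back, and `output_padding_` resolves the ambiguity that appears when `stride > 1`:

```python
def conv_out(n, k, stride=1, padding=0, dilation=1):
    # Output size of a regular convolution along one dimension.
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1

def conv_transpose_out(n, k, stride=1, padding=0, output_padding=0, dilation=1):
    # Output size of a transposed convolution along one dimension.
    return (n - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1

print(conv_out(12, 5))               # 8  (conv2 in the MNIST net)
print(conv_transpose_out(8, 5))      # 12 (the transposed conv restores the size)
# With stride 2, sizes 7 and 8 both map to 3, so output_padding picks which one to restore.
print(conv_out(7, 3, stride=2), conv_out(8, 3, stride=2))         # 3 3
print(conv_transpose_out(3, 3, stride=2),
      conv_transpose_out(3, 3, stride=2, output_padding=1))       # 7 8
```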

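On the default path, the work is finally done by

CPU: legacy::cpu::_thnn_conv2d_forward
CUDA: legacy::cuda::_thnn_conv2d_forward

(when available, cuDNN, MKL-DNN or NNPACK kernels may be selected instead). As the `legacy::` prefix suggests, I suspect these still use the older TH/THNN code rather than Caffe2's, but I haven't verified this. The developers have said that future releases will migrate ATen's logic into c10.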
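The `CPU:` / `CUDA:` pair above is essentially a per-backend dispatch table. A minimal sketch of that idea, assuming a plain dict as the table; ATen's real mechanism is generated C++ code, and `FakeTensor`, `DISPATCH` and the kernel functions below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    """Hypothetical tensor: just data plus a device tag."""
    data: list
    device: str = "cpu"

def cpu_kernel(t):
    return FakeTensor([v * 2 for v in t.data], "cpu")

def cuda_kernel(t):
    # A real CUDA kernel would run on the GPU; here we only tag the result.
    return FakeTensor([v * 2 for v in t.data], "cuda")

DISPATCH = {"cpu": cpu_kernel, "cuda": cuda_kernel}

def double(t):
    # One user-facing op; the argument's backend decides which kernel runs.
    try:
        kernel = DISPATCH[t.device]
    except KeyError:
        raise NotImplementedError(f"no kernel for device {t.device!r}") from None
    return kernel(t)

print(double(FakeTensor([1, 2])))           # FakeTensor(data=[2, 4], device='cpu')
print(double(FakeTensor([1, 2], "cuda")))   # FakeTensor(data=[2, 4], device='cuda')
```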

That completes the forward pass of one convolution layer; the other layers work the same way. Let's continue with the overall flow.

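After Conv2d's `forward` returns, `__call__` runs the forward hooks (each called with the module, its input and its output), in the same way as the forward pre-hooks above. Backward hooks are not executed at this point: if any are registered, they are attached to the `grad_fn` of the output and only fire during the backward pass.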
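A toy sketch of this ordering, assuming simplified hook signatures (real backward hooks receive `(module, grad_input, grad_output)`, and the real code attaches them via `grad_fn.register_hook`). `Node` and `Output` are hypothetical stand-ins for a `grad_fn` and an output tensor:

```python
from collections import OrderedDict

class Node:
    """Toy grad_fn: holds hooks that run only when backward reaches it."""
    def __init__(self):
        self.hooks = []

    def run_backward(self, grad):
        for hook in self.hooks:
            hook(grad)

class Output:
    """Toy output tensor carrying its grad_fn."""
    def __init__(self, value):
        self.value = value
        self.grad_fn = Node()

class Module:
    def __init__(self):
        self._forward_hooks = OrderedDict()
        self._backward_hooks = OrderedDict()

    def __call__(self, x):
        result = self.forward(x)
        # Forward hooks run immediately and see (module, input, output).
        for hook in self._forward_hooks.values():
            hook(self, (x,), result)
        # Backward hooks are only attached here; they fire later, during backward.
        for hook in self._backward_hooks.values():
            result.grad_fn.hooks.append(lambda g, hook=hook: hook(self, g))
        return result

class Square(Module):
    def forward(self, x):
        return Output(x * x)

events = []
m = Square()
m._forward_hooks[0] = lambda mod, inp, out: events.append(("forward", inp, out.value))
m._backward_hooks[0] = lambda mod, grad: events.append(("backward", grad))
y = m(3)
print(events)             # [('forward', (3,), 9)]
y.grad_fn.run_backward(6)
print(events)             # [('forward', (3,), 9), ('backward', 6)]
```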

With that, Conv2d's `__call__()` returns, and `Net.forward` continues with `F.relu` and the remaining layers until it reaches `return`.

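The call stack then unwinds back to `Net.__call__`, which returns the output of `Net.forward` (the log-probabilities from `F.log_softmax`); `F.nll_loss` turns that into `loss`.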
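What `F.log_softmax` and `F.nll_loss` compute, written out in plain Python for a batch of two samples and three classes (the numbers are made up, and `nll_loss` here mirrors the default `reduction='mean'`):

```python
import math

def log_softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]

def nll_loss(log_probs, targets):
    # Mean over the batch of -log p(correct class).
    return -sum(lp[t] for lp, t in zip(log_probs, targets)) / len(targets)

logits = [[2.0, 1.0, 0.1],
          [0.5, 2.5, 0.3]]
targets = [0, 1]
output = [log_softmax(r) for r in logits]   # what Net.forward returns
loss = nll_loss(output, targets)
print(round(loss, 4))
```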

The forward pass is now complete.

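Backward pass (`loss.backward()`)

OK, now we get to the most important part: `loss.backward()`.

Stepping through with a debugger shows that `.backward()` is called only once here, yet every gradient gets computed. This brings us to the core of PyTorch: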
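To see why one `backward()` call is enough, here is a deliberately tiny scalar autograd in plain Python. It is a sketch of the idea, not PyTorch's engine: `Value` and its methods are hypothetical. The forward pass only records parents and a local backward function; `backward()` walks the recorded graph once and fills every leaf's `.grad`:

```python
class Value:
    """Toy scalar that records how it was produced (a stand-in for Tensor + grad_fn)."""

    def __init__(self, data, parents=(), backward_fn=None, requires_grad=False):
        self.data = data
        self.grad = 0.0
        self.requires_grad = requires_grad or any(p.requires_grad for p in parents)
        self._parents = parents          # recorded during the forward pass
        self._backward_fn = backward_fn  # maps the output grad to one grad per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), lambda g: (g, g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     lambda g: (g * other.data, g * self.data))

    def backward(self):
        # Post-order DFS: reversed(order) visits every node after all of its consumers.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        grads = {id(self): 1.0}
        for v in reversed(order):
            g = grads.pop(id(v), 0.0)      # intermediate grads are dropped after use
            if not v._parents:
                if v.requires_grad:
                    v.grad += g            # leaf: accumulate, like AccumulateGrad
                continue
            for p, pg in zip(v._parents, v._backward_fn(g)):
                grads[id(p)] = grads.get(id(p), 0.0) + pg

w = Value(3.0, requires_grad=True)
b = Value(1.0, requires_grad=True)
x = Value(2.0)                  # input data, no grad needed
loss = w * x + b                # forward: only records the graph
loss.backward()                 # one call fills every leaf's .grad
print(w.grad, b.grad, x.grad)   # 2.0 1.0 0.0
```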

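First, the documentation in pytorch/pytorch (github.com) explains that during the forward pass PyTorch records, in each result's `grad_fn`, the function that will be used to compute its derivative. Only this recording happens during forward; the derivatives themselves are computed when backward runs, which is why the code does not match the idea of "differentiating during forward". The component that evaluates the recorded graph is the engine in engine.cpp: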

auto Engine::thread_main(GraphTask *graph_task) -> void {
  auto queue = ready_queues[worker_device + 1];
  // Why the test on graph_task->outstanding_tasks?  See
  // Note [Reentrant backwards]
  while (!graph_task || graph_task->outstanding_tasks > 0) {
    FunctionTask task = queue->pop();
    if (task.fn && !task.base->has_error.load()) {
      GradMode::set_enabled(task.base->grad_mode);
      try {
        evaluate_function(task);
      } catch (std::exception& e) {
        thread_on_exception(task, e);
      }
    }
    // Notify downstream about the completion of tasks depending
    // on both where the task was executed, and who owned the overall
    // graph (in case of reentrant execution.)  See Note [Reentrant backwards].
    auto base_owner = task.base->owner;
    // Task from a non-worker thread. Easy case.
    if (base_owner == NO_DEVICE) {
      if (--task.base->outstanding_tasks == 0) {
        std::lock_guard<std::mutex> lock(task.base->mutex);
        task.base->not_done.notify_all();
      }
    } else {
      // If it's a task initiated from this thread, decrease the counter, but
      // don't do anything - loop condition will do all checks for us next.
      if (base_owner == worker_device) {
        --task.base->outstanding_tasks;
      // Otherwise send a dummy function task to the owning thread just to
      // ensure that it's not sleeping. If it has work, it might see that
      // graph_task->outstanding_tasks == 0 before it gets to the task, but
      // it's a no-op anyway.
      } else if (base_owner != worker_device) {
        if (--task.base->outstanding_tasks == 0) {
          // Synchronize outstanding_tasks with queue mutex
          std::atomic_thread_fence(std::memory_order_release);
          ready_queue_by_index(base_owner).push(FunctionTask(task.base, nullptr, InputBuffer(0)));
        }
      }
    }
  }
}
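Stripped of threads, devices and error handling, the scheduling idea in `thread_main` (together with the dependency counting that `Engine::execute` does before the loop starts) is: count how many edges point into each node, start from the root, and enqueue a node only once all of its incoming gradients have arrived. A single-threaded sketch, in which `Node`, `compute_dependencies`, `run_backward` and the dict-based buffers are illustrative assumptions rather than PyTorch's code:

```python
from collections import defaultdict, deque

class Node:
    """Toy backward function (hypothetical): one incoming grad -> grads for next_functions."""

    def __init__(self, name, next_functions=(), local_grads=()):
        self.name = name
        self.next_functions = list(next_functions)  # nodes this one sends gradients to
        self.local_grads = list(local_grads)        # local derivative for each edge
        self.grad = 0.0                             # used only by leaf accumulators

    def __call__(self, grad):
        if not self.next_functions:                 # leaf: accumulate like AccumulateGrad
            self.grad += grad
        return [grad * lg for lg in self.local_grads]

def compute_dependencies(root):
    # Count, for every reachable node, how many edges point into it.
    deps, seen, stack = defaultdict(int), {root}, [root]
    while stack:
        fn = stack.pop()
        for nxt in fn.next_functions:
            deps[nxt] += 1
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return deps

def run_backward(root, grad=1.0):
    deps = compute_dependencies(root)
    buffers = defaultdict(float)   # summed incoming grads per node (cf. InputBuffer)
    ready = deque([(root, grad)])  # cf. ReadyQueue, seeded with the root task
    order = []
    while ready:
        fn, g = ready.popleft()
        order.append(fn.name)
        for nxt, out_g in zip(fn.next_functions, fn(g)):
            buffers[nxt] += out_g
            deps[nxt] -= 1
            if deps[nxt] == 0:     # every producer has reported: node is ready
                ready.append((nxt, buffers.pop(nxt)))
    return order

# Graph for loss = x * w + w with x = 2.0 (x needs no grad, so only w gets an edge).
w_acc = Node("AccumulateGrad(w)")
mul = Node("MulBackward", [w_acc], [2.0])              # d(x*w)/dw = x
add = Node("AddBackward", [mul, w_acc], [1.0, 1.0])
order = run_backward(add)
print(order)        # ['AddBackward', 'MulBackward', 'AccumulateGrad(w)']
print(w_acc.grad)   # 3.0, i.e. x + 1
```

Note that `AccumulateGrad(w)` runs once, after both of its contributions have been summed, which is exactly what the dependency counter buys.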

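Note that nothing is put into a ready queue during the forward pass. What forward leaves behind is the graph itself: every result that requires grad carries a `grad_fn` node whose edges point to the nodes of its inputs, ending at `AccumulateGrad` nodes for leaf tensors. When backward starts, `Engine::execute` counts the dependencies of every reachable node and pushes the root as a `FunctionTask` onto a `ReadyQueue` (the queues are looked up per device, e.g. via `ready_queue_by_index`). `thread_main` above then pops tasks and runs them with `evaluate_function`; once a node has received all of its incoming gradients, it is pushed as a new `FunctionTask`.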

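Concretely, when `loss.backward()` is called, `torch.Tensor.backward()` calls `torch.autograd.backward()`, which in turn calls `Variable._execution_engine.run_backward(...)`. `_execution_engine` is the Python-side handle to the C++ engine (in the compiled package its type shows up as `torch._C._EngineBase`), and `run_backward` is implemented in python_engine.cpp: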

PyObject *THPEngine_run_backward(THPEngine *self, PyObject *args, PyObject *kwargs)
{
  HANDLE_TH_ERRORS
  _maybe_reinitialize_engine_after_fork();
  PyObject *tensors = nullptr;
  PyObject *grad_tensors = nullptr;
  unsigned char keep_graph = 0;
  unsigned char create_graph = 0;
  PyObject *inputs = nullptr;
  unsigned char allow_unreachable = 0;
  const char *accepted_kwargs[] = {
      "tensors", "grad_tensors", "keep_graph", "create_graph", "inputs",
      "allow_unreachable", nullptr
  };
  if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OObb|Ob", (char**)accepted_kwargs,
        &tensors, &grad_tensors, &keep_graph, &create_graph, &inputs, &allow_unreachable))
    return nullptr;

  THPUtils_assert(PyTuple_Check(tensors), "tensors argument is expected to "
      "be a tuple, but got %s", THPUtils_typename(tensors));
  THPUtils_assert(PyTuple_Check(grad_tensors), "grad_tensors argument is "
      "expected to be a tuple, but got %s", THPUtils_typename(grad_tensors));

  Py_ssize_t num_tensors = PyTuple_GET_SIZE(tensors);
  Py_ssize_t num_gradients = PyTuple_GET_SIZE(grad_tensors);
  THPUtils_assert(num_tensors == num_gradients, "got %ld tensors and %ld "
      "gradients", num_tensors, num_gradients);

  edge_list roots;
  roots.reserve(num_tensors);
  variable_list grads;
  grads.reserve(num_tensors);
  for (int i = 0; i < num_tensors; i++) {
    PyObject *_tensor = PyTuple_GET_ITEM(tensors, i);
    THPUtils_assert(THPVariable_Check(_tensor), "element %d of tensors "
        "tuple is not a Tensor", i);
    auto& variable = ((THPVariable*)_tensor)->cdata;
    auto gradient_edge = variable.gradient_edge();
    THPUtils_assert(gradient_edge.function,
        "element %d of tensors does not require grad and does not have a grad_fn", i);
    roots.push_back(std::move(gradient_edge));

    PyObject *grad = PyTuple_GET_ITEM(grad_tensors, i);
    if (THPVariable_Check(grad)) {
      grads.push_back(((THPVariable*)grad)->cdata);
    } else {
      THPUtils_assert(grad == Py_None,
          "element %d of gradients tuple is not a Tensor or None", i);
      THPUtils_assert(!variable.requires_grad(),
          "element %d of gradients tuple is None, but the corresponding Tensor requires grad");
    }
  }

  std::vector<Edge> output_edges;
  if (inputs != nullptr) {
    int num_inputs = PyTuple_GET_SIZE(inputs);
    output_edges.reserve(num_inputs);
    for (int i = 0; i < num_inputs; ++i) {
      PyObject *input = PyTuple_GET_ITEM(inputs, i);
      THPUtils_assert(THPVariable_Check(input),
          "all inputs have to be Tensors, but got %s", THPUtils_typename(input));
      THPVariable *input_var = (THPVariable*)input;
      const auto output_nr = input_var->cdata.output_nr();
      auto grad_fn = input_var->cdata.grad_fn();
      if (!grad_fn) {
          grad_fn = input_var->cdata.try_get_grad_accumulator();
      }
      THPUtils_assert(input_var->cdata.requires_grad(),
          "One of the differentiated Tensors does not require grad");
      if (!grad_fn) {
        output_edges.emplace_back();
      } else {
        output_edges.emplace_back(grad_fn, output_nr);
      }
    }
  }

  variable_list outputs;
  {
    AutoNoGIL no_gil;
    outputs = engine.execute(roots, grads, keep_graph, create_graph, output_edges);
  }

  if (inputs != nullptr) {
    int num_inputs = PyTuple_GET_SIZE(inputs);
    THPObjectPtr py_outputs {PyTuple_New(num_inputs)};
    if (!py_outputs) return nullptr;
    for (int i = 0; i < num_inputs; i++) {
      THPUtils_assert(allow_unreachable || outputs[i].defined(), "One of the "
                      "differentiated Tensors appears to not have been used "
                      "in the graph. Set allow_unused=True if this is the "
                      "desired behavior.");
      PyTuple_SET_ITEM(py_outputs.get(), i, THPVariable_Wrap(outputs[i]));
    }
    return py_outputs.release();
  } else {
    Py_RETURN_NONE;
  }
  END_HANDLE_TH_ERRORS
}

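Here `run_backward` collects the roots (the gradient edges of the tensors `backward` was called on) and their initial gradients, optionally builds `output_edges` for `inputs` (the path used by `torch.autograd.grad`), then releases the GIL and calls `engine.execute`, which walks every node of the graph. If `inputs` was given, the gradients with respect to those inputs are returned as a tuple; otherwise the function returns `None`, and the results have been accumulated into the `.grad` of the leaf tensors. Gradients of intermediate (non-leaf) tensors are not kept.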

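To sum up: during the forward pass, every operation involving tensors with `requires_grad=True` records a `grad_fn` node, so the forward pass builds the graph. When `backward()` is called, Python hands the roots and their initial gradients to the C++ engine through `run_backward`; the engine counts dependencies, seeds a ready queue with the root, and evaluates each node once all of its incoming gradients are available. The gradients of the loss with respect to the leaves end up in their `.grad` fields (`loss.backward()` itself returns `None`).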

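That's it: at this point PyTorch's forward pass and gradient computation are complete. Next come computing Jacobian and Hessian matrices and updating the parameters, which I'll cover in the next article (I really won't procrastinate this time, haha; probably within a week).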
