1. Installation
Reference: Installation — NVIDIA DALI documentation
1.1 Requirements
Basic knowledge
Custom operators are linked as .so shared-library files, e.g. customdummy/build/libcustomdummy.so.
API notes
CUDA_CALL: checks whether a CUDA function call returned an error.
Source: inline void CUDA_CALL(T status)
1. Defining the pipeline: @pipeline_def
Based on the examples in the DALI documentation, we summarize the following rules for defining a pipeline:
- Inside a pipeline definition, only dali.fn operators, or functions composed of dali.fn operators, are recommended;
- Besides rule 1, the basic arithmetic operators +, -, *, / can also be used;
- For control flow, if and for statements cannot be used directly; equivalent behavior must be implemented by other means [DALI-doc/Conditional-Like Execution and Masking];
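Since if/for statements cannot appear in a pipeline, branch-like behavior is expressed arithmetically. The following is a plain-Python sketch of the masking idea (it does not use DALI; the mirror function and per-sample mask layout are illustrative assumptions):

```python
# Sketch of conditional-like execution via masking: instead of
# `if flip: img = mirror(img)`, both candidates are computed for every
# sample and a 0/1 mask arithmetically selects the result.
def mirror(img):
    return list(reversed(img))

def conditional_flip(batch, flip_flags):
    out = []
    for img, flip in zip(batch, flip_flags):
        mask = 1 if flip else 0
        flipped = mirror(img)
        # Arithmetic blend: the mask picks between the two candidates.
        out.append([mask * f + (1 - mask) * o for f, o in zip(flipped, img)])
    return out

batch = [[1, 2, 3], [4, 5, 6]]
assert conditional_flip(batch, [True, False]) == [[3, 2, 1], [4, 5, 6]]
```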
1.1 fn.readers.coco: reads COCO data
Parameters:
- file_root: root directory of the COCO images, i.e. the directory containing the .jpg files;
- annotations_file: path to the JSON annotation file;
Return values:
DALI docs: nvidia.dali.fn.readers.coco — NVIDIA DALI 1.18.0 documentation
fn.readers.coco returns:
images, bounding_boxes, labels, ((polygons, vertices) | (pixelwise_masks)), (image_ids)
Example:
images, bboxes, labels = fn.readers.coco(
    file_root="coco_root/train2017",
    annotations_file="coco_root/annotations/instances_train2017.json",
    skip_empty=True,           # skip samples that contain no object instances
    ratio=True,
    ltrb=True,
    random_shuffle=False,
    shuffle_after_epoch=True,  # these two arguments together implement data shuffling
    name="Reader")
Note
When random_shuffle=False, shuffle_after_epoch=True is used to randomize the data, readers.coco shuffles after each epoch ends, i.e. the order is randomized only after train_loader has been traversed once. Moreover, the random seed is fixed, so the image sequence is identical across different runs.
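The behaviour described in this note can be pictured with a plain-Python analogue (an illustration, not DALI's actual implementation): the first epoch is in file order, shuffling happens only after each epoch ends, and a fixed seed makes every run identical.

```python
import random

def epochs(files, n_epochs, seed=0):
    """Simulate random_shuffle=False + shuffle_after_epoch=True."""
    rng = random.Random(seed)        # fixed seed -> identical across runs
    order = list(files)
    out = []
    for _ in range(n_epochs):
        out.append(list(order))      # the epoch is read in the current order
        rng.shuffle(order)           # shuffling happens after the epoch ends
    return out

run1 = epochs(["a.jpg", "b.jpg", "c.jpg"], 2)
run2 = epochs(["a.jpg", "b.jpg", "c.jpg"], 2)
assert run1[0] == ["a.jpg", "b.jpg", "c.jpg"]   # first epoch is unshuffled
assert run1 == run2                              # reproducible across "runs"
```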
1.2 fn.decoders.image: decodes image data
images = fn.decoders.image(images, device="mixed")
Note
The TensorFlow_YOLOv4 code uses images = dali.fn.decoders.image(inputs, device=device, output_type=dali.types.RGB), which specifies the output_type argument. Checking the documentation shows that the default value of output_type is already DALIImageType.RGB.
We verified with assert types.RGB == DALIImageType.RGB and types.RGB is DALIImageType.RGB that the two names refer to the same object, so we omit the output_type argument here.
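Why both == and `is` pass can be shown with a stand-in enum (the class below is a mock, not the real nvidia.dali.types module): re-exporting an enum member under a module-level name binds the very same object.

```python
import enum

class DALIImageType(enum.Enum):  # mock stand-in, not DALI's actual type
    RGB = 0
    BGR = 1

# Analogous to types.RGB: an alias for the enum member, not a copy of it,
# so identity (`is`) and equality (`==`) both hold.
RGB = DALIImageType.RGB

assert RGB == DALIImageType.RGB and RGB is DALIImageType.RGB
```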
2. Customizing an operator
Writing a custom DALI operator requires CUDA (Compute Unified Device Architecture) and C++.
Build tool: CMake
Functions that must be implemented:
- SetupImpl: provides the shape and type of the operator's outputs;
- RunImpl: performs the actual computation;
Steps to define a custom operator:
- declare the operator in a header file;
- implement the interface functions;
2.1 Operator Definition
Header: dummy.h
#ifndef EXAMPLE_DUMMY_H_
#define EXAMPLE_DUMMY_H_

#include <vector>

#include "dali/pipeline/operator/operator.h"  // DALI operator base-class header

namespace other_ns {

template <typename Backend>
class Dummy : public ::dali::Operator<Backend> {
  // Dummy publicly inherits from dali::Operator<Backend>
 public:
  inline explicit Dummy(const ::dali::OpSpec &spec) :
      ::dali::Operator<Backend>(spec) {}
  // explicit: the Dummy() constructor cannot be used for implicit conversions

  virtual inline ~Dummy() = default;

  Dummy(const Dummy&) = delete;
  Dummy& operator=(const Dummy&) = delete;
  Dummy(Dummy&&) = delete;
  Dummy& operator=(Dummy&&) = delete;

 protected:
  bool CanInferOutputs() const override {
    return true;
  }

  bool SetupImpl(std::vector<::dali::OutputDesc> &output_desc,
                 const ::dali::workspace_t<Backend> &ws) override {
    const auto &input = ws.Input<Backend>(0);
    output_desc.resize(1);
    output_desc[0] = {input.shape(), input.type()};
    return true;
  }

  void RunImpl(::dali::workspace_t<Backend> &ws) override;
};

}  // namespace other_ns

#endif  // EXAMPLE_DUMMY_H_
CPU Operator: dummy.cc
#include "dummy.h"

namespace other_ns {

template <>
void Dummy<::dali::CPUBackend>::RunImpl(::dali::HostWorkspace &ws) {
  const auto &input = ws.Input<::dali::CPUBackend>(0);
  auto &output = ws.Output<::dali::CPUBackend>(0);

  ::dali::TypeInfo type = input.type_info();
  auto &tp = ws.GetThreadPool();
  const auto &in_shape = input.shape();
  for (int sample_id = 0; sample_id < in_shape.num_samples(); sample_id++) {
    // AddWork enqueues an anonymous lambda: [&, sample_id] captures the
    // enclosing variables by reference and sample_id by value.
    tp.AddWork(
        [&, sample_id](int thread_id) {
          type.Copy<::dali::CPUBackend, ::dali::CPUBackend>(
              output.raw_mutable_tensor(sample_id),
              input.raw_tensor(sample_id),
              in_shape.tensor_size(sample_id), 0);
        },
        in_shape.tensor_size(sample_id));
  }
  tp.RunAll();
}

}  // namespace other_ns

DALI_REGISTER_OPERATOR(CustomDummy, ::other_ns::Dummy<::dali::CPUBackend>, ::dali::CPU);

DALI_SCHEMA(CustomDummy)
    .DocStr("Make a copy of the input tensor")
    .NumInput(1)
    .NumOutput(1);
2.2 Compiling with CMake
Official template:
cmake_minimum_required(VERSION 3.10)
set(CMAKE_CUDA_ARCHITECTURES "35;50;52;60;61;70;75;80;86")
project(custom_dummy_plugin LANGUAGES CUDA CXX C)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_C_STANDARD 11)
set(CMAKE_CUDA_STANDARD 14)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
include_directories(SYSTEM "${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}")
execute_process(
COMMAND python -c "import nvidia.dali as dali; print(dali.sysconfig.get_lib_dir())"
OUTPUT_VARIABLE DALI_LIB_DIR)
# obtain the DALI library directory via a Python command
string(STRIP ${DALI_LIB_DIR} DALI_LIB_DIR)
execute_process(
COMMAND python -c "import nvidia.dali as dali; print(\" \".join(dali.sysconfig.get_compile_flags()))"
OUTPUT_VARIABLE DALI_COMPILE_FLAGS)
# obtain the DALI_COMPILE_FLAGS variable via a Python command
string(STRIP ${DALI_COMPILE_FLAGS} DALI_COMPILE_FLAGS)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${DALI_COMPILE_FLAGS} ")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} ${DALI_COMPILE_FLAGS} ")
link_directories("${DALI_LIB_DIR}")
add_library(customdummy SHARED dummy.cc dummy.cu)
# SHARED: build a dynamically linked (shared) library
target_link_libraries(customdummy dali)
2.3 A DALI GPU operator cannot use TorchScript as its backend
TorchScript inference runs synchronously on the CPU main thread, which means that during TorchScript inference the main thread executes no other tasks. TorchScript can still improve throughput by running multiple inference tasks in parallel across threads or processes; the main thread remains synchronous, but the inference tasks execute in different threads or processes.
This was also confirmed by an instructor from 深蓝学院 (Shenlan College):
3. Debugging DALI
Iterating over a TensorList
A non-dense TensorList: tensor_list.at()
When a TensorList is not a dense structure, use tensor_list.at(idx) to visit each tensor in it;
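The dense/non-dense distinction can be sketched with a toy, DALI-free stand-in (the class name and layout below are illustrative, not DALI's API): when samples in the batch have different shapes, the batch cannot be viewed as one array and must be visited per sample, which is what at(idx)-style access provides.

```python
class ToyTensorList:
    """Mock of a batch of tensors; each sample is a (shape, data) pair."""
    def __init__(self, samples):
        self._samples = samples

    def is_dense(self):
        # Dense means every sample in the batch shares the same shape.
        shapes = [shape for shape, _data in self._samples]
        return all(s == shapes[0] for s in shapes)

    def at(self, idx):
        # Per-sample access works whether or not the batch is dense.
        return self._samples[idx]

batch = ToyTensorList([((2, 2), [1, 2, 3, 4]), ((1, 3), [5, 6, 7])])
assert not batch.is_dense()          # shapes differ -> non-dense
shape, data = batch.at(1)            # visit one sample at a time
```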
4. Troubleshooting
4.1 Error: [/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/000000000285.jpg
While learning DALI, we ran into the following error:
Traceback (most recent call last):
File "/xxx/test/dali/validate_random_shuffle2.py", line 63, in <module>
main()
File "/xxx/test/dali/validate_random_shuffle2.py", line 34, in main
train_loader = DALIGenericIterator(
File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py", line 196, in __init__
self._first_batch = DALIGenericIterator.__next__(self)
File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py", line 213, in __next__
outputs = self._get_outputs()
File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/plugin/base_iterator.py", line 297, in _get_outputs
outputs.append(p.share_outputs())
File "/xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/pipeline.py", line 1002, in share_outputs
return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline:
Error when executing CPU operator readers__COCO encountered:
[/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/000000000285.jpg
Stacktrace (10 entries):
[frame 0]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(+0x847ff) [0x7f09562857ff]
[frame 1]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(+0x1b0c27) [0x7f09563b1c27]
[frame 2]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali.so(dali::FileStream::Open(std::string const&, bool, bool)+0x110) [0x7f09563a2800]
[frame 3]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(dali::FileLabelLoader::ReadSample(dali::ImageLabelWrapper&)+0x26a) [0x7f0931c718ea]
[frame 4]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x31ccc41) [0x7f0931ccec41]
...
Current pipeline object is no longer valid.
The key lines to focus on are:
[/opt/dali/dali/util/mmaped_file.cc:105] File mapping failed: /train2017/00000000xxxx.jpg
[frame 3]: /xxx/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/nvidia/dali/libdali_operators.so(dali::FileLabelLoader::ReadSample(dali::ImageLabelWrapper&)+0x26a) [0x7f0931c718ea]
As these lines show, reading the dataset most likely failed; in our case the path passed to fn.readers.coco's file_root was wrong.
5. How to learn a function's usage in a C++ library without official API documentation
The best ways to learn the usage of a function in a C++ library that has no official API documentation are:
- Look for any documentation or examples provided by the library’s creator or users on the library’s website or on online forums.
- Look for the function’s declaration in the library’s header files and examine its parameters and return type to understand its usage.
- Experiment with the function by writing test code and observing its behavior.
- If necessary, reverse engineer the function by decompiling the library’s binary files and examining its implementation.
- Ask for help from experienced developers or the library's creator if all else fails.