PyTorch: from training to deployment

学渣在路上

Article link: https://blog.csdn.net/XDH19910113/article/details/116056724

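Notes:

I recently worked through the whole pipeline from training a model in PyTorch, to converting the model, to running it from a C++ program. The structure is simple and easy to follow. Because the code is fairly long, I do not reproduce all of it here and only post the core code for each step; readers with some background should be able to get it running from that.

Thanks to the authors of the referenced material.

Setting up a deep learning development environment on Ubuntu:

https://blog.csdn.net/XDH19910113/article/details/111470521

Training the model

Project name:

catdog (originally pytorch-train-test-onnx)

Core code: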

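import os

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

# SimpleNet, ImageDataset, train_model and the writer `trainsw`
# (presumably a TensorBoard SummaryWriter) are defined elsewhere in the project.

def train(args):
    # read data
    dataloders, dataset_sizes, class_names = ImageDataset(args)

    # write one class name per line
    with open(args.class_file, 'w') as f:
        for name in class_names:
            f.write(name + '\n')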
    # use gpu or not
    use_gpu = torch.cuda.is_available()
    print("use_gpu:{}".format(use_gpu))

    # get model
    model = SimpleNet()

    (images, labels) = next(iter(dataloders['train']))
    trainsw.add_graph(model=model, input_to_model=images)

    if args.resume:
        if os.path.isfile(args.resume):
            print(("=> loading checkpoint '{}'".format(args.resume)))
            model.load_state_dict(torch.load(args.resume))
        else:
            print(("=> no checkpoint found at '{}'".format(args.resume)))

    if use_gpu:
        model = torch.nn.DataParallel(model)
        model.to(torch.device('cuda'))
    else:
        model.to(torch.device('cpu'))

    # define loss function: cross-entropy
    criterion = nn.CrossEntropyLoss()

    # SGD with momentum (observe that all parameters are being optimized)
    optimizer_ft = optim.SGD(model.parameters(), lr=args.lr, momentum=0.9, weight_decay=1e-4)

    # Decay LR by a factor of 0.98 every 1 epoch
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=1, gamma=0.98)

    model = train_model(args=args,
                        model=model,
                        criterion=criterion,
                        optimizer=optimizer_ft,
                        scheduler=exp_lr_scheduler,
                        num_epochs=args.num_epochs,
                        dataset_sizes=dataset_sizes,
                        use_gpu=use_gpu,
                        dataloders=dataloders)

    # note: on GPU the model is wrapped in DataParallel, so the saved keys carry a "module." prefix
    torch.save(model.state_dict(), os.path.join(args.save_path, 'scenejudgment.pth'))
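
The StepLR settings in the training code (step_size=1, gamma=0.98) multiply the learning rate by 0.98 after each epoch. The sketch below reproduces that arithmetic in plain Python so the decay can be inspected without running a training job. It assumes the scheduler is stepped once per epoch inside train_model, which the excerpt does not show; `step_lr` and the value of `base_lr` are hypothetical, since the post does not state args.lr.

```python
def step_lr(base_lr, epoch, step_size=1, gamma=0.98):
    """Learning rate after `epoch` completed epochs under step decay."""
    # The rate drops by a factor of gamma once every step_size epochs.
    return base_lr * gamma ** (epoch // step_size)


base_lr = 0.01  # hypothetical starting rate; the post does not give args.lr
for epoch in (0, 1, 10, 50):
    print(epoch, step_lr(base_lr, epoch))
```

With gamma this close to 1 the decay is gentle: it takes roughly 34 epochs for the rate to halve.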

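Code notes:

I downloaded this code from the internet. It is cleanly written and covers the full workflow, so I built everything else on top of it rather than reinventing the basics.

Download link:

https://download.csdn.net/download/dcrmg/11939795

File name after download:

pytorch-train-test-onnx.rar

Main components:

class SimpleNet(nn.Module): network definition
def train(args): training
def test(test_model_path, test_img_path, class_file): testing
def convert_model_to_ONNX(input_img_size, input_pth_model, output_ONNX): converts the .pth model to ONNX
def readImg(path): reads an image and applies data augmentation right after loading
def set_parser(): sets up the arguments, using a package that is very common in Python (presumably argparse)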

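Issues:

Issue 1:

The dataset for this project has problems. It is messy: the cat and dog images vary wildly in appearance, and many are mislabeled. As a result the accuracy plateaus, and 90% is hard to get past. Later I may try again with a different dataset.

Issue 2:

Error:

Unexpected key(s) in state_dict: "module.conv1.0.weight", "module.conv1.0.bias"

Cause:

When training on GPU, the model is wrapped in torch.nn.DataParallel, so every key in the saved state_dict gets a "module." prefix. At test time the images are processed one at a time without the GPU, so the model is not wrapped and its parameter names have no "module." prefix, which triggers the error above. Printing all of the model's parameter names makes the mismatch obvious: the keys in the checkpoint carry an extra "module.".

Solution:

https://blog.csdn.net/qq_32998593/article/details/89343507

I used method 1 from that post.

Note:

The error is raised by this line in the test code: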

    model.load_state_dict(state_dict)

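The source code does carry an explicit hint at this point:

    if torch.cuda.device_count() > 1:
        # With multiple GPUs, parallelize the model with DataParallel. This adds a "module." prefix to every key.
        model = nn.DataParallel(model)
    model.load_state_dict(checkpoint)

This snippet also appears among the solutions in the linked post, but it did not work for me. The likely reason is that I have only one GPU: torch.cuda.device_count() > 1 is then false, so the model is never wrapped in nn.DataParallel and its keys still lack the "module." prefix.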
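
One way around the mismatch, and the approach the conversion code in this post takes with k[7:], is to strip the "module." prefix from the checkpoint keys before loading. The sketch below is a slightly more defensive variant that only strips the prefix when it is actually present. `strip_module_prefix` is a hypothetical helper name, and the example uses placeholder strings instead of tensors so it runs without PyTorch installed; it is not necessarily the method from the linked post.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Keys without the prefix are copied through unchanged.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for the dict returned by torch.load(...): strings instead of tensors.
checkpoint = OrderedDict([
    ("module.conv1.0.weight", "w"),
    ("module.conv1.0.bias", "b"),
])
cleaned = strip_module_prefix(checkpoint)
print(list(cleaned.keys()))  # prints ['conv1.0.weight', 'conv1.0.bias']
```

In the test code this would be used as model.load_state_dict(strip_module_prefix(state_dict)), so the same checkpoint loads whether or not it was saved from a DataParallel-wrapped model.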

Model conversion

Project names:

catdog, pth2pt

Core code:

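# catdog: export the .pth weights to ONNX
from collections import OrderedDict

import torch

def convert_model_to_ONNX(input_img_size, input_pth_model, output_ONNX):
    dummy_input = torch.randn(2, 3, input_img_size, input_img_size)
    model = SimpleNet()  # defined elsewhere in the project

    # load on CPU so the conversion also works without a GPU
    state_dict = torch.load(input_pth_model, map_location='cpu')
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        # remove the `module.` prefix added by DataParallel, if present
        name = k[7:] if k.startswith('module.') else k
        new_state_dict[name] = v
    model.load_state_dict(new_state_dict)
    model.eval()  # inference mode for export

    input_names = ["input_image"]
    output_names = ["output_classification"]

    torch.onnx.export(model, dummy_input, output_ONNX, verbose=True, input_names=input_names,
                      output_names=output_names)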

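# pth2pt: convert the .pth weights to a TorchScript .pt file
from collections import OrderedDict

import torch

if __name__ == '__main__':
    model = Net()  # network class, defined elsewhere in the pth2pt project
    state_dict = torch.load(r"models\scenejudgment.pth", map_location='cpu')
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        # remove the `module.` prefix added by DataParallel, if present
        name = k[7:] if k.startswith('module.') else k
        new_state_dict[name] = v
    model.load_state_dict(new_state_dict)
    model.eval()
    # example input used for tracing; its shape must match what the network expects
    example = torch.rand(1, 3, 90, 90)
    traced_script_module = torch.jit.trace(model, example)
    traced_script_module.save(r"models\new.pt")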

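Code notes:

Training in PyTorch produces a .pth file. To run the model from C++, it first has to be converted to another format. The .pt file here is a TorchScript model, the format loaded by LibTorch, PyTorch's own C++ library; an ONNX file can be loaded by OpenCV.

The pytorch-train-test-onnx project ships with code that converts the .pth file to ONNX, and ONNX is supported by many inference engines.

The pth2pt project is my own. I tried several blog posts on the topic and could not get any of them to run; after consulting some other material I got it working.

Link:

https://blog.csdn.net/XDH19910113/article/details/115953274

Issues:

No real problems here; this step is fairly simple. Any trained model has to go through a conversion like this before it can be used from C++.

One complaint, though. There are several deep learning frameworks (PyTorch, TensorFlow, Caffe, and so on), and each produces its own file format. There are also many inference engines (LibTorch, TensorRT, OpenCV, MNN, NCNN, and so on), and each expects its own input format. So the number of conversions needed grows as the product of the number of frameworks and the number of inference engines.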
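
To put numbers on that complaint, the sketch below counts converters for the frameworks and engines named above. The second count is an idealized assumption, not something the post claims: it supposes every framework can export to, and every engine can import from, one shared interchange format (ONNX plays this role for many tools, but not for all of the engines listed).

```python
frameworks = ["PyTorch", "TensorFlow", "Caffe"]
engines = ["LibTorch", "TensorRT", "OpenCV", "MNN", "NCNN"]

# One dedicated converter for every (framework, engine) pair.
direct_converters = len(frameworks) * len(engines)

# Idealized assumption: one shared intermediate format, so each framework
# needs a single exporter and each engine a single importer.
hub_converters = len(frameworks) + len(engines)

print(direct_converters, hub_converters)  # prints 15 8
```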

Model deployment

Project names:

catdog, libtorchtry1

Core code:

// libtorchtry1: classify one image with LibTorch
#include <opencv2/opencv.hpp>
#include <torch/script.h>

int Classfier(cv::Mat &image){
    // from_blob does not copy: image must be a continuous 8-bit, 3-channel Mat already sized for the model
    torch::Tensor img_tensor = torch::from_blob(image.data, {1, image.rows, image.cols, 3}, torch::kByte);
    img_tensor = img_tensor.permute({0, 3, 1, 2});  // NHWC -> NCHW
    img_tensor = img_tensor.toType(torch::kFloat);
    img_tensor = img_tensor.div(255);  // scale to [0, 1]
    // the model is reloaded on every call; load it once outside if this function is called repeatedly
    torch::jit::script::Module module = torch::jit::load("../models/new.pt");
    torch::Tensor output = module.forward({img_tensor}).toTensor();
    // max over dim 1 returns (values, indices); the index is the predicted class
    auto max_result = output.max(1, true);
    auto max_index = std::get<1>(max_result).item<int64_t>();
    return static_cast<int>(max_index);
}

// Constructor: reads the class names (one per line) and loads the ONNX model
PthONNX::PthONNX(const std::string &model_path, const std::string &classes_file_path,
                 cv::Size input_size) : input_size_(input_size) {
    std::ifstream ifs(classes_file_path.c_str());
    assert(ifs.is_open());
    std::string line;
    while (getline(ifs, line)) {
        classes_.push_back(line);
    }

    net_classify_ = cv::dnn::readNetFromONNX(model_path);
    net_classify_.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net_classify_.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
}

// ONNX inference entry point
void PthONNX::Classify(const cv::Mat &input_image, std::string &classification_results) {
    assert(input_image.data);
    cv::Mat image = input_image.clone();
    cv::resize(image, image, cv::Size(90, 90));
    cv::cvtColor(image, image, cv::COLOR_BGR2GRAY);
    ClassifyImplement(image,classification_results);
}

// ONNX inference implementation
void PthONNX::ClassifyImplement(const cv::Mat &image,std::string &classification_results) {
    classification_results.clear();
    //*********** pre-processing ***********
    cv::Scalar mean_value(0, 0, 0);
    cv::Mat input_blob = cv::dnn::blobFromImage(image, 1, input_size_, mean_value, false, false, CV_32F);
    //*********** pre-processing ***********

    net_classify_.setInput(input_blob);
    const std::vector<cv::String> &out_names = net_classify_.getUnconnectedOutLayersNames();
    cv::Mat out_tensor = net_classify_.forward(out_names[0]);

    //*********** post-processing ***********
    double minVal;
    double maxVal;
    cv::Point minIdx;
    cv::Point maxIdx;	// minimum index, maximum index
    cv::minMaxLoc(out_tensor, &minVal, &maxVal, &minIdx, &maxIdx);
    int index_class = maxIdx.x;
    // guard the lookup against indices outside the class list
    classification_results = (index_class >= 0 && index_class < static_cast<int>(classes_.size())) ? classes_[index_class] : "None";
    //*********** post-processing ***********
}

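Code notes:

The pytorch-train-test-onnx project ships with TestOnnx.cpp, a C++ program that uses OpenCV as the inference engine. One thing to watch when using TestOnnx.cpp: the input must match what the model was trained on. If the model was trained on color images, TestOnnx.cpp must be given color images; if it was trained on grayscale images, it must be given grayscale images.

The libtorchtry1 project is based on the reference link below, with some changes to fit my model.

Reference link:

https://zhuanlan.zhihu.com/p/72750321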
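
The post-processing step in ClassifyImplement (take the index of the largest score, then look up the class name) can be sketched in Python as follows. This is an illustration of the logic only, not code from either project: `pick_class` is a hypothetical helper, and the class names and scores are made-up placeholders for the contents of the class file and the network output.

```python
def pick_class(scores, class_names, fallback="None"):
    """Return the name of the highest-scoring class, or fallback if no name exists for it."""
    if not scores:
        return fallback
    # argmax over the score list
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    # only index into class_names when the index is in range
    if best_index < len(class_names):
        return class_names[best_index]
    return fallback


class_names = ["cat", "dog"]  # hypothetical contents of the class file
scores = [0.2, 1.7]           # hypothetical network output for one image
print(pick_class(scores, class_names))  # prints dog
```

Checking the index against the length of the class list, rather than against a fixed number, keeps the lookup valid if the model is later retrained with a different number of classes.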

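Next steps:

Switch to a different dataset and retrain so that the classification accuracy reaches a higher level
Add an example of inference with TensorRT
Compare inference time
Stay tuned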
