Efficiently Managing Deep Learning Experiments

Latest recommended article published on 2024-03-07 21:10:56

*pprp*

Category: research tips, tools, and knowledge summaries. Tags: artificial intelligence, python, machine learning

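This is an original article. Do not repost it without permission; unauthorized reposting will be pursued. To repost, please contact wx:topeijie.

Article link: https://blog.csdn.net/DD_PP_JJ/article/details/117768619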


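[GiantPandaCV introduction] This semester I took part in a competition that involved a fairly large amount of code, and many problems surfaced along the way. Because my experiment records were poor, the results became confusing, could not be analyzed effectively, and could not be traced back. Now that the competition is over, I plan to refactor the code and, along the way, study how some large projects are managed. This article summarizes how to manage deep learning experiments efficiently and in a standardized way. The summary reflects personal preference and may not suit every project; it is for reference only.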

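1. Current management approach

I had many ideas to try, yet I named the folders that store the weights with a timestamp format, as in the figure below. What each run was actually trying was recorded only in a paper notebook and in Youdao Cloud Notes.

Figure: saved weight files

Screenshot of the notes:

Figure: partial screenshot of the notes

Overall, this approach is far from ideal. A single experiment runs for a long time and the whole effort spans a long period, so the parameters tuned earlier, the changes to the core code, and the ideas to be verified all become blurry. Sometimes a batch of experiments finishes and I have already forgotten what it was meant to verify.

Managing experiments this way is inefficient. I had already come across many experiment management methods and modular library designs, but they sat unused in my bookmarks. With the competition over, I took the chance to refactor the code, improve how I manage experiments, and sum up the lessons learned. I also drew on advice from 蒋神, 雪神, and other experienced members of the discussion group, and arrived at the methods below.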

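2. A large project example

To start, I recommend a template project shared by L1aoXingyu on GitHub:

https://github.com/L1aoXingyu/Deep-Learning-Project-Template

If you maintain a deep learning project over the long term, code organization matters. Designing a structure that is simple yet extensible is essential, and this calls for object-oriented (OOP) design from software engineering.

Figure: L1aoXingyu's template

A brief overview:

Experiment configuration management (the configuration is the set of parameters used in a deep learning experiment)
- Use yacs to manage the configuration.
- Configuration is usually split into a default configuration (default) and overriding settings (argparse).
Model management
- Use the factory pattern: return the matching model based on the arguments passed in.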

├──  config
│    └── defaults.py  - here's the default config file.
│
│
├──  configs  
│    └── train_mnist_softmax.yml  - here's the specific config file for specific model or dataset.
│ 
│
├──  data  
│    ├── datasets  - here's the datasets folder that is responsible for all data handling.
│    ├── transforms  - here's the data preprocess folder that is responsible for all data augmentation.
│    ├── build.py  		   - here's the file to make dataloader.
│    └── collate_batch.py   - here's the file that is responsible for merging a list of samples to form a mini-batch.
│
│
├──  engine
│   ├── trainer.py     - this file contains the train loops.
│   └── inference.py   - this file contains the inference process.
│
│
├── layers              - this folder contains any custom layers of your project.
│   └── conv_layer.py
│
│
├── modeling            - this folder contains any model of your project.
│   └── example_model.py
│
│
├── solver             - this folder contains optimizer of your project.
│   ├── build.py
│   └── lr_scheduler.py
│   
│ 
├──  tools                - here's the train/test model of your project.
│    └── train_net.py  - here's an example of train model that is responsible for the whole pipeline.
│ 
│ 
├── utils
│    ├── logger.py
│    └── any_other_utils_you_need
│ 
│ 
└── tests					- this folder contains unit tests of your project.
     └── test_data_sampler.py
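
The factory pattern mentioned in the overview can be illustrated with a minimal sketch. This is not the template's actual code: the registry, the `register_model` decorator, the `build_model` function, and the two placeholder model classes are all hypothetical names chosen for illustration, and a real project would register classes from its deep learning framework.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a model name to its constructor.
MODEL_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_model(name: str):
    """Decorator that records a model class under the given name."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise KeyError(f"model '{name}' is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("softmax")
class SoftmaxClassifier:
    # Placeholder model; a real project would subclass its framework's module type.
    def __init__(self, num_classes: int = 10):
        self.num_classes = num_classes


@register_model("mlp")
class MLP:
    # Second placeholder model with one extra hyperparameter.
    def __init__(self, num_classes: int = 10, hidden: int = 128):
        self.num_classes = num_classes
        self.hidden = hidden


def build_model(name: str, **kwargs):
    """Factory: look up the name taken from the config and build the matching model."""
    if name not in MODEL_REGISTRY:
        raise ValueError(f"unknown model '{name}', choose from {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**kwargs)


model = build_model("mlp", num_classes=5, hidden=64)
print(type(model).__name__, model.num_classes, model.hidden)  # MLP 5 64
```

With this layout the training script only needs the model name from the configuration, so adding a new model means adding one decorated class rather than editing the training loop.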

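I also recommend a very well-packaged library, deep-person-reid: https://github.com/KaiyangZhou/deep-person-reid. Some of the code in this summary is adapted from the libraries above.

3. Getting familiar with the tools

Unlike the template recommended above, I think the setup can be simplified. The main Python tools are:

argparse
yaml
logging

The first two manage configuration; the last one manages logs.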
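
As a rough illustration of the logging part, the sketch below writes every message to both the console and a log file inside the experiment folder. The helper name `setup_logger`, the file name `train.log`, and the message format are assumptions made for this example, not something prescribed by the libraries mentioned above.

```python
import logging
import os
import tempfile


def setup_logger(save_dir: str, name: str = "exp") -> logging.Logger:
    """Return a logger that writes to the console and to <save_dir>/train.log."""
    os.makedirs(save_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if logger.handlers:       # already configured, e.g. on a second call
        return logger

    fmt = logging.Formatter("%(asctime)s %(levelname)s: %(message)s")
    file_handler = logging.FileHandler(os.path.join(save_dir, "train.log"))
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


# Usage: a temporary directory stands in for a real experiment folder.
exp_dir = os.path.join(tempfile.mkdtemp(), "exp1_sandwich")
logger = setup_logger(exp_dir)
logger.info("batch_size=%d finetune=%s", 2048, False)
```

Keeping the log file next to the saved weights means each experiment folder records the settings it was run with, which addresses the problem of forgetting what a run was meant to verify.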

3.1 argparse

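argparse is a command-line parsing tool. Using it takes four steps:

import argparse                                              # step 1: import the module
parser = argparse.ArgumentParser()                           # step 2: create the parser
parser.add_argument('--batch_size', type=int, default=2048)  # step 3: register an argument
args = parser.parse_args()                                   # step 4: parse the command line

Step 2 creates a parser object, and step 3 adds arguments to it.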

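parser.add_argument('--batch_size', type=int, default=2048,
                    help='batch size')  # 8192
parser.add_argument('--save_dir', type=str,
                    help="save exp folder name", default="exp1_sandwich")

--batch_size serves as the key of the argument (accessed as args.batch_size). Its value comes from the command line or, if the flag is absent, from default. type is the function used to convert the command-line string, most commonly int or str.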

parser.add_argument('--finetune', action='store_true',
                    help='finetune model with distill')

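action specifies how the argument is handled. The default is "store", which stores the value given on the command line. With "store_true", the argument is True when the flag appears and False otherwise.

Step 4 parses the command line and returns an object whose attributes hold the values. For example, args.finetune gives the value of the finetune argument.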
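
Putting the four steps together, a minimal sketch might look like the following. The arguments are the ones shown above; the `build_parser` wrapper is an assumption added so the parser can be reused, and an explicit list is passed to `parse_args()` in place of a real command line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Collect the arguments from the examples above in one parser."""
    parser = argparse.ArgumentParser(description="training options")
    parser.add_argument('--batch_size', type=int, default=2048, help='batch size')
    parser.add_argument('--save_dir', type=str, default='exp1_sandwich',
                        help='experiment folder name')
    parser.add_argument('--finetune', action='store_true',
                        help='finetune model with distill')
    return parser


parser = build_parser()

# Passing a list to parse_args() replaces sys.argv, which is handy for testing.
defaults = parser.parse_args([])
custom = parser.parse_args(['--batch_size', '64', '--finetune'])

print(defaults.batch_size, defaults.finetune)  # 2048 False
print(custom.batch_size, custom.finetune)      # 64 True
print(vars(custom))  # the Namespace as a plain dict, convenient for logging
```

Converting the result with `vars()` gives a dictionary that can be written into the experiment folder, so the exact settings of each run stay on record.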

3.2 yaml

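YAML is a human-readable data serialization language, commonly used for configuration files.

Supported types:

Scalars (strings, integers, floats)
Lists
Associative arrays (dictionaries)

Syntax features:

Case sensitive
Indentation expresses hierarchy
List items are marked with "-", and dictionary entries use ":"
Comments start with "#"

Install with:

pip install pyyaml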
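
Once PyYAML is installed, the types and syntax rules listed above can be checked with a short sketch. The configuration keys below are hypothetical values chosen for illustration, and the text is written inline so the example does not depend on a file on disk.

```python
import yaml  # provided by the pyyaml package

# Hypothetical experiment config, written inline so the sketch is self-contained.
CONFIG_TEXT = """
# comments start with '#'
save_dir: exp1_sandwich   # string scalar
batch_size: 2048          # integer scalar
lr: 0.1                   # float scalar
milestones:               # list, one item per '-'
  - 30
  - 60
optimizer:                # nested dictionary, indentation sets the level
  name: sgd
  momentum: 0.9
"""

config = yaml.safe_load(CONFIG_TEXT)

print(type(config["batch_size"]).__name__)  # int
print(config["milestones"])                 # [30, 60]
print(config["optimizer"]["name"])          # sgd
```

`yaml.safe_load` returns ordinary Python dictionaries, lists, and scalars, so the loaded values can be used directly or merged with command-line arguments.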

For example:

name: tosan
age: 22
skill:
  name1: coding
  time: 2years
job
