PyTorch Deep Learning Framework: Methods and Examples for Training Different Networks

Contents

Classification-bench
  Run
    Single GPU
    Single node, multiple GPUs
    Distributed multi-GPU
  References
Classification-Acc
  Run examples
    fp32
    fp16
  References
Objection-Faster-rcnn
  Run
    Single GPU
    Single node, multiple GPUs
    Multiple nodes, multiple GPUs
  References
Objection-MaskRCNN
  Run commands
    Single GPU
    Multi-GPU
  References
Objection-SSD
  Run
    Install dependencies
    Download the dataset
    Run the training scripts
      Single node, single GPU (FP32)
      Single node, multiple GPUs (FP32)
      Multiple nodes, multiple GPUs (FP32)
      Single node, single GPU (FP16)
      Single node, multiple GPUs (FP16)
      Multiple nodes, multiple GPUs (FP16)
  Dataset
    Publication/Attribution
    Training and test data separation
  Model
  Evaluation metrics
  References
Objection-YOLOv3
  Test procedure
    Data preprocessing
    Download the pretrained model
  Run examples
    Training
      Single GPU
      Multi-GPU
    Inference
    Detection
  References
NAS-darts
  Test procedure
    Prepare the data
    Pretrained models
  Run commands
    Architecture search
    Format conversion
    Architecture evaluation
  References
NLP-bert
  Run examples
    pre-train phase 1
      Single GPU
      Multi-GPU
    pre-train phase 2
      Single GPU
      Multi-GPU
    fine-tune training
      Single GPU
      Multi-GPU
  References
NLP-gnmt
  Run
    Install dependencies
    Download the dataset
    Preprocessing
    Single node, single GPU
    Single node, multiple GPUs
    Multiple nodes, multiple GPUs
  Model
    Publication/Attribution
    Structure
    Loss function
    Optimizer
    Learning rate schedule
  Evaluation
    Quality metric
    Quality target
    Evaluation frequency
    Evaluation thoroughness
Recommendation
  Test procedure
    Data processing
    Run commands
  References


Classification-bench

This test case is used for performance testing of PyTorch classification models.

  • The script supports PyTorch's nccl and gloo distributed communication backends.

Run

Single GPU
python3 `pwd`/main_bench.py --batch-size=64 --a=resnet50 -j 24 --epochs=1 --synthetic /path/to/any/existing/folder
Single node, multiple GPUs
mpirun -np 4  --bind-to none `pwd`/single_process.sh localhost inception_v3 64
Distributed multi-GPU
mpirun -np $np --hostfile hostfile --bind-to none `pwd`/single_process.sh $dist_url resnet50 64

Example hostfile format:

node1 slots=4  
node2 slots=4

References

examples/imagenet at main · pytorch/examples · GitHub

Classification-Acc

This test case is used for ResNet50 accuracy validation; the single-GPU run commands are as follows.

Run examples

fp32
python3 main_acc.py --batch-size=64 --arch=resnet50 -j 6 --epochs=90 --save-path=/path/to/{save_model_dir} /path/to/{ImageNet_pytorch_data_dir}/
fp16
python3 main_acc.py --batch-size=64 --arch=resnet50 -j 6 --epochs=90 --amp --opt-level O1 --loss-scale=dynamic --save-path=/path/to/{save_model_dir} /path/to/{ImageNet_pytorch_data_dir}/

References

examples/imagenet at main · pytorch/examples · GitHub

Objection-Faster-rcnn

This test case is used for testing the PyTorch object detection model Faster R-CNN.

Run

  • The get_dataset function in train.py must be adapted to the actual dataset: the location of the annotation json files, the number of classes, and so on; see the sketch below.
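A hedged sketch of what that adaptation typically looks like; the COCO-style layout, the get_coco builder (the reference detection code's own helper), and the class count are placeholders to adjust to your own data:

from coco_utils import get_coco  # helper shipped alongside train.py in the reference code

def get_dataset(name, image_set, transform, data_path):
    # Map the dataset name to (root path, dataset builder, number of classes).
    # Point data_path at the folder holding your images and annotation json files,
    # and set the class count to your own number of categories (COCO uses 91).
    paths = {
        "coco": (data_path, get_coco, 91),
    }
    p, ds_fn, num_classes = paths[name]
    ds = ds_fn(p, image_set=image_set, transforms=transform)
    return ds, num_classes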
Single GPU
python3 train.py  --batch-size=2 -j 8 --epochs=26 --data-path=/path/to/datasets/folder --output-dir=/path/to/result/save/folder
Single node, multiple GPUs
mpirun -np 4 --hostfile hostfile --bind-to none `pwd`/single_process.sh localhost
Multiple nodes, multiple GPUs
mpirun -np $np --hostfile hostfile --bind-to none `pwd`/single_process.sh ${master_ip}

References

vision/references/detection at main · pytorch/vision · GitHub

Objection-MaskRCNN

This test case is used for testing the PyTorch object detection model Mask R-CNN.

Run commands

Single GPU
python3 train.py --dataset coco --model maskrcnn_resnet50_fpn --epochs 26 \
     --lr-steps 16 22 --aspect-ratio-group-factor 3 \
     --data-path /path/to/{COCO2017_data_dir}  

If the download of "https://download.pytorch.org/models/resnet50-19c8e357.pth" to .cache/torch/checkpoints/resnet50-19c8e357.pth fails, download resnet50-19c8e357.pth in advance and copy it into .cache/torch/checkpoints/.
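For example (assuming the default torch cache location under the home directory):

wget https://download.pytorch.org/models/resnet50-19c8e357.pth
mkdir -p ~/.cache/torch/checkpoints
cp resnet50-19c8e357.pth ~/.cache/torch/checkpoints/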

Multi-GPU
python3 -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --dataset coco --model maskrcnn_resnet50_fpn --epochs 26 --lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.005 --data-path /path/to/{COCO2017_data_dir} > train_2gpu_lr0.005.log 2>&1 &

Note: for multi-GPU runs, the learning rate scales with the number of GPUs as 0.02/8*$NGPU, e.g. lr_4gpu=0.01, lr_2gpu=0.005, lr_1gpu=0.0025.
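The value can be computed directly from the GPU count, for example (NGPU here is a placeholder for however many GPUs you launch with):

NGPU=2
LR=$(python3 -c "print(0.02 / 8 * ${NGPU})")
echo "lr for ${NGPU} GPUs: ${LR}"   # 2 GPUs -> 0.005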

References

vision/references/detection at main · pytorch/vision · GitHub

Objection-SSD

This script is a functional test case for the object detection model SSD_ResNet34, based on the MLPerf reference implementation. When the mAP reaches 0.23 the model is considered converged and the job finishes successfully.

Run

Install dependencies
Cython==0.28.4
mlperf-compliance==0.0.10
cycler==0.10.0
kiwisolver==1.0.1
matplotlib==2.2.2
numpy==1.14.5
Pillow==5.2.0
pyparsing==2.2.0
python-dateutil==2.7.3
pytz==2018.5
six==1.11.0
torchvision(if installed, ignore it)
apex(if installed, ignore it)
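The pinned versions above can be installed in one command, for example (skip torchvision and apex if they are already installed):

pip3 install Cython==0.28.4 mlperf-compliance==0.0.10 cycler==0.10.0 kiwisolver==1.0.1 matplotlib==2.2.2 numpy==1.14.5 Pillow==5.2.0 pyparsing==2.2.0 python-dateutil==2.7.3 pytz==2018.5 six==1.11.0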
Download the dataset
bash download_dataset.sh
Run the training scripts
  • config_singlenode.sh sets up the single-node environment and system hyperparameters; modify it as needed.
  • config_multinode.sh sets up the multi-node environment and system hyperparameters; modify it as needed.
Single node, single GPU (FP32)
python3 train_fp32.py \
                  --epochs "${NUMEPOCHS}" \
                  --warmup-factor 0 \
                  --lr "${LR}" \
                  --no-save \
                  --threshold=0.23 \
                  --data ${DATASET_DIR} \
                  --batch-size ${BATCH_SIZE} \
                  --warmup ${WARMUP}
Single node, multiple GPUs (FP32)
python3 -m bind_launch --nsockets_per_node ${NSOCKET} \
                  --ncores_per_socket ${SOCKETCORES} \
                  --nproc_per_node ${NTASKS_PER_NODE} \
                  --no_hyperthreads \
                  --no_membind \
                  train_fp32.py \
                  --epochs "${NUMEPOCHS}" \
                  --warmup-factor 0 \
                  --lr "${LR}" \
                  --no-save \
                  --threshold=0.23 \
                  --data ${DATASET_DIR} \
                  --batch-size ${BATCH_SIZE} \
                  --warmup ${WARMUP}
  • See the job submission script run_fp32_single.sh for reference.
Multiple nodes, multiple GPUs (FP32)
sh run_fp32_multi.sh
  • See the run_fp32_multi.sh script; the hostfile format is as follows:
  node1 slots=4  
  node2 slots=4
Single node, single GPU (FP16)
python3  train_fp16.py \
                  --epochs "${NUMEPOCHS}" \
                  --warmup-factor 0 \
                  --lr "${LR}" \
                  --no-save \
                  --threshold=0.23 \
                  --data ${DATASET_DIR} \
                  --opt-level O3 --loss-scale="dynamic" --keep-batchnorm-fp32 True \
                  --batch-size 180 \
                  --warmup ${WARMUP}
Single node, multiple GPUs (FP16)
python3 -m bind_launch --nsockets_per_node ${NSOCKET} \
                  --ncores_per_socket ${SOCKETCORES} \
                  --nproc_per_node ${NTASKS_PER_NODE} \
                  --no_hyperthreads \
                  --no_membind \
                  train_fp16.py \
                  --epochs "${NUMEPOCHS}" \
                  --warmup-factor 0 \
                  --lr "${LR}" \
                  --no-save \
                  --threshold=0.23 \
                  --data ${DATASET_DIR} \
                  --opt-level O3 --loss-scale="dynamic" --keep-batchnorm-fp32 True \
                  --batch-size 180 \
                  --warmup ${WARMUP}
  • See the job submission script run_fp16_single.sh for reference.
Multiple nodes, multiple GPUs (FP16)
sh run_fp16_multi.sh
  • Similarly, the hostfile setup follows the format shown above.

Dataset

Publication/Attribution

Microsoft COCO: Common Objects in Context. 2017.

Training and test data separation

Train on 2017 COCO train data set, compute mAP on 2017 COCO val data set.

Model

Publication/Attribution

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. In the Proceedings of the European Conference on Computer Vision (ECCV), 2016.

Backbone is ResNet34 pretrained on ILSVRC 2012 (from torchvision). Modifications to the backbone network: remove the conv_5x residual blocks, change the first 3x3 convolution of the conv_4x block from stride 2 to stride 1 (this increases the resolution of the feature map to which the detector heads are attached), and attach all 6 detector heads to the output of the last conv_4x residual block. Thus detections are attached to 38x38, 19x19, 10x10, 5x5, 3x3, and 1x1 feature maps.

Evaluation metrics

Quality metric

Metric is COCO box mAP (averaged over IoU of 0.5:0.95), computed over 2017 COCO val data.

Quality target

mAP of 0.23

Evaluation frequency

Evaluation thoroughness

All the images in COCO 2017 val data set.

References

training/single_stage_detector/ssd at master · mlcommons/training · GitHub

Objection-YOLOv3

This test case measures the training performance, inference performance, and detection accuracy of the YOLOv3 object detection model under the PyTorch framework on the ROCm platform. The test procedure is as follows.

Test procedure

Data preprocessing

Before running this test case, the COCO data must be converted into the format expected by the YOLOv3 model, i.e. the annotation json files in the dataset are converted into labels. The procedure is as follows:
1. Download the coco-to-yolo tool
git clone Bitbucket
cd coco-to-yolo
2. Download cocotoyolo.jar
wget http://commecica.com/wp-content/uploads/2018/07/cocotoyolo.jar
3. Convert the format
(1) train json to label
java -jar cocotoyolo.jar "/path/to/{COCO2017_data_dir}/annotations/instances_train2017.json" "/path/to/{COCO2017_data_dir}/images/train2017" "all" "coco/yolo/"
(2) val json to label
java -jar cocotoyolo.jar "/path/to/{COCO2017_data_dir}/annotations/instances_val2017.json" "/path/to/{COCO2017_data_dir}/images/val2017" "all" "coco/yolo/"
4. Step 3 generates coco/yolo/*.txt files. The list.txt file must be renamed to train2017.txt and val2017.txt respectively; the remaining txt files are the labels for the corresponding images in images/train2017 and images/val2017. See the renaming example below.
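A hedged example of the renaming (it assumes the generated list.txt is renamed right after each cocotoyolo.jar run, since both runs write into coco/yolo/):

mv coco/yolo/list.txt coco/yolo/train2017.txt   # after converting instances_train2017.json
mv coco/yolo/list.txt coco/yolo/val2017.txt     # after converting instances_val2017.json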

Download the pretrained model

Download link
https://drive.google.com/drive/folders/1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0
After downloading, place it in the weights directory.

Run examples

Training
Single GPU
python3 train.py --cfg cfg/yolov3.cfg --weights weights/yolov3.pt --data data/coco2017.data --batch 32 --accum 2 --device 0  

Before running, confirm the train2017.txt and val2017.txt data paths referenced in coco2017.data; a hedged example of the file is shown below.
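For reference, coco2017.data follows the usual ultralytics *.data layout; the paths here are placeholders for your own locations:

classes=80
train=/path/to/coco/yolo/train2017.txt
valid=/path/to/coco/yolo/val2017.txt
names=data/coco.names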

Multi-GPU
python3 train.py --cfg cfg/yolov3.cfg --weights weights/yolov3.weights --data data/coco2017.data --batch 64 --accum 1 --device 0,1
Inference
python3 test.py --cfg cfg/yolov3.cfg --weights weights/yolov3.pt --task benchmark --augment --device 1  

After the run completes, benchmark.txt and benchmark_yolov3.log are generated. benchmark.txt records the mAP@0.5:0.95 and mAP@0.5 values for 5 input image sizes and 2 IoU thresholds, and benchmark_yolov3.log records the inference/NMS/total time for each image.

Detection

detect.py is the practical application of the YOLOv3 model: it takes a specified image, detects the objects in it, and lets you inspect the accuracy. Run it with: python3 detect.py --cfg cfg/yolov3.cfg --weights weights/yolov3.pt
After the run completes, images annotated with detection boxes are generated.

References

GitHub - ultralytics/yolov3: YOLOv3 in PyTorch > ONNX > CoreML > TFLite

NAS-darts

This test case covers the DARTS algorithm from the neural architecture search (NAS) domain under the PyTorch framework on the ROCm platform. It consists of two parts, architecture search and architecture evaluation; the test procedure is as follows.

Test procedure

Prepare the data

CIFAR-10 is used as the example here; the PTB and ImageNet datasets can also be downloaded.

Pretrained models

Training can optionally start from existing pretrained models; download links:
CIFAR-10
PTB
ImageNet

Run commands
Architecture search
python3 cnn/train_search.py --batch_size 100  

After the run finishes, a ./search*/log.txt file is generated in the current directory; the architecture format is as follows:

genotype = Genotype(normal=[('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_5x5', 1), ('dil_conv_5x5', 3), ('sep_conv_3x3', 1), ('sep_conv_3x3', 3)], normal_concat=range(2, 6), reduce=[('skip_connect', 0), ('skip_connect', 1), ('max_pool_3x3', 0), ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2)], reduce_concat=range(2, 6))
Format conversion

The genotype obtained above must be converted to protobuf format with the nasnet/protoc tool, as follows:

cd nasnet/protoc  

Modify the main() function in util.py and fill the architecture description into LegacyGenotype():

def main():
    PDARTS = LegacyGenotype(normal=[('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_5x5', 1), ('dil_conv_5x5', 3), ('sep_conv_3x3', 1), ('sep_conv_3x3', 3)], normal_concat=range(2, 6), reduce=[('skip_connect', 0), ('skip_connect', 1), ('max_pool_3x3', 0), ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2)], reduce_concat=range(3, 6))
    new_PDARTS = convert_legacy_format_to_protobuf(PDARTS)
    save_genotype_to_file('pdarts.txt', new_PDARTS)

Run the following command to generate the pdarts.txt file:

python3 main.py
Architecture evaluation

Run example

cd evaluation

./evaluate.sh {node_name} 1 0 /path/to/pdarts.txt /path/to/{save_dir}

References

GitHub - quark0/darts: Differentiable architecture search for convolutional and recurrent networks

NLP-bert

This test case runs the BERT network with the PyTorch framework.

  • BERT training comes in two flavors, pre-train and fine-tune; pre-training is split into two phases.

  • BERT inference accuracy can be validated on different datasets.

  • See [README.md] for details on data generation and model conversion.

Run examples

Code examples are currently provided for the two pre-training phases on the English Wikipedia dataset and for fine-tune training on the SQuAD dataset.

pre-train phase 1
Parameters used in the commands below:
  • PATH_PHRASE1: path to the phase-1 training dataset, e.g. /workspace/lower_case_1_seq_len_128_max_pred_20_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5_shard_1472_test_split_10
  • OUTPUT_DIR: output path, e.g. /workspace/results
  • PATH_CONFIG: config path, e.g. /workspace/bert_large_uncased
  • PATH_PHRASE2: path to the phase-2 training dataset, e.g. /workspace/lower_case_1_seq_len_512_max_pred_80_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5_shard_1472_test_split_10
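For example, these can be exported before launching, using the example values above (adjust to your own paths):

export PATH_PHRASE1=/workspace/lower_case_1_seq_len_128_max_pred_20_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5_shard_1472_test_split_10
export PATH_PHRASE2=/workspace/lower_case_1_seq_len_512_max_pred_80_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5_shard_1472_test_split_10
export OUTPUT_DIR=/workspace/results
export PATH_CONFIG=/workspace/bert_large_uncased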
Single GPU
export HIP_VISIBLE_DEVICES=0
python3 run_pretraining_v1.py  \
    --input_dir=${PATH_PHRASE1}    \
    --output_dir=${OUTPUT_DIR}/checkpoints1 \
    --config_file=${PATH_CONFIG}/bert_config.json \
    --bert_model=bert-large-uncased \
    --train_batch_size=16 \
    --max_seq_length=128 \
    --max_predictions_per_seq=20 \
    --max_steps=100000 \
    --warmup_proportion=0.0 \
    --num_steps_per_checkpoint=20000 \
    --learning_rate=4.0e-4 \
    --seed=12439 \
    --gradient_accumulation_steps=1 \
    --allreduce_post_accumulation \
    --do_train \
    --json-summary dllogger.json
Multi-GPU
  • Method 1
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 run_pretraining_v1.py  \
    --input_dir=${PATH_PHRASE1}    \
    --output_dir=${OUTPUT_DIR}/checkpoints \
    --config_file=${PATH_CONFIG}/bert_config.json \
    --bert_model=bert-large-uncased \
    --train_batch_size=16 \
    --max_seq_length=128 \
    --max_predictions_per_seq=20 \
    --max_steps=100000 \
    --warmup_proportion=0.0 \
    --num_steps_per_checkpoint=20000 \
    --learning_rate=4.0e-4 \
    --seed=12439 \
    --gradient_accumulation_steps=1 \
    --allreduce_post_accumulation \
    --do_train \
    --json-summary dllogger.json
  • Method 2

hostfile:

node1 slots=4
node2 slots=4
#scripts/run_pretrain.sh defaults to four GPUs per node
cd scripts; bash run_pretrain.sh
pre-train phase 2
Single GPU
export HIP_VISIBLE_DEVICES=0
python3 run_pretraining_v1.py \
   --input_dir=${PATH_PHRASE2} \
   --output_dir=${OUTPUT_DIR}/checkpoints2 \
   --config_file=${PATH_CONFIG}/bert_config.json \
   --bert_model=bert-large-uncased \
   --train_batch_size=4 \
   --max_seq_length=512 \
   --max_predictions_per_seq=80 \
   --max_steps=400000 \
   --warmup_proportion=0.128 \
   --num_steps_per_checkpoint=200000 \
   --learning_rate=4e-3 \
   --seed=12439 \
   --gradient_accumulation_steps=1 \
   --allreduce_post_accumulation \
   --do_train \
   --phase2 \
   --phase1_end_step=0 \
   --json-summary dllogger.json
Multi-GPU
  • Method 1
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 run_pretraining_v1.py \
   --input_dir=${PATH_PHRASE2} \
   --output_dir=${OUTPUT_DIR}/checkpoints2 \
   --config_file=${PATH_CONFIG}/bert_config.json \
   --bert_model=bert-large-uncased \
   --train_batch_size=4 \
   --max_seq_length=512 \
   --max_predictions_per_seq=80 \
   --max_steps=400000 \
   --warmup_proportion=0.128 \
   --num_steps_per_checkpoint=200000 \
   --learning_rate=4e-3 \
   --seed=12439 \
   --gradient_accumulation_steps=1 \
   --allreduce_post_accumulation \
   --do_train \
   --phase2 \
   --phase1_end_step=0 \
   --json-summary dllogger.json
  • Method 2

hostfile:

node1 slots=4
node2 slots=4
#scripts/run_pretrain2.sh defaults to four GPUs per node
cd scripts; bash run_pretrain2.sh
fine-tune training
Single GPU
python3 run_squad_v1.py \
  --train_file squad/v1.1/train-v1.1.json \
  --init_checkpoint model.ckpt-28252.pt \
  --vocab_file vocab.txt \
  --output_dir SQuAD \
  --config_file bert_config.json \
  --bert_model=bert-large-uncased \
  --do_train \
  --train_batch_size 1 \
  --gpus_per_node 1
Multi-GPU

hostfile:

node1 slots=4
node2 slots=4
#scripts/run_squad_1.sh defaults to four GPUs per node
bash run_squad_1.sh

References

training_results_v0.7/NVIDIA/benchmarks/bert/implementations/pytorch at master · mlperf/training_results_v0.7 · GitHub DeepLearningExamples/PyTorch/LanguageModeling/BERT at master · NVIDIA/DeepLearningExamples · GitHub

NLP-gnmt

This script is a functional test case for the GNMT model from the NLP domain, based on the MLPerf reference implementation. When the target-bleu metric reaches 24.0 the model is considered converged and the job finishes successfully.

Run

Install dependencies
pip install sacrebleu==1.2.10
pip3 install --no-cache-dir https://github.com/mlperf/logging/archive/9ea0afa.zip
apex
GPU-related dependencies in seq2seq: CC=hipcc CXX=hipcc python3 setup.py install
Download the dataset
bash scripts/wmt16_en_de.sh
  • For a more detailed description of the dataset, see section 3 of README_orgin.md.
Preprocessing
python3 preprocess_data.py --dataset-dir /path/to/download/wmt16_de_en/ --preproc-data-dir /path/to/save/preprocess/data --max-length-train "75" --math fp32
Single node, single GPU
HIP_VISIBLE_DEVICES=0 python3 train.py \
    --save ${RESULTS_DIR} \
    --dataset-dir ${DATASET_DIR} \
    --preproc-data-dir ${PREPROC_DATADIR}/${MAX_SEQ_LEN} \
    --target-bleu $TARGET \
    --epochs "${NUMEPOCHS}" \
    --math ${MATH} \
    --max-length-train ${MAX_SEQ_LEN} \
    --print-freq 10 \
    --train-batch-size $TRAIN_BATCH_SIZE \
    --test-batch-size $TEST_BATCH_SIZE \
    --optimizer Adam \
    --lr $LR \
    --warmup-steps $WARMUP_STEPS \
    --remain-steps $REMAIN_STEPS \
    --decay-interval $DECAY_INTERVAL \
    --no-log-all-ranks
  • See run_fp32_singleCard.sh for reference.
Single node, multiple GPUs
bash run_fp32_node.sh
  • See run_fp32_node.sh for reference.
Multiple nodes, multiple GPUs
bash run_fp32_multi.sh

Model

Publication/Attribution

The implemented model is similar to the one from Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation paper.

The most important difference is in the attention mechanism. This repository implements gnmt_v2 attention: the output of the first LSTM layer of the decoder goes into attention, and the re-weighted context is then concatenated with the inputs to all subsequent LSTM layers of the decoder at the current timestep; a small sketch of this data flow follows.
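
A minimal sketch of that data flow (the shapes and the use of nn.MultiheadAttention as a stand-in for the repository's Bahdanau attention are illustrative only):

import torch
import torch.nn as nn

hidden = 1024
lstm1 = nn.LSTM(hidden, hidden, batch_first=True)
lstm2 = nn.LSTM(2 * hidden, hidden, batch_first=True)                # input = [h1 ; context]
attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)  # stand-in for Bahdanau attention

x = torch.randn(4, 1, hidden)                    # one decoder timestep, batch of 4
enc_out = torch.randn(4, 20, hidden)             # encoder outputs (attention keys/values)

h1, _ = lstm1(x)                                 # first decoder LSTM layer
context, _ = attn(h1, enc_out, enc_out)          # attention fed by the first layer's output
h2, _ = lstm2(torch.cat([h1, context], dim=-1))  # context concatenated with the layer input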

The same attention mechanism is also implemented in default GNMT-like models from tensorflow/nmt and NVIDIA/OpenSeq2Seq.

Structure
  • general:
    • encoder and decoder are using shared embeddings
    • data-parallel multi-gpu training
    • trained with label smoothing loss (smoothing factor 0.1)
  • encoder:
    • 4-layer LSTM, hidden size 1024, first layer is bidirectional, the rest of the layers are unidirectional
    • with residual connections starting from 3rd LSTM layer
    • uses standard pytorch nn.LSTM layer
    • dropout is applied on input to all LSTM layers, probability of dropout is set to 0.2
    • hidden state of LSTM layers is initialized with zeros
    • weights and biases of the LSTM layers are initialized with the uniform(-0.1, 0.1) distribution
  • decoder:
    • 4-layer unidirectional LSTM with hidden size 1024 and fully-connected classifier
    • with residual connections starting from 3rd LSTM layer
    • uses standard pytorch nn.LSTM layer
    • dropout is applied on input to all LSTM layers, probability of dropout is set to 0.2
    • hidden state of LSTM layers is initialized with zeros
    • weights and biases of the LSTM layers are initialized with the uniform(-0.1, 0.1) distribution
    • weights and biases of the fully-connected classifier are initialized with the uniform(-0.1, 0.1) distribution
  • attention:
    • normalized Bahdanau attention
    • model uses gnmt_v2 attention mechanism
    • output from first LSTM layer of decoder goes into attention, then re-weighted context is concatenated with the input to all subsequent LSTM layers in decoder at the current timestep
    • linear transform of keys and queries is initialized with uniform(-0.1, 0.1), normalization scalar is initialized with 1.0 / sqrt(1024), normalization bias is initialized with zero
  • inference:
    • beam search with beam size of 5
    • with coverage penalty and length normalization, coverage penalty factor is set to 0.1, length normalization factor is set to 0.6 and length normalization constant is set to 5.0
    • BLEU computed by sacrebleu

Implementation:

  • base Seq2Seq model: pytorch/seq2seq/models/seq2seq_base.py, class Seq2Seq
  • GNMT model: pytorch/seq2seq/models/gnmt.py, class GNMT
  • encoder: pytorch/seq2seq/models/encoder.py, class ResidualRecurrentEncoder
  • decoder: pytorch/seq2seq/models/decoder.py, class ResidualRecurrentDecoder
  • attention: pytorch/seq2seq/models/attention.py, class BahdanauAttention
  • inference (including BLEU evaluation and detokenization): pytorch/seq2seq/inference/inference.py, class Translator
  • beam search: pytorch/seq2seq/inference/beam_search.py, class SequenceGenerator
Loss function

Cross entropy loss with label smoothing (smoothing factor = 0.1); padding is not counted as part of the loss.

Loss function is implemented in pytorch/seq2seq/train/smoothing.py, class LabelSmoothing.
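
A hedged PyTorch sketch of such a loss (not the repository's LabelSmoothing class; smoothing factor 0.1, padding positions are masked out):

import torch
import torch.nn.functional as F

def label_smoothing_nll(logits, target, padding_idx, smoothing=0.1):
    # logits: (N, vocab), target: (N,); positions equal to padding_idx are ignored.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(dim=-1, index=target.unsqueeze(1)).squeeze(1)
    smooth = -log_probs.mean(dim=-1)           # spread the smoothing mass over the vocabulary
    loss = (1.0 - smoothing) * nll + smoothing * smooth
    mask = target.ne(padding_idx)              # drop padding tokens from the loss
    return loss[mask].sum()

# tiny usage example with random logits
loss = label_smoothing_nll(torch.randn(6, 100), torch.tensor([1, 5, 0, 0, 7, 3]), padding_idx=0)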

Optimizer

Adam optimizer with learning rate 1e-3, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8 and no weight decay. The network is trained with gradient clipping; the max L2 norm of the gradients is set to 5.0.

Optimizer is implemented in pytorch/seq2seq/train/fp_optimizers.py, class Fp32Optimizer.
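
The equivalent setup in plain PyTorch looks roughly like this (a sketch with a stand-in model and loss, not the repository's Fp32Optimizer):

import torch

model = torch.nn.Linear(8, 8)                       # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8, weight_decay=0)
loss = model(torch.randn(2, 8)).sum()               # stand-in loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # clip gradients to max L2 norm 5.0
optimizer.step()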

Learning rate schedule

The model is trained with an exponential learning rate warmup for 200 steps followed by step learning rate decay. Decay starts after 2/3 of the training steps and happens 4 times in total, at regularly spaced intervals, with a decay factor of 0.5.

Learning rate scheduler is implemented in pytorch/seq2seq/train/lr_scheduler.py, class WarmupMultiStepLR.
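
A small sketch of that schedule, assuming a hypothetical base learning rate and total step count (the warmup shape, an exponential ramp from 1% of the base LR, is an assumption):

import math

base_lr, warmup_steps, total_steps = 1e-3, 200, 30000    # base_lr/total_steps are example values
decays, factor = 4, 0.5
remain_steps = int(2 / 3 * total_steps)                  # decay starts after 2/3 of training
decay_interval = (total_steps - remain_steps) // decays  # 4 regularly spaced decays

def lr_at(step):
    if step < warmup_steps:
        # exponential warmup from base_lr * 0.01 up to base_lr
        return base_lr * math.exp(math.log(0.01) * (warmup_steps - step) / warmup_steps)
    if step < remain_steps:
        return base_lr
    num_decays = min(decays, (step - remain_steps) // decay_interval + 1)
    return base_lr * factor ** num_decays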

Evaluation

Quality metric

Uncased BLEU score on the newstest2014 en-de dataset. BLEU scores are reported by the sacrebleu package (version 1.2.10). Sacrebleu is executed with the following flags: --score-only -lc --tokenize intl.

Quality target

Uncased BLEU score of 24.00.

Evaluation frequency

Evaluation of BLEU score is done after every epoch.

Evaluation thoroughness

Evaluation uses all of newstest2014.en (3003 sentences).

Recommendation

This test case measures the performance of the NCF model from the recommendation domain under the PyTorch framework on the ROCm platform; it has been verified with ROCm 3.3 and PyTorch 1.5. The test procedure is as follows.

Test procedure

Data processing

Dataset download link
MovieLens | GroupLens
Convert the data format
ml-1m

python3 convert.py --path /path/to/{ml-1m_dir}/ratings.dat --output dataset/ml-1m  

ml-20m

python3 convert.py --path /path/to/{ml-20m_dir}/ratings.csv --output dataset/ml-20m  
Run commands
python3 -m torch.distributed.launch --nproc_per_node=<number_of_gpus> --use_env ncf.py --data <path_to_dataset> [other_parameters]  

Single-GPU example

python3 -m torch.distributed.launch --nproc_per_node=1 --use_env ncf.py --data=./dataset/ml-1m --checkpoint_dir=/path/to/{check_save_dir}  

4-GPU example

python3 -m torch.distributed.launch --nproc_per_node=4 --use_env ncf.py --data=./dataset/ml-1m --checkpoint_dir=/path/to/{check_save_dir}

References

DeepLearningExamples/PyTorch/Recommendation/NCF at 92829376a126286932496ff10d7cc655cb79af05 · NVIDIA/DeepLearningExamples · GitHub
