Training fails after adding SummaryCollector to the model

xiao | yang

Last modified: 2022-11-08 09:40:24

Category: Deep Learning. Tags: python, deep learning, artificial intelligence

First published: 2022-11-08 09:40:19

Article link: https://blog.csdn.net/xi_xiyu/article/details/127744521

Training fails when using MindInsight

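I followed the official tutorial. MindInsight installed successfully and I can open the MindInsight home page. However, once SummaryCollector is added to the training code, training fails at the first step of the first epoch, and no log files are generated in the specified summary directory. My guess is that the failure happens at the stage where SummaryCollector is brought in.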
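The "no log files in the summary directory" symptom can be confirmed from a script instead of by browsing the folder. The following is only a minimal sketch: the helper name `list_summary_files` and the directory path in the usage note are hypothetical, and it makes no assumption about how MindSpore names its summary files; it only reports whatever regular files exist.

```python
from pathlib import Path


def list_summary_files(summary_dir):
    """Return the sorted names of regular files directly under summary_dir.

    An empty list means nothing has been written yet, or that the
    directory does not exist at all.
    """
    root = Path(summary_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file())
```

Calling `list_summary_files("./summary_dir")` (a hypothetical path) right after the failed run and getting an empty list would be consistent with the collector failing before it writes anything.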

Model used

ShuffleNetV1. Source code link: models: Models of MindSpore - Gitee.com

Code changes

train.py

# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train ShuffleNetV1"""
import os
import time
from mindspore import SummaryCollector
from mindspore import context, nn
from mindspore import Tensor
from mindspore.common import set_seed
from mindspore.nn.optim.momentum import Momentum
from mindspore.context import ParallelMode
from mindspore.train.model import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, TimeMonitor, LossMonitor
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindvision.engine.callback import ValAccMonitor
from src.lr_generator import get_lr
from src.shufflenetv1 import ShuffleNetV1
from src.dataset import create_dataset, create_flower_dataset
from src.crossentropysmooth import CrossEntropySmooth
from src.model_utils.config import config
from src.model_utils.moxing_adapter import moxing_wrapper
from src.model_utils.device_adapter import get_device_id

set_seed(1)


def modelarts_pre_process():
    pass


@moxing_wrapper(pre_process=modelarts_pre_process)
def train():
    context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target, save_graphs=False)

    # init distributed
    if config.is_distributed:
        if os.getenv('DEVICE_ID', "not_set").isdigit():
            context.set_context(device_id=get_device_id())
        init()
        rank = get_rank()
        group_size = get_group_size()
        parallel_mode = ParallelMode.DATA_PARALLEL
        context.set_auto_parallel_context(parallel_mode=parallel_mode, device_num=group_size, gradients_mean=True)
    else:
        rank = 0
        group_size = 1
        context.set_context(device_id=config.device_id)

    if config.device_target == "GPU":
        context.set_context(enable_graph_kernel=True)
    # define network
    net = ShuffleNetV1(model_size=config.model_size, n_class=config.num_classes)

    # define loss
    loss = CrossEntropySmooth(sparse=True, reduction="mean", smooth_factor=config.label_sm

Environment: Python 3.7.5, conda 10.1, MindSpore 1.8; both MindInsight 1.8 and 1.7 were tried. The error can be traced to the recording of dataset_graph. If the data-processing pipeline does not need to be recorded, that recording can be turned off by setting the corresponding option to False.
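The post does not name the setting that should be set to False. One plausible reading, offered here as an assumption and not as the author's confirmed fix, is the `collect_dataset_graph` key in the `collect_specified_data` dictionary accepted by `SummaryCollector`. The sketch below builds those arguments in a plain helper so they can be inspected without MindSpore installed; the helper names and the `./summary_dir` path are hypothetical, and the import path follows the `mindspore.train.callback` imports already used in train.py.

```python
def summary_collector_kwargs(summary_dir, record_dataset_graph=False):
    """Build keyword arguments for SummaryCollector.

    Assumption: the option the post refers to is `collect_dataset_graph`.
    `keep_default_action=True` leaves every other collection item at its
    default, so only dataset-graph recording is changed.
    """
    return {
        "summary_dir": summary_dir,
        "collect_specified_data": {"collect_dataset_graph": record_dataset_graph},
        "keep_default_action": True,
    }


def make_summary_collector(summary_dir):
    """Create the callback with dataset-graph recording disabled."""
    # Imported lazily so the helper above works without MindSpore installed.
    from mindspore.train.callback import SummaryCollector

    return SummaryCollector(**summary_collector_kwargs(summary_dir))
```

The object returned by `make_summary_collector("./summary_dir")` would then be added to the callback list passed to `model.train(...)` alongside the existing monitors.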