[MindSpore Series Summary] Basic Operations 2

Latest recommended article published on 2023-05-10 16:24:49

skytier

Original link: https://www.hiascend.com/forum/thread-0232107511930160127-1-1.html

Reference documents:

Data Transforms

Network Construction

Compose

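Compose: wraps several data processing operations into a single operation. Inside a Compose the operations run in order, and the output of each operation is the input of the next.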
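To make the chaining idea concrete, here is a minimal pure-Python sketch. It is not MindSpore's implementation; `compose`, `scale` and `shift` are hypothetical names that only illustrate "the output of one step is the input of the next".

```python
from functools import reduce


def compose(*operations):
    """Return one callable that applies `operations` from left to right."""
    def composed(value):
        # The output of each operation becomes the input of the next one.
        return reduce(lambda acc, op: op(acc), operations, value)
    return composed


def scale(pixels):
    # Hypothetical step 1: map pixel values from [0, 255] to [0, 1].
    return [p / 255.0 for p in pixels]


def shift(pixels):
    # Hypothetical step 2: shift every value by -0.5.
    return [p - 0.5 for p in pixels]


pipeline = compose(scale, shift)
result = pipeline([0, 255])
print(result)
```

Calling `pipeline` behaves like calling `shift(scale(...))`, which is the behaviour the Compose description above is about.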

Image operations

Rescale

Rescale: scales and shifts the value of every pixel in the input image. It takes a scale factor and a shift value as parameters.
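The arithmetic behind this is simply output = pixel * rescale + shift. The sketch below shows it on a nested list standing in for an image; the function name `rescale_image` and the sample pixel values are made up for illustration.

```python
def rescale_image(pixels, rescale, shift):
    """Apply output = pixel * rescale + shift to every pixel."""
    return [[p * rescale + shift for p in row] for row in pixels]


# Hypothetical 2x2 single-channel image with values in [0, 255].
image = [[0, 128], [255, 64]]

# A common choice: map [0, 255] to [0, 1] with no shift.
scaled = rescale_image(image, 1.0 / 255.0, 0.0)
print(scaled)
```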

Normalize

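Normalize: normalizes the input image. It takes three parameters:

mean: the mean of each image channel.

std: the standard deviation of each image channel.

is_hwc: whether the input image layout is (height, width, channel) or (channel, height, width).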
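Per channel, normalization computes (value - mean) / std. Here is a small pure-Python sketch for an image in (height, width, channel) layout; `normalize_hwc` and the sample numbers are assumptions for illustration, not the MindSpore operator itself.

```python
def normalize_hwc(image, mean, std):
    """Normalize an HWC image: (value - mean[c]) / std[c] for each channel c."""
    return [
        [
            [(value - mean[c]) / std[c] for c, value in enumerate(pixel)]
            for pixel in row
        ]
        for row in image
    ]


# Hypothetical 1x2 image with 3 channels, stored as (height, width, channel).
image = [[[0.5, 0.25, 1.0], [1.0, 0.75, 0.0]]]
normalized = normalize_hwc(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.25, 0.5])
print(normalized)
```

For a (channel, height, width) image the same formula applies, but the channel index is the outermost axis, which is why the layout has to be stated.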

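HWC2CHW

HWC2CHW: converts data stored in (height, width, channel) layout to (channel, height, width) layout.

Different hardware may be specifically optimized for either the (height, width, channel) or the (channel, height, width) layout. MindSpore uses HWC as its default image layout; when CHW is required, this transform does the conversion.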
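The layout change is only a re-indexing: element [h][w][c] moves to [c][h][w]. A minimal sketch on nested lists, where `hwc_to_chw` is a hypothetical helper and the pixel values are made up:

```python
def hwc_to_chw(image):
    """Rearrange image[h][w][c] into result[c][h][w]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# Hypothetical 2x2 image with 3 channels; each innermost list is one pixel.
hwc = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
chw = hwc_to_chw(hwc)
print(chw)
```

After the conversion each outer entry of `chw` is one full channel plane.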

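Text operations

BasicTokenizer

BasicTokenizer: splits text into tokens, for example:

['我喜欢China!'] -> ['我', '喜', '欢', 'China', '!']

When I ran this interface on Windows it failed with: AttributeError: module 'mindspore.dataset.text' has no attribute 'BasicTokenizer'

Then I read the interface description and, well, it turns out that BasicTokenizer does not support the Windows platform.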
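For anyone who just wants to see the splitting behaviour without the MindSpore operator, the example above can be approximated with a regular expression. This is only a rough sketch that reproduces the one documented example; `simple_tokenize` is a hypothetical helper, and the real BasicTokenizer does more than this.

```python
import re

# One CJK ideograph | a run of ASCII letters/digits | any other non-space character.
TOKEN_PATTERN = re.compile(
    r"[\u4e00-\u9fff]|[A-Za-z0-9]+|[^\sA-Za-z0-9\u4e00-\u9fff]"
)


def simple_tokenize(text):
    """Split text so that each Chinese character and each punctuation mark is a token."""
    return TOKEN_PATTERN.findall(text)


tokens = simple_tokenize("我喜欢China!")
print(tokens)
```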

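Vocab

Vocab (vocabulary): assigns an index to every token (word), producing a vocabulary table.

This operation is usually used together with Lookup:

1. First split the text into words (tokenization, Tokenizer).

2. Then build a vocabulary in which each index corresponds to exactly one word (vocabulary building, Vocab).

3. Finally use Lookup to convert a piece of text into a sequence of indices (conversion, Lookup); numbers are easier to process in later steps.

Lookup

Lookup: converts the dataset into numbers according to the given vocabulary, which must be passed in.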
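The tokenize, build vocabulary, look up flow can be sketched in plain Python. `build_vocab` and `lookup` are hypothetical helpers, and assigning indices in order of first appearance is an assumption of this sketch; the real Vocab may order its indices differently.

```python
def build_vocab(tokenized_texts):
    """Give every distinct token an index, in order of first appearance."""
    vocab = {}
    for tokens in tokenized_texts:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def lookup(tokens, vocab, unknown_id=-1):
    """Map tokens to indices; tokens missing from the vocabulary get unknown_id."""
    return [vocab.get(token, unknown_id) for token in tokens]


# Step 1 (tokenization) is assumed to be done already.
tokenized = [["this", "is", "text"], ["this", "is", "it"]]

vocab = build_vocab(tokenized)                       # step 2: vocabulary
ids = lookup(["this", "is", "new", "text"], vocab)   # step 3: lookup
print(vocab)
print(ids)
```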

Code example

Because MindSpore does not support tokenization on Windows, I have not been able to run the code below yet, but the approach should be correct.

'''
text process
'''
from mindspore.dataset import GeneratorDataset, text

texts = ['this is text', '这是文本', '333333']
my_dataset = GeneratorDataset(source=texts, column_names=['text'])

for item in my_dataset:
    print(item)

print("do tokenizer...")

# The BasicTokenizer interface is not supported on Windows
my_dataset = my_dataset.map(text.BasicTokenizer())

# Build the vocabulary
vocab = text.Vocab.from_dataset(my_dataset)
print(type(vocab.vocab()))
print(vocab.vocab())

# Convert the data using the vocabulary
my_dataset = my_dataset.map(text.Lookup(vocab))
print(next(my_dataset.create_dict_iterator()))

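Building a neural network model

To build a neural network in MindSpore, we subclass mindspore.nn.Cell (similar to torch.nn.Module in PyTorch), initialize the model in __init__(), and implement the model's computation in construct(self, item) (similar to forward in PyTorch).

In MindSpore, the Cell class is the base class of every network and also the basic building unit of a network. "construct" refers to constructing the neural network (the computation graph).

A simple neural network model can be built like this: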

import mindspore.nn as nn


class MyModuleTestSequential(nn.Cell):
    def __init__(self, auto_prefix=True, flags=None):
        super().__init__(auto_prefix, flags)
        self.my_net = nn.SequentialCell(
            nn.Dense(in_channels=28*28, out_channels=512),
            nn.ReLU(),
            nn.Dense(in_channels=512, out_channels=10),
        )

    def construct(self, input_val):
        return self.my_net(input_val)

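In __init__(), we first call the parent class's (here nn.Cell's) __init__() method via super().__init__(auto_prefix, flags), so that the parent's attributes are set up; then we build our own network. Here it is fully connected + ReLU activation + fully connected, with nn.SequentialCell combining the three layers, and the final output has 10 channels. Once the class is defined, we can instantiate the model and print its structure: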
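To see what the Dense + ReLU + Dense stack computes, here is a tiny pure-Python forward pass with made-up weights and much smaller layer sizes. It is a sketch of the math only, not of how MindSpore executes a Cell; `dense`, `relu`, `forward` and all the numbers are hypothetical. The weights use an (out_channels, in_channels) layout.

```python
def dense(x, weight, bias):
    """Fully connected layer: y[j] = sum_i x[i] * weight[j][i] + bias[j]."""
    return [
        sum(xi * wi for xi, wi in zip(x, row)) + b
        for row, b in zip(weight, bias)
    ]


def relu(x):
    """Element-wise max(0, value)."""
    return [max(0.0, v) for v in x]


# Hypothetical parameters: 3 inputs -> 2 hidden units -> 2 outputs.
W1 = [[1.0, -1.0, 0.0], [-1.0, -1.0, -1.0]]  # shape (2, 3)
B1 = [0.0, 0.0]
W2 = [[2.0, 0.0], [1.0, 1.0]]                # shape (2, 2)
B2 = [0.5, -0.5]


def forward(x):
    # Same order as the SequentialCell: Dense, ReLU, Dense.
    hidden = relu(dense(x, W1, B1))
    return dense(hidden, W2, B2)


output = forward([1.0, 0.0, 1.0])
print(output)
```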

net = MyModuleTestSequential()
print(net)

Output:

MyModuleTestSequential<
  (my_net): SequentialCell<
    (0): Dense<input_channels=784, output_channels=512, has_bias=True>
    (1): ReLU<>
    (2): Dense<input_channels=512, output_channels=10, has_bias=True>
    >
  >

Feeding in test data

Create some data, feed it into the network, and look at the output:

import mindspore as ms
import mindspore.ops as ops
...
input_test = ops.ones((1, 28*28), ms.float32)
output_test = net(input_test)
print(output_test)

[[-0.02820602 -0.03413845 -0.02405711 -0.00775991  0.04736076 -0.03881837
   0.037597   -0.03854123  0.06578549 -0.04432071]]

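Compute the softmax distribution of these values (for an introduction to softmax, see the article "一文详解Softmax函数", a detailed explanation of the softmax function):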

import mindspore.nn as nn
...
pred_probab = nn.Softmax(axis=1)(output_test)
print(pred_probab, pred_probab.shape)

# Return the index of the maximum value
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

[[0.10616475 0.09984412 0.09859321 0.09314852 0.10240701 0.1013029
  0.09927667 0.09881454 0.10141166 0.09903661]] (1, 10)
Predicted class: [0]

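So, for an all-ones input, we obtained a set of outputs, converted them into probabilities, and took the index of the largest probability as the predicted class.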
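The softmax and argmax steps can also be written by hand, which makes it clear what the prediction is based on. This is a minimal sketch with made-up logits; `softmax` and `argmax` here are plain-Python helpers, not the MindSpore operators.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical raw outputs for a 3-class problem.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
predicted = argmax(probs)
print(probs, predicted)
```

Softmax keeps the ordering of the inputs, so the predicted class is the same as the index of the largest raw output.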

Inspecting the model's parameters

# Inspect the parameters in the model
print(f"Model structure: {net}\n\n")
for name, param in net.parameters_and_names():
    print(f"Layer: {name}\nSize: {param.shape}\nValues : {param[:2]} \n")

Model structure: MyModuleTestSequential<
  (my_net): SequentialCell<
    (0): Dense<input_channels=784, output_channels=512, has_bias=True>
    (1): ReLU<>
    (2): Dense<input_channels=512, output_channels=10, has_bias=True>
    >
  >


Layer: my_net.0.weight
Size: (512, 784)
Values : [[-0.00345681  0.00874132  0.00787816 ... -0.00477729  0.01524485 -0.00455558]
 [-0.02192685  0.00697682  0.01392365 ... -0.01538742  0.00704653 -0.00263295]]

Layer: my_net.0.bias
Size: (512,)
Values : [0. 0.]

Layer: my_net.2.weight
Size: (10, 512)
Values : [[ 0.00902703  0.00964694  0.00669623 ...  0.00763969  0.0223499   0.01275787]
 [ 0.01236803  0.00012543  0.00288158 ...  0.00165502 -0.01691348  0.01083482]]

Layer: my_net.2.bias
Size: (10,)
Values : [0. 0.]

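The output shows the values of the weight and bias parameters of the two fully connected layers. These parameters keep changing during training, and we can use this interface at any time to check their names and contents.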
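The sizes printed above are enough to work out how many trainable values the model has. A quick arithmetic sketch, using only the shapes from the output (the dictionary itself is typed in by hand for illustration):

```python
from math import prod

# Parameter shapes copied from the printed output above.
shapes = {
    "my_net.0.weight": (512, 784),
    "my_net.0.bias": (512,),
    "my_net.2.weight": (10, 512),
    "my_net.2.bias": (10,),
}

# Number of values in each parameter = product of its dimensions.
counts = {name: prod(shape) for name, shape in shapes.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count}")
print(f"total: {total}")  # 401408 + 512 + 5120 + 10 = 407050
```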
