The common training process has four steps. First, construct the dataset. Second, define the neural network. Third, define the hyperparameters, the loss function, and the optimizer. Finally, train and test the model.
We start by loading the data.
%%capture captured_output
# mindspore==2.2.14 is preinstalled in the experiment environment; if you need a different mindspore version, change the version number below
!pip uninstall mindspore -y
!pip install -i https://pypi.mirrors.ustc.edu.cn/simple mindspore==2.2.14
import mindspore
from mindspore import nn
from mindspore.dataset import vision, transforms
from mindspore.dataset import MnistDataset
# Download data from open datasets
from download import download
url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/" \
"notebook/datasets/MNIST_Data.zip"
path = download(url, "./", kind="zip", replace=True)
def datapipe(path, batch_size):
    image_transforms = [
        vision.Rescale(1.0 / 255.0, 0),
        vision.Normalize(mean=(0.1307,), std=(0.3081,)),
        vision.HWC2CHW()
    ]
    label_transform = transforms.TypeCast(mindspore.int32)
    dataset = MnistDataset(path)
    dataset = dataset.map(image_transforms, 'image')
    dataset = dataset.map(label_transform, 'label')
    dataset = dataset.batch(batch_size)
    return dataset
train_dataset = datapipe('MNIST_Data/train', batch_size=64)
test_dataset = datapipe('MNIST_Data/test', batch_size=64)
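As a quick sanity check of the pipeline (my own addition, using only the train_dataset defined above), you can pull one batch and look at its shape and dtype:
image, label = next(train_dataset.create_tuple_iterator())
print(image.shape, image.dtype)   # typically (64, 1, 28, 28) Float32 after HWC2CHW and batching
print(label.shape, label.dtype)   # typically (64,) Int32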
The key operations above are Rescale and Normalize, which scale the pixel values to [0, 1] and then standardize them, plus HWC2CHW, which reorders the image dimensions to channel-first format. Next, we define the neural network, which I described in the last blog.
class Network(nn.Cell):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.dense_relu_sequential = nn.SequentialCell(
            nn.Dense(28*28, 512),
            nn.ReLU(),
            nn.Dense(512, 512),
            nn.ReLU(),
            nn.Dense(512, 10)
        )

    def construct(self, x):
        x = self.flatten(x)
        logits = self.dense_relu_sequential(x)
        return logits
model = Network()
Here is the remaining setup. Hyperparameters are the values we choose by hand; we judge whether a choice makes the model better or worse by testing it again and again.
epochs = 3
batch_size = 64
learning_rate = 1e-2
epochs: how many passes over the training data do you want the model to make?
batch_size: how much data do you pass to the model at once?
learning_rate: how fast do you want the model to update its parameters? Too fast can give a worse result, as the small sketch below shows.
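To see why a learning rate that is too large hurts, here is a toy sketch (my own illustration in plain Python, not part of the MNIST code): gradient descent on f(w) = w*w, whose gradient is 2*w.
def descend(lr, steps=5, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w   # the basic update: w <- w - lr * gradient
    return w

print(descend(lr=0.1))   # about 0.33: steadily approaching the minimum at 0
print(descend(lr=1.1))   # about -2.49: every step overshoots, so |w| keeps growing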
loss_fn = nn.CrossEntropyLoss()
This is a common loss function for classification. It applies softmax to the logits and takes the negative log-probability of the correct class, averaged over the batch; working in log space makes the computation more convenient and stable.
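A tiny sketch with made-up numbers (reusing the loss_fn just defined) shows how it is called: logits of shape (batch, classes) and integer labels go in, one averaged loss value comes out.
dummy_logits = mindspore.Tensor([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]], mindspore.float32)  # 2 samples, 3 classes
dummy_labels = mindspore.Tensor([0, 2], mindspore.int32)                                # the correct classes
print(loss_fn(dummy_logits, dummy_labels))   # a small value, since both predictions favour the correct class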
optimizer = nn.SGD(model.trainable_params(), learning_rate=learning_rate)
The line above creates the optimizer, the rule that turns gradients into parameter updates. SGD (stochastic gradient descent) means we compute the gradient on a small batch of samples and use it to stand in for the gradient over the whole dataset, which gives nearly the same direction with far less computation.
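The "use a few samples to represent the whole data" idea can be seen in a toy numpy illustration (my own analogy, not MindSpore code): the mean of a 64-sample batch is close to the mean over all 10000 samples but much cheaper to compute, and a mini-batch gradient estimate behaves the same way.
import numpy as np
rng = np.random.default_rng(0)
data = rng.normal(size=10000)
print(data.mean())                   # the "full" statistic over all samples
print(rng.choice(data, 64).mean())   # the mini-batch estimate: close, but far cheaper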
Here is the training process.
# define a forward function that runs the model and computes the loss
def forward_fn(data, label):
    logits = model(data)
    loss = loss_fn(logits, label)
    return loss, logits

# get a function that returns both the loss and the gradients of the parameters
grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)

# define a single training step: forward pass, gradients, parameter update
def train_step(data, label):
    (loss, _), grads = grad_fn(data, label)
    optimizer(grads)
    return loss
# run one epoch: loop over the batches until the model has seen all the training data
def train_loop(model, dataset):
    size = dataset.get_dataset_size()
    model.set_train()
    for batch, (data, label) in enumerate(dataset.create_tuple_iterator()):
        loss = train_step(data, label)
        if batch % 100 == 0:
            loss, current = loss.asnumpy(), batch
            print(f"loss:{loss:>7f} [{current:>3d}/{size:>3d}]")
# finally, the test_loop is similar
def test_loop(model, dataset, loss_fn):
    num_batches = dataset.get_dataset_size()
    model.set_train(False)
    total, test_loss, correct = 0, 0, 0
    for data, label in dataset.create_tuple_iterator():
        pred = model(data)
        total += len(data)
        test_loss += loss_fn(pred, label).asnumpy()           # accumulate the loss of each batch
        correct += (pred.argmax(1) == label).asnumpy().sum()  # count the correct predictions
    test_loss /= num_batches   # average loss per batch
    correct /= total           # accuracy over all test samples
    print(f"Test: \n Accuracy:{(100*correct):>0.1f}%, Avg loss: {test_loss:>8f}\n")
What does the code above do?
The model predicts, we compute the loss and the gradients, we train, and we test. What we can adjust are the hyperparameters. It is like changing the temperature to cook a good soup: we cannot know in advance whether the soup tastes better with more salt or less oil. Just try!
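To actually run everything, tie the pieces together with a loop over the epochs (this driver loop follows the usual MindSpore quickstart pattern and only uses the names defined above):
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(model, train_dataset)
    test_loop(model, test_dataset, loss_fn)
print("Done!")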