This chapter covers building an FCNN with PyTorch and the TN_tutorial library to train on MNIST, in preparation for classical-quantum hybrid learning.
Original tutorial:
张量网络PyThon编程:3.3 神经网络模块化编程(a)_哔哩哔哩_bilibili
This post only organizes the key points, with a number of personal experiments and additions. Final interpretation rests with Prof. Ran Shiju (冉仕举) of Capital Normal University:
StringCNU的个人空间-StringCNU个人主页-哔哩哔哩视频
* What is a neural network?
(1) A multi-stage, multi-dimensional nonlinear feature extractor.
(2) Characteristics: a neural network has more degrees of freedom, so it can fit details more flexibly. What are the features? We no longer over-abstract them; instead, the differences in the raw data are extracted as they flow through the network.
(3) Structure: neurons, activation functions, weights and biases.
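The neuron / activation / weight-and-bias structure can be sketched for a single neuron (the numbers here are arbitrary illustrations, not from the tutorial):

```python
import torch
import torch.nn as nn

# One neuron: weighted sum plus bias, then a nonlinear activation
x = torch.tensor([1.0, 2.0])         # a 2-dimensional input feature
neuron = nn.Linear(2, 1, bias=True)  # holds weights w (shape 1x2) and bias b
with torch.no_grad():
    neuron.weight.fill_(0.5)         # w = [0.5, 0.5], chosen for a deterministic demo
    neuron.bias.fill_(-1.0)          # b = -1.0
z = neuron(x)                        # z = w.x + b = 0.5*1 + 0.5*2 - 1 = 0.5
y = torch.sigmoid(z)                 # the activation squashes z into (0, 1)
print(z.item(), y.item())
```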
* The perceptron: the most basic fully connected layer. It stores an n-dimensional feature vector and splits it into ±1 classes with a hyperplane.
Below we train one version without sigmoid() and one with sigmoid(); the importance of the activation function then becomes apparent.
import torch
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
# Generate random data
np.random.seed(1)
x_train = np.concatenate((np.random.randn(100, 2) * 0.8 + np.array([3, 3]),
                          np.random.randn(100, 2) * 0.5 + np.array([-3, -3])))
y_train = np.concatenate((np.zeros((100, 1)), np.ones((100, 1))), axis=0)
# Convert the data to tensors
x_train = torch.from_numpy(x_train).type(torch.FloatTensor)
y_train = torch.from_numpy(y_train).type(torch.FloatTensor)
# Plot the data
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(x_train[:, 0].numpy(), x_train[:, 1].numpy(), c=y_train[:, 0].numpy())
ax.set_title('Scatter plot')
ax.set_xlabel('X')
ax.set_ylabel('Y')
plt.show()
# Perceptron model without a sigmoid activation layer
class Perceptron(nn.Module):
    def __init__(self):
        super(Perceptron, self).__init__()
        self.linear = nn.Linear(2, 1, bias=True)

    def forward(self, x):
        out = self.linear(x)
        return out

# Perceptron model with a sigmoid activation layer
class PerceptronSigmoid(nn.Module):
    def __init__(self):
        super(PerceptronSigmoid, self).__init__()
        self.linear = nn.Linear(2, 1, bias=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.linear(x)
        out = self.sigmoid(out)
        return out
# Define the model, loss function, and optimizer
model = Perceptron()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Train the model
num_epochs = 100
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(x_train)
    loss = criterion(outputs, y_train)
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))
# Plot the decision boundary
w = model.linear.weight.detach().numpy()
b = model.linear.bias.detach().numpy()
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(x_train[:, 0].numpy(), x_train[:, 1].numpy(), c=y_train[:, 0].numpy())
ax.set_title('Scatter plot')
ax.set_xlabel('X')
ax.set_ylabel('Y')
x_hyperplane = np.linspace(-6, 6, 10)
y_hyperplane = -(w[0][0] * x_hyperplane + b[0]) / w[0][1]
ax.plot(x_hyperplane, y_hyperplane)
plt.show()
# Define the model, loss function, and optimizer
model_sigmoid = PerceptronSigmoid()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model_sigmoid.parameters(), lr=0.01)
# Train the model
num_epochs = 100
for epoch in range(num_epochs):
    # Forward pass
    outputs = model_sigmoid(x_train)
    loss = criterion(outputs, y_train)
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))
# Plot the decision boundary
w = model_sigmoid.linear.weight.detach().numpy()
b = model_sigmoid.linear.bias.detach().numpy()
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(x_train[:, 0].numpy(), x_train[:, 1].numpy(), c=y_train[:, 0].numpy())
ax.set_title('Scatter plot')
ax.set_xlabel('X')
ax.set_ylabel('Y')
x_hyperplane = np.linspace(-6, 6, 10)
y_hyperplane = -(w[0][0] * x_hyperplane + b[0]) / w[0][1]
ax.plot(x_hyperplane, y_hyperplane)
plt.show()
Binary classification of the point set, without and with sigmoid (the two decision-boundary plots produced above):
Training results without sigmoid:
Epoch [10/100], Loss: 0.2642
Epoch [20/100], Loss: 0.2306
Epoch [30/100], Loss: 0.2155
Epoch [40/100], Loss: 0.2065
Epoch [50/100], Loss: 0.2005
Epoch [60/100], Loss: 0.1963
Epoch [70/100], Loss: 0.1932
Epoch [80/100], Loss: 0.1908
Epoch [90/100], Loss: 0.1888
Epoch [100/100], Loss: 0.1871
Training results with sigmoid:
Epoch [10/100], Loss: 0.2645
Epoch [20/100], Loss: 0.2272
Epoch [30/100], Loss: 0.2100
Epoch [40/100], Loss: 0.1978
Epoch [50/100], Loss: 0.1886
Epoch [60/100], Loss: 0.1815
Epoch [70/100], Loss: 0.1760
Epoch [80/100], Loss: 0.1717
Epoch [90/100], Loss: 0.1683
Epoch [100/100], Loss: 0.1657
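One reason the sigmoid variant reaches a lower MSE against the 0/1 labels is that its outputs are confined to (0, 1), the same range as the labels, while the bare linear model's outputs are unbounded. A minimal sketch of this difference (random weights, illustrative only):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 2)                                        # dummy 2D inputs
linear = nn.Linear(2, 1)                                     # no activation: unbounded outputs
with_sigmoid = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())  # outputs squashed into (0, 1)

out_lin = linear(x)
out_sig = with_sigmoid(x)
print(out_lin.min().item(), out_lin.max().item())  # can be any real values
print(out_sig.min().item(), out_sig.max().item())  # always strictly between 0 and 1
```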
* Multi-layer neural networks (loosely analogous to decision trees); later we will build a toy classification model with layer_num = 2.
(1) Input layer: the input vector; its size is the feature dimension.
(2) Output layer: the number of classes or labels.
(3) Hidden layers: their number is not fixed; more neurons and more layers generally fit better.
The figure shows the simplest FCNN (fully connected model).
To obtain a correct classification model, we measure performance with a loss function and optimize it.
(4) Forward pass: compute the loss function using the current weights.
(5) Backward pass: compute gradients of the loss and update the weights.
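Steps (4) and (5) can be illustrated with PyTorch's autograd on a one-parameter example (all values are made up for the demo):

```python
import torch

w = torch.tensor([2.0], requires_grad=True)  # the weight to be learned
x = torch.tensor([3.0])                      # input
y_true = torch.tensor([10.0])                # target

y_pred = w * x                 # forward pass: y_pred = 6
loss = (y_pred - y_true) ** 2  # loss = (6 - 10)^2 = 16
loss.backward()                # backward pass: dloss/dw = 2*(y_pred - y_true)*x = -24
print(w.grad)                  # tensor([-24.])
# A gradient-descent step would then move w against the gradient: w <- w - lr * w.grad
```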
* More on activation functions:
A perceptron only performs hyperplane splitting; the bends in the decision surface, and the added model robustness, come from the nonlinear activations.
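A quick look at the common nonlinear activations applied to the same inputs (the input values are arbitrary):

```python
import torch

z = torch.tensor([-2.0, 0.0, 2.0])
print(torch.relu(z))     # zeroes out negatives: [0., 0., 2.]
print(torch.sigmoid(z))  # squashed into (0, 1), with sigmoid(0) = 0.5
print(torch.tanh(z))     # squashed into (-1, 1), odd-symmetric around 0
```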
* Building the loss function:
We are going to build a softmax mapping for image classification; for classification problems, nn.CrossEntropyLoss() is the first choice.
import torch
import torch.nn as nn
# Cross-entropy loss:
# Suitable for binary or multi-class classification; it measures model performance
# by the cross-entropy between the predictions and the true labels
cross_entropy_loss = nn.CrossEntropyLoss()
output = torch.randn(10, 5)  # predictions, shape (batch_size, categories)
target = torch.randint(5, (10,))  # true labels, shape (batch_size,)
loss = cross_entropy_loss(output, target)
# Below is a simple from-scratch attempt; for real training, still use nn.CrossEntropyLoss()
import numpy as np
def cross_entropy(predictions, targets):
    # Clip predictions to avoid log(0) errors
    epsilon = 1e-12
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    # Calculate cross-entropy loss
    N = predictions.shape[0]
    ce_loss = -np.sum(targets * np.log(predictions)) / N
    return ce_loss
# Finally, the formula in mathematical form:
# L = -(1/N) * sum_i sum_j t_ij * log(p_ij), where t is the one-hot target and p the predicted probability
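Note a subtlety: nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, while the from-scratch version above expects probabilities. A sketch checking that the two agree once softmax and one-hot encoding are applied (logits and labels made up for the check):

```python
import torch
import torch.nn as nn
import numpy as np

torch.manual_seed(0)
logits = torch.randn(4, 3)           # raw scores: 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 1])

loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# Manual equivalent: softmax -> one-hot targets -> averaged cross-entropy
probs = torch.softmax(logits, dim=1).numpy()
onehot = np.eye(3)[labels.numpy()]
loss_manual = -np.sum(onehot * np.log(probs)) / 4

print(loss_builtin.item(), loss_manual)  # the two values match
```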
* An example: a concise, complete classification network with layer = 2 that uses CrossEntropyLoss:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
# Define the training and test datasets
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.1307,), (0.3081,))])
train_dataset = datasets.MNIST('../data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('../data', train=False, transform=transform)
# Define data loaders for batching
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
# Define the neural network model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)  # input layer to hidden layer
        self.fc2 = nn.Linear(128, 10)  # hidden layer to output layer

    def forward(self, x):
        x = x.view(-1, 784)  # flatten the image data
        x = torch.relu(self.fc1(x))  # ReLU activation in the hidden layer
        x = self.fc2(x)  # no activation on the output layer
        return x
# Instantiate the model
model = Net()
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Train the model
for epoch in range(10):
    for i, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()  # zero the gradients
        output = model(data)  # forward pass
        loss = criterion(output, target)  # compute the loss
        loss.backward()  # backward pass
        optimizer.step()  # update the parameters
        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, 10, i + 1, len(train_loader), loss.item()))
# Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for data, target in test_loader:
        output = model(data)
        _, predicted = torch.max(output.data, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()
print('Accuracy of the model on the test images: {} %'.format(100 * correct / total))
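The x.view(-1, 784) call in Net.forward flattens each 1×28×28 image into a 784-component vector; a quick shape check with dummy data standing in for an MNIST batch:

```python
import torch

batch = torch.randn(64, 1, 28, 28)  # dummy batch shaped like MNIST images
flat = batch.view(-1, 784)          # -1 infers the batch dimension (64)
print(flat.shape)                   # torch.Size([64, 784])
```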
* In summary, the basic structure is: define the network, load the data, feed it in, run forward and backward propagation, and obtain the training results.
* The next lecture will use this to examine the framework built by ranshiju.