Applying Neural Networks to Regression Problems
What are the advantages???
Neural networks are a powerful tool for regression problems: they can learn complex relationships between inputs and outputs.
They offer a flexible, expressive framework for modeling and predicting regression targets. With an appropriate network architecture, training strategy, and regularization, a network can learn effectively from data and make accurate predictions.
In practice, choosing a suitable architecture and set of hyperparameters is critical to building an effective regression model.
So while neural networks are a powerful tool for regression, they also come with plenty of pitfalls, and we need a toolbox of techniques to build an efficient, accurate regression model. (In fact, even mastering these techniques is not enough to train a model well; the most important ingredient is understanding the nature of the problem itself.) A code sketch of the first two techniques follows the list.
- Regularization: to prevent overfitting, add a regularization term such as an L1 or L2 penalty to the loss function.
- Dropout: randomly zero out some neuron activations during training, so the model does not rely too heavily on any particular neuron.
- Batch Normalization: normalize the inputs of each layer to speed up training and stabilize the model.
- Early Stopping: stop training when performance on a validation set stops improving, to avoid overfitting.
- Hyperparameter Tuning: optimize performance by tuning the architecture (number of layers, neurons per layer) and hyperparameters such as the learning rate.
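As a quick illustration (my sketch, not part of the original scripts), L2 regularization and dropout can be switched on in PyTorch like this; the layer widths are arbitrary:

import torch.nn as nn
import torch.optim as optim

# A small regression head with dropout between hidden layers
net = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(),
    nn.Dropout(p=0.1),          # dropout: randomly zero 10% of activations during training
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
# weight_decay adds an L2 penalty on the weights to the update rule
optimizer = optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-4)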
Generating the dataset:
Input data:
$X_{1} = 100 \times \mathcal{N}(1, 1)$
$X_{2} = \frac{\mathcal{N}(2, 1)}{10}$
$X_{3} = 10000 \times \mathcal{N}(3, 1)$
Output data $Y$ and $Y_1$:
$Y = 6X_{1} - 3X_{2} + X_{3}^{2} + \epsilon$
$Y_1 = X_1 \cdot X_2 - \frac{X_1}{X_3} + \frac{X_3}{X_2} + \epsilon_1$
where $\epsilon$ and $\epsilon_1$ are normally distributed noise terms with mean 0 and variance 0.1.
Note that $\mathcal{N}(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$.
Here is the code that generates the dataset:
# Generate the synthetic dataset
import numpy as np
import pandas as pd
# Total number of training + validation samples
sample = 2000
train_data_path = 'train.csv'
validate_data_path = 'validate.csv'
predict_data_path = 'test.csv'
# Build the inputs of the data-generating model
X1 = np.zeros((sample, 1))
X1[:, 0] = np.random.normal(1, 1, sample) * 100
X2 = np.zeros((sample, 1))
X2[:, 0] = np.random.normal(2, 1, sample) / 10
X3 = np.zeros((sample, 1))
X3[:, 0] = np.random.normal(3, 1, sample) * 10000
# The model
Y = 6 * X1 - 3 * X2 + X3 * X3 + np.random.normal(0, 0.1, [sample, 1])
Y1 = X1 * X2 - X1 / X3 + X3 / X2 + np.random.normal(0, 0.1, [sample, 1])
# Collect all generated columns into data
data = np.zeros((sample, 5))
data[:, 0] = X1[:, 0]
data[:, 1] = X2[:, 0]
data[:, 2] = X3[:, 0]
data[:, 3] = Y[:, 0]
data[:, 4] = Y1[:, 0]
# Split data into a training set and a validation set
num_traindata = int(0.8 * sample)
# Save the training data
traindata = pd.DataFrame(data[0:num_traindata, :], columns=['x1', 'x2', 'x3', 'y', 'y1'])
traindata.to_csv(train_data_path, index=False)
print('Training data saved to:', train_data_path)
# Save the validation data
validate_data = pd.DataFrame(data[num_traindata:, :], columns=['x1', 'x2', 'x3', 'y', 'y1'])
validate_data.to_csv(validate_data_path, index=False)
print('Validation data saved to:', validate_data_path)
# Save the prediction (test) inputs: features only, no labels
predict_data = pd.DataFrame(data[num_traindata:, 0:-2], columns=['x1', 'x2', 'x3'])
predict_data.to_csv(predict_data_path, index=False)
print('Prediction data saved to:', predict_data_path)
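Before training, it is worth sanity-checking the scale of each generated column (my addition, foreshadowing a problem we will run into later):

# x1 is ~10^2, x2 ~10^-1, x3 ~10^4, and y contains x3^2 ~10^9:
# the columns span many orders of magnitude
print(traindata.describe())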
A generic neural-network fitting function
How should we build a regression model from the generated dataset? For an equation with nonlinearities like this, directly applying a generic neural-network model may not work well, like this:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd

class FNN(nn.Module):
    def __init__(self, Arc, func, device):
        super(FNN, self).__init__()  # call the parent constructor
        self.func = func  # activation function
        self.Arc = Arc  # network architecture (layer widths)
        self.device = device
        self.model = self.create_model().to(self.device)
        # print(self.model)

    def create_model(self):
        layers = []
        for ii in range(len(self.Arc) - 2):  # every layer except the last
            layers.append(nn.Linear(self.Arc[ii], self.Arc[ii + 1], bias=True))
            layers.append(self.func)  # add the activation
            if ii < len(self.Arc) - 3:  # add Dropout except before the second-to-last layer
                layers.append(nn.Dropout(p=0.1))
        layers.append(nn.Linear(self.Arc[-2], self.Arc[-1], bias=True))  # output layer
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.model(x)
        return out

if __name__ == "__main__":
    # Define the architecture and activation function
    Arc = [3, 10, 20, 20, 20, 10, 2]
    func = nn.ReLU()  # use the ReLU activation
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use GPU if available
    # Create the FNN model instance
    model = FNN(Arc, func, device)
    # Define the loss function and optimizer
    criterion = nn.MSELoss()  # mean squared error loss
    optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer
    # Training data
    train_data_path = 'train.csv'
    train_data = pd.read_csv(train_data_path)
    features = np.array(train_data.iloc[:, :-2])
    labels = np.array(train_data.iloc[:, -2:])
    # Convert to tensors
    inputs_tensor = torch.from_numpy(features).float().to(device)  # float feature tensor
    labels_tensor = torch.from_numpy(labels).float().to(device)  # float label tensor
    loss_history = []
    # Train the model
    for epoch in range(20000):
        optimizer.zero_grad()  # clear previous gradients
        outputs = model(inputs_tensor)  # forward pass
        loss = criterion(outputs, labels_tensor)  # compute the loss
        loss_history.append(loss.item())  # record the loss value
        loss.backward()  # backpropagate
        optimizer.step()  # update the weights
        if epoch % 1000 == 0:
            print('epoch is', epoch, 'loss is', loss.item())
    import matplotlib.pyplot as plt
    loss_history = np.array(loss_history)
    plt.plot(loss_history)
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.show()
    torch.save(model, 'model/entire_model.pth')
The loss curve over iterations produced by this code is shown in the figure:
That loss is way too large!!!
So how should the network be modified to bring the loss down?
Modifying the network: normalization and early stopping
The early-stopping method here is borrowed from this expert's post:
Early stopping method
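The script below imports EarlyStopping from pytorchtools (Bjarten's early-stopping-pytorch helper). In case that package is unavailable, here is a minimal sketch of the same idea with the same call signature; the checkpoint path is illustrative:

import numpy as np
import torch

class EarlyStopping:
    """Stop training when the monitored loss has not improved for `patience` calls."""
    def __init__(self, patience=20, verbose=False, delta=0.0, path='checkpoint.pt'):
        self.patience = patience
        self.verbose = verbose
        self.delta = delta
        self.path = path
        self.counter = 0
        self.best_loss = np.inf
        self.early_stop = False

    def __call__(self, val_loss, model):
        val_loss = float(val_loss)
        if val_loss < self.best_loss - self.delta:  # improvement: reset the counter
            if self.verbose:
                print(f'Validation loss decreased ({self.best_loss:.6f} --> {val_loss:.6f}). Saving model ...')
            self.best_loss = val_loss
            self.counter = 0
            torch.save(model.state_dict(), self.path)  # checkpoint the best model so far
        else:  # no improvement: count up and possibly stop
            self.counter += 1
            if self.verbose:
                print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
            if self.counter >= self.patience:
                self.early_stop = True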
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from pytorchtools import EarlyStopping

class FNN(nn.Module):
    def __init__(self, Arc, func, device):
        super(FNN, self).__init__()  # call the parent constructor
        self.func = func  # activation function
        self.Arc = Arc  # network architecture (layer widths)
        self.device = device
        self.model = self.create_model().to(self.device)
        # print(self.model)

    def create_model(self):
        layers = []
        for ii in range(len(self.Arc) - 2):  # every layer except the last
            layers.append(nn.Linear(self.Arc[ii], self.Arc[ii + 1], bias=True))
            layers.append(self.func)  # add the activation
            if ii < len(self.Arc) - 3:  # add Dropout except before the second-to-last layer
                layers.append(nn.Dropout(p=0.1))
        layers.append(nn.Linear(self.Arc[-2], self.Arc[-1], bias=True))  # output layer
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.model(x)
        return out

if __name__ == "__main__":
    # Define the architecture and activation function
    Arc = [3, 10, 20, 20, 20, 20, 20, 20, 10, 2]
    func = nn.ReLU()  # use the ReLU activation
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use GPU if available
    # Create the FNN model instance
    model = FNN(Arc, func, device)
    # Define the loss function and optimizer
    criterion = nn.MSELoss()  # mean squared error loss
    optimizer = optim.Adam(model.parameters(), lr=1e-4)  # Adam optimizer
    # Training data
    train_data_path = 'train.csv'
    train_data = pd.read_csv(train_data_path)
    feature = np.array(train_data.iloc[:, :-2])
    label = np.array(train_data.iloc[:, -2:])
    # Z-score standardize the features
    feature_scaler = StandardScaler()
    features = feature_scaler.fit_transform(feature)
    features_means = feature_scaler.mean_
    features_stds = feature_scaler.scale_
    # Z-score standardize the labels
    label_scaler = StandardScaler()
    labels = label_scaler.fit_transform(label)
    label_means = torch.tensor(label_scaler.mean_).float().to(device)
    label_stds = torch.tensor(label_scaler.scale_).float().to(device)
    # Convert to tensors
    inputs_tensor = torch.from_numpy(features).float().to(device)  # standardized features
    labels_tensor = torch.from_numpy(labels).float().to(device)  # standardized labels
    label_tensor = torch.from_numpy(label).float().to(device)  # raw (unscaled) labels
    loss_history = []
    # parser.add_argument('--patience', default=20, type=int, help='patience')
    early_stopping = EarlyStopping(patience=20, verbose=True)
    # Train the model
    for epoch in range(2000):
        optimizer.zero_grad()  # clear previous gradients
        output = model(inputs_tensor)  # forward pass
        # De-normalize the outputs; note the loss is then computed on the original
        # scale, so its magnitude is still dominated by the largest label column
        outputs = (output * label_stds) + label_means
        loss = criterion(outputs, label_tensor)  # compute the loss
        loss_history.append(loss.item())  # record the loss value
        loss.backward()  # backpropagate
        optimizer.step()  # update the weights
        if epoch % 100 == 0:
            print('epoch is', epoch, 'loss is', loss.item())
        early_stopping(loss, model)  # here the training loss stands in for a validation loss
        if early_stopping.early_stop:
            print("early stopping")
            break
    import matplotlib.pyplot as plt
    loss_history = np.array(loss_history)
    plt.plot(loss_history)
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.show()
    torch.save(model, 'model/entire_model.pth')
The result looks like this:
The loss is still absolutely enormous.....
Comparing results across different parameter adjustments
After applying all these tricks the loss barely changed, so what will actually work?
Is the input-output relationship simply too complex??
Let me simplify the input-output relationship and drop the noise term:
sample = 3000
X1 = np.zeros((sample, 1))
X1[:, 0] = np.random.normal(1, 1, sample)
X2 = np.zeros((sample, 1))
X2[:, 0] = np.random.normal(2, 1, sample)
X3 = np.zeros((sample, 1))
X3[:, 0] = np.random.normal(3, 1, sample)
# The model
Y = 60 * X1 - 300 * X2 + X3 * 560
We now use this purely linear formula directly.
Results with the initial code
Run fnn_easy.py:
Output:
epoch is 0 loss is 1691248.25
epoch is 1000 loss is 1261221.5
epoch is 2000 loss is 951328.4375
epoch is 3000 loss is 727150.8125
epoch is 4000 loss is 573457.9375
epoch is 5000 loss is 478112.90625
epoch is 6000 loss is 428555.1875
epoch is 7000 loss is 409844.65625
epoch is 8000 loss is 405917.53125
epoch is 9000 loss is 405627.59375
Test loss: 869377.875
Adjusting the network architecture
Starting from the architecture [3, 20, 20, 20, 20, 20, 2], increase the number of layers and neurons,
changing it to this:
Arc = [3, 10, 20, 20, 20, 20, 20, 20, 20, 20, 1]
Output:
It seems to flatten out after about 7000 iterations??
epoch is 0 loss is 1691888.625
epoch is 1000 loss is 1262176.625
epoch is 2000 loss is 952115.4375
epoch is 3000 loss is 727737.125
epoch is 4000 loss is 573853.3125
epoch is 5000 loss is 478343.09375
epoch is 6000 loss is 428659.3125
epoch is 7000 loss is 409874.03125
epoch is 8000 loss is 405920.875
epoch is 9000 loss is 405627.6875
Test loss: 869377.0625
The test loss is huge; it feels like the model overfit the training set?? But if this is overfitting, why is the training loss itself still so large?
Changing the network structure made almost no difference.....
/(ㄒoㄒ)/~~
Adjusting the activation function
Try the sigmoid, tanh, and ReLU activation functions; swapping them in only takes one line, as sketched below.
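Since the FNN constructor above takes the activation as the func argument, switching is a one-line change; pick one of:

func = nn.ReLU()     # rectified linear unit
func = nn.Tanh()     # hyperbolic tangent
func = nn.Sigmoid()  # logistic sigmoid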
- ReLU results:
The training loss drops so fast!!! But why does it barely move after that?
epoch is 0 loss is 1691534.125
epoch is 1000 loss is 7115.28857421875
epoch is 2000 loss is 4782.103515625
epoch is 3000 loss is 2784.762451171875
epoch is 4000 loss is 2546.99072265625
epoch is 5000 loss is 1512.3265380859375
epoch is 6000 loss is 1262.20556640625
epoch is 7000 loss is 1113.3638916015625
epoch is 8000 loss is 979.3409423828125
epoch is 9000 loss is 995.0311279296875
Test loss: 637214.0
The training loss came down, but why is the test loss still so high????
- Now try tanh:
The training loss decreases much more gradually:
epoch is 0 loss is 1691698.125
epoch is 1000 loss is 1255924.75
epoch is 2000 loss is 930623.8125
epoch is 3000 loss is 681349.75
epoch is 4000 loss is 492137.4375
epoch is 5000 loss is 350489.28125
epoch is 6000 loss is 246078.875
epoch is 7000 loss is 170598.15625
epoch is 8000 loss is 117428.3203125
epoch is 9000 loss is 79978.34375
Test loss: 704667.75
The test loss is about the same as the training one, but it is still far too large../(ㄒoㄒ)/~~
Adjusting the number of iterations
Increased from 10,000 to 100,000.
……
epoch is 96000 loss is 817.8455200195312
epoch is 97000 loss is 888.3745727539062
epoch is 98000 loss is 938.8643798828125
epoch is 99000 loss is 741.688232421875
Test loss: 644070.5
Adding early stopping
This hardly seems necessary, since the loss stays large throughout.
…
Validation loss decreased (405623.906250 --> 405623.906250). Saving model ...
…
EarlyStopping counter: 39 out of 40
EarlyStopping counter: 40 out of 40
early stopping
Test loss: 869326.5625
Early stopping barely did anything; training still ran to around 10,000 iterations before it stopped.
Normalizing the variables
Here the input variables are all on the same order of magnitude, so strictly speaking normalization should not be needed.
epoch is 0 loss is 435491.9375
epoch is 1000 loss is 10410.7548828125
epoch is 2000 loss is 5559.599609375
epoch is 3000 loss is 4180.0361328125
epoch is 4000 loss is 2995.7177734375
epoch is 5000 loss is 1987.0074462890625
epoch is 6000 loss is 2300.583251953125
epoch is 7000 loss is 1547.3831787109375
epoch is 8000 loss is 1220.7880859375
epoch is 9000 loss is 1113.998779296875
Test loss: 847746.5625
The training loss now looks small, but with a test loss this large it must be overfitting. But how can it overfit when the loss is still this big /(ㄒoㄒ)/~~
Tuning the regularization coefficient
Consider strengthening the regularization via Dropout.
Regularization should definitely help; let's try raising the dropout rate from 0.1 to 0.3:
epoch is 0 loss is 1691138.75
epoch is 1000 loss is 1263657.625
epoch is 2000 loss is 953652.375
epoch is 3000 loss is 728953.6875
epoch is 4000 loss is 574693.625
epoch is 5000 loss is 478837.53125
epoch is 6000 loss is 428884.21875
epoch is 7000 loss is 409937.78125
epoch is 8000 loss is 405928.15625
epoch is 9000 loss is 405627.78125
Test loss: 603928.75
Now try a rate of 0.5:
epoch is 0 loss is 1690641.875
epoch is 1000 loss is 1259986.5
epoch is 2000 loss is 950166.9375
epoch is 3000 loss is 726251.125
epoch is 4000 loss is 572842.4375
epoch is 5000 loss is 477752.59375
epoch is 6000 loss is 428392.0
epoch is 7000 loss is 409798.75
epoch is 8000 loss is 405912.375
epoch is 9000 loss is 405627.96875
Test loss: 616781.9375
Hardly any difference from 0.3, so let's try increasing the number of epochs a bit:
epoch is 0 loss is 1691865.875
epoch is 1000 loss is 1259460.75
epoch is 2000 loss is 949569.0625
epoch is 3000 loss is 725770.8125
epoch is 4000 loss is 572510.1875
epoch is 5000 loss is 477556.375
epoch is 6000 loss is 428302.96875
epoch is 7000 loss is 409773.96875
epoch is 8000 loss is 405909.4375
epoch is 9000 loss is 405625.90625
epoch is 10000 loss is 405623.9375
epoch is 11000 loss is 132178.5
epoch is 12000 loss is 88326.0703125
epoch is 13000 loss is 72454.6640625
epoch is 14000 loss is 58360.8125
epoch is 15000 loss is 50380.35546875
epoch is 16000 loss is 44466.1875
epoch is 17000 loss is 43652.4765625
Test loss: 758716.4375
Better not to increase the epochs after all; the test loss only got larger.
Adjusting the learning rate
I tried tuning the learning rate, including an adaptive learning-rate schedule; still no luck:
Epoch 8370: reducing learning rate of group 0 to 5.1200e-09.
epoch is 9000 loss is 405878.09375
Epoch 9001, Current Learning Rate: 5.120000000000003e-09
Test loss: 851336.625
Still not working.......... So how on earth is this supposed to be tuned?????
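The scheduler code is not shown here, but the "reducing learning rate of group 0" line above matches the verbose output of torch.optim.lr_scheduler.ReduceLROnPlateau, so presumably something like this minimal sketch was used (reusing model, criterion, optimizer, and the tensors from the training script above):

from torch.optim.lr_scheduler import ReduceLROnPlateau

# Cut the learning rate in half whenever the loss plateaus for 50 epochs
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=50, verbose=True)

for epoch in range(10000):
    optimizer.zero_grad()
    loss = criterion(model(inputs_tensor), labels_tensor)
    loss.backward()
    optimizer.step()
    scheduler.step(loss)  # the scheduler monitors the loss and lowers the LR on plateaus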
Summing up...
Adjustment | Training loss
---|---
Adjusted network architecture | 405627
ReLU activation | 995
tanh activation | 79978
More iterations | 741
Added early stopping | 405623
Variable normalization | 1113
Regularization (dropout) tuning | 405627
Learning-rate tuning | 405878

From the table: a suitable activation function, more iterations, and normalizing the variables all lower the training loss.
Increasing the number of iterations lowers the training loss the most, but it also raises the risk of overfitting.
The ReLU activation works noticeably well here, giving a small training loss.
Normalizing the variables also cuts the training loss by several orders of magnitude (to around 1113 here).
Adjustment | Test loss
---|---
Adjusted network architecture | 869377
ReLU activation | 637214
tanh activation | 704667
More iterations | 644070
Added early stopping | 869326
Variable normalization | 847746
Regularization (dropout) tuning | 603928
Learning-rate tuning | 851336

Raising the regularization (dropout) coefficient helps avoid overfitting, which is why it gives the lowest test loss and somewhat better generalization. If the test loss is far worse than the training loss, regularization is a must.
Choosing a suitable activation function also lowers the test loss.
Increasing the number of iterations lowers the test loss too, but it raises the computational cost: going from 10,000 to 100,000 iterations added a great deal of running time.
Running fnn.py:
The output is:
epoch is 99970 loss is 1.0438059568405151
Epoch 99971, Current Learning Rate: 0.01
epoch is 99980 loss is 2.225820541381836
Epoch 99981, Current Learning Rate: 0.01
epoch is 99990 loss is 0.6199669241905212
Epoch 99991, Current Learning Rate: 0.01
Test loss: 828622.4375
The training loss is almost zero, but what on earth is happening on the test set???? The loss is huge!!!
Mystery solved..
First: the generated dataset itself is unsuitable!!! With $X_3 \sim 10000 \times \mathcal{N}(3, 1)$, the $X_3^2$ term in $Y$ is on the order of $10^9$, while $6X_1$ is only on the order of $10^2$, so the features and labels span wildly different scales.
Feeding raw normally distributed (Gaussian) data like this into neural-network training is generally not recommended, because it can cause:
- Vanishing/exploding gradients: a normal distribution has unbounded support, so data values can be extremely large or small. If the inputs have a very large standard deviation, the gradients during backpropagation can become tiny (vanish) or huge (explode), making weight updates slow or unstable and hurting convergence.
- Activation-function sensitivity: activations such as ReLU are sensitive to the input distribution. With normally distributed data, most values cluster near the mean, so the activation's nonlinearity is not fully exploited, which can limit the network's expressive power.

Therefore, data is usually preprocessed before training, e.g. standardized (zero mean, unit variance) or normalized (scaled to a fixed range such as [0, 1] or [-1, 1]). These steps mitigate vanishing/exploding gradients, make better use of the activation's nonlinearity, and improve generalization. A sketch of both scalers follows.
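Both transforms are one-liners with scikit-learn (already imported in the training script above); a minimal sketch, assuming feature is the raw feature array from that script:

from sklearn.preprocessing import StandardScaler, MinMaxScaler

std_scaler = StandardScaler()                     # zero mean, unit variance per column
features_std = std_scaler.fit_transform(feature)

rng_scaler = MinMaxScaler(feature_range=(-1, 1))  # rescale each column to [-1, 1]
features_rng = rng_scaler.fit_transform(feature)

# Always apply the scaler fitted on the training data to validation/test data:
# validate_features = std_scaler.transform(validate_feature)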
Plot validate_outputs (the predictions) against validate_labels_tensor (the true values):
Second: the test loss is this large because of outliers!!!!
It is not that the model failed to learn the structure of the data, or that it overfit relative to the validation set.. there are outliers.
A strength of the MSE loss is that it punishes large prediction errors: squaring makes the penalty grow rapidly as the error grows. The downside is that it is sensitive to outliers, because a few large errors dominate the overall MSE.
And MSE is exactly the loss function I used,,, no wonder the computed loss stayed so large..
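A common remedy (my suggestion, not something tried above) is to replace MSE with a robust loss such as the Huber loss, which grows quadratically for small errors but only linearly for large ones, so a few outliers no longer dominate. It is a drop-in replacement for nn.MSELoss() in the training loop above:

import torch.nn as nn

criterion = nn.HuberLoss(delta=1.0)  # PyTorch >= 1.9; delta sets the quadratic-to-linear switch point
# criterion = nn.SmoothL1Loss()      # equivalent Huber-style loss in older PyTorch versions

loss = criterion(outputs, label_tensor)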
import matplotlib.pyplot as plt
# Assume validate_outputs and validate_labels_tensor have the same shape,
# i.e. each prediction corresponds to one true value
# Scatter plot: x-axis is the true value, y-axis is the prediction
# (detach() is needed if the outputs still carry gradients)
plt.scatter(validate_labels_tensor.cpu().numpy(), validate_outputs.detach().cpu().numpy())
# Add a title and axis labels
plt.title('Predicted vs Actual')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
# Draw the y = x line, which represents perfect prediction
plt.plot([validate_labels_tensor.min().cpu().numpy(), validate_labels_tensor.max().cpu().numpy()],
         [validate_labels_tensor.min().cpu().numpy(), validate_labels_tensor.max().cpu().numpy()],
         color='red')
# Show the plot
plt.show()
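To back up the outlier diagnosis numerically, it helps to look at the largest per-sample squared errors; a small sketch (my addition), assuming the same validate_outputs and validate_labels_tensor as in the plot:

import torch

# Per-sample squared error, averaged over the output dimensions
per_sample_err = ((validate_outputs.detach() - validate_labels_tensor) ** 2).mean(dim=1)
# A handful of worst samples usually accounts for most of the MSE
worst_err, worst_idx = torch.topk(per_sample_err, k=10)
print('worst sample indices:', worst_idx.tolist())
print('their squared errors:', worst_err.tolist())
print('mean squared error  :', per_sample_err.mean().item())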