1. Linear regression (predicting continuous values)
Basic elements of linear regression:
- model
- dataset
- loss function
- optimization function
Steps of the concise implementation in PyTorch:
- read the dataset
- define the model
- initialize the model parameters
- define the loss function
- define the optimization function
- train
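The steps above can be sketched end-to-end on made-up synthetic data (the names `true_w`, `true_b` and all hyperparameters here are illustrative, not from the course):

```python
import torch
import torch.nn as nn
import torch.utils.data as Data

# read the dataset (synthetic: y = 2x - 3.4 + small noise)
true_w, true_b = 2.0, -3.4
X = torch.randn(1000, 1)
y = true_w * X + true_b + 0.01 * torch.randn(1000, 1)
dataset = Data.TensorDataset(X, y)
data_iter = Data.DataLoader(dataset, batch_size=10, shuffle=True)

# define the model and initialize its parameters
net = nn.Linear(1, 1)
nn.init.normal_(net.weight, mean=0, std=0.01)
nn.init.constant_(net.bias, val=0)

# define the loss function and the optimization function
loss = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.03)

# train
for epoch in range(5):
    for xb, yb in data_iter:
        l = loss(net(xb), yb)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
```

After a few epochs the learned weight and bias should be close to `true_w` and `true_b`.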
Exercise (one I got wrong):
The loss function in the course is defined as:
def squared_loss(y_hat, y):
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
Which of the following replacements for the return value would make the model untrainable?
- (y_hat.view(-1) - y) ** 2 / 2
  Analysis: y_hat.view(-1) has shape [n], the same as y, so the subtraction is fine.
- (y_hat - y.view(-1)) ** 2 / 2  ← this is the one that breaks training
  Analysis: y_hat has shape [n, 1] while y has shape [n]; y.view(-1) still has shape [n], so broadcasting makes the difference have shape [n, n] — every element of y_hat is subtracted by every element of y — and the correct loss value cannot be obtained.
- (y_hat - y.view(y_hat.shape)) ** 2 / 2
- (y_hat - y.view(-1, 1)) ** 2 / 2
  Analysis: y.view(y_hat.shape) and y.view(-1, 1) both have shape [n, 1], the same as y_hat, so the subtraction is fine.
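A quick shape check of the pitfall described in the analysis (n = 3 is arbitrary):

```python
import torch

n = 3
y_hat = torch.arange(n, dtype=torch.float).view(n, 1)  # shape [3, 1]
y = torch.arange(n, dtype=torch.float)                 # shape [3]

# [3, 1] minus [3] silently broadcasts to [3, 3] -- the wrong loss
bad = (y_hat - y.view(-1)) ** 2 / 2
# [3, 1] minus [3, 1] stays [3, 1] -- the correct loss
good = (y_hat - y.view(y_hat.shape)) ** 2 / 2
print(bad.shape, good.shape)
```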
2. Softmax and classification models (predicting discrete values)
1. Why use the cross-entropy loss function?
To predict the class correctly, the predicted probabilities do not need to match the label distribution exactly; it is enough that the correct class gets the highest predicted probability, and cross-entropy only penalizes the probability assigned to the correct class.
Cross-entropy loss:
loss = nn.CrossEntropyLoss() # its prototype is below
# class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
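A small sanity check of what CrossEntropyLoss computes: it applies log-softmax to the raw logits and averages the negative log-probability of the correct class (the logits and labels below are made up):

```python
import torch
import torch.nn as nn

logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.2, 0.3]])   # raw scores, not probabilities
labels = torch.tensor([1, 0])              # correct class index per row

ce = nn.CrossEntropyLoss()(logits, labels)

# the same value computed by hand: mean of -log softmax at the label index
manual = -torch.log_softmax(logits, dim=1)[torch.arange(2), labels].mean()
```

Note that the loss takes unnormalized logits; applying softmax yourself before calling it would be a bug.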
- reading data: torch.utils.data.DataLoader()
- x.view() works like x.reshape(): it returns a tensor with the same data in a new shape (note that x.shape is an attribute that reports the shape, not a method that changes it)
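A minimal illustration of the point above (shapes only):

```python
import torch

x = torch.arange(6)   # shape [6]
a = x.view(2, 3)      # same underlying data, new shape [2, 3]
b = x.reshape(2, 3)   # equivalent here
print(x.shape)        # .shape is an attribute; x itself is unchanged
```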
- optimization function:
optimizer = torch.optim.SGD(net.parameters(), lr=0.1) # its prototype is below
# class torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)
3. Overfitting, underfitting, and their remedies
- Training error vs. generalization error: if the test error is used to approximate the generalization error, the test set must not be used to tune model parameters.
- Underfitting vs. overfitting:
  - underfitting: the model cannot reach a low training error
  - overfitting: the training error is low but the generalization error remains high, with a large gap between the two
- Mitigating overfitting:
  - enlarge the dataset
  - L2 regularization
  - dropout
- Mitigating underfitting:
  - increase model complexity
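Both overfitting remedies map directly onto PyTorch APIs: the optimizer's weight_decay hyperparameter implements L2 regularization, and nn.Dropout is active in train() mode but becomes the identity in eval() mode. A minimal sketch (layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

# L2 regularization via weight_decay; dropout between the hidden layers
net = nn.Sequential(nn.Linear(10, 50), nn.ReLU(),
                    nn.Dropout(0.5), nn.Linear(50, 1))
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=1e-4)

x = torch.randn(4, 10)
drop = nn.Dropout(0.5)
drop.eval()            # at evaluation time dropout passes inputs through unchanged
out = drop(x)
```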
4. Vanishing gradients, exploding gradients, and Kaggle house-price prediction
- Xavier random initialization
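Xavier uniform initialization draws weights from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), which keeps activation and gradient variance roughly constant across layers; a quick check with an arbitrary layer size:

```python
import math
import torch
import torch.nn as nn

layer = nn.Linear(256, 128)          # fan_in = 256, fan_out = 128
nn.init.xavier_uniform_(layer.weight)
bound = math.sqrt(6.0 / (256 + 128)) # the bound a of the uniform distribution
```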
- Environment factors:
  - covariate shift: the input distribution p(x) changes
  - label shift: the marginal label distribution p(y) changes
  - concept shift: the same concept carries different meanings in different places (e.g. regional naming differences)
Hands-on basics: Kaggle house-price prediction
- import vs. from ... import
  import: names from the module do not clash with names in the local file; you use them as module.name().
  from ... import: the imported names can be used directly, but they can collide with identical names in the local file, so a little care is needed; you import exactly the names you want. (Note: both forms load and execute the whole module; from ... import only changes which names are bound locally, it does not save memory.)
- Preprocessing: standardize the continuous features and fill in missing values with the feature mean (after standardization the mean is 0, so fillna(0) does exactly that)
numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
all_features[numeric_features] = all_features[numeric_features].apply(lambda x: (x - x.mean()) / (x.std()))
all_features[numeric_features] = all_features[numeric_features].fillna(0)
- Preprocessing: turn the discrete features into indicator (dummy) features
all_features = pd.get_dummies(all_features, dummy_na=True)
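The two preprocessing steps can be tried on a tiny made-up frame (the column names Area/Type are hypothetical); because standardization comes first, fillna(0) is equivalent to filling with the column mean:

```python
import pandas as pd

df = pd.DataFrame({'Area': [100.0, 150.0, float('nan')],
                   'Type': ['A', 'B', None]})

# standardize numeric columns, then fill missing values with 0 (= the mean)
numeric = df.dtypes[df.dtypes != 'object'].index
df[numeric] = df[numeric].apply(lambda x: (x - x.mean()) / x.std())
df[numeric] = df[numeric].fillna(0)

# one indicator column per category, plus one for NaN (dummy_na=True)
df = pd.get_dummies(df, dummy_na=True)
```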
Complete code (not yet tuned):
import torch
import torch.nn as nn
import pandas as pd
import torch.utils.data as Data
def get_net(feature_num):
    net = nn.Linear(feature_num, 1)
    for param in net.parameters():
        nn.init.normal_(param, mean=0, std=0.01)
    return net
loss = torch.nn.MSELoss()
# log RMSE (the competition metric): RMSE between log predictions and log labels
def log_rmse(net, features, labels):
    with torch.no_grad():
        # clip predictions to at least 1 so that taking the log is safe
        clipped_preds = torch.max(net(features.float()), torch.tensor(1.0))
        # loss is nn.MSELoss with reduction='mean', so no extra factor is needed
        rmse = torch.sqrt(loss(clipped_preds.log(), labels.log()))
    return rmse.item()
# training
def train(net, train_features, train_labels, test_features, test_labels, batch_size, learning_rate, weight_decay, num_epochs):
    train_ls, test_ls = [], []
    dataset = Data.TensorDataset(train_features, train_labels)
    train_iter = Data.DataLoader(dataset, batch_size, shuffle=True)
    optimizer = torch.optim.Adam(params=net.parameters(), lr=learning_rate, weight_decay=weight_decay)
    net = net.float()
    for epoch in range(num_epochs):
        for X, y in train_iter:
            l = loss(net(X.float()), y.float())
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
        train_ls.append(log_rmse(net, train_features, train_labels))
        if test_labels is not None:
            test_ls.append(log_rmse(net, test_features, test_labels))
    return train_ls, test_ls
# k-fold cross-validation: return the training and validation data for the i-th fold
def get_k_fold_data(k, i, X, y):
    assert k > 1
    fold_size = X.shape[0] // k
    X_train, y_train = None, None
    for j in range(k):
        idx = slice(j * fold_size, (j + 1) * fold_size)
        X_part, y_part = X[idx, :], y[idx]
        if j == i:
            X_valid, y_valid = X_part, y_part
        elif X_train is None:
            X_train, y_train = X_part, y_part
        else:
            X_train = torch.cat((X_train, X_part))
            y_train = torch.cat((y_train, y_part))
    return X_train, y_train, X_valid, y_valid
# train k times in k-fold cross-validation and return the average training and validation errors
def k_fold(k, X_train, y_train, num_epochs, learning_rate, weight_decay, batch_size):
    train_l_sum, valid_l_sum = 0, 0
    for i in range(k):
        data = get_k_fold_data(k, i, X_train, y_train)
        net = get_net(X_train.shape[1])
        # *data unpacks the 4-tuple returned by get_k_fold_data
        train_ls, valid_ls = train(net, *data, batch_size, learning_rate, weight_decay, num_epochs)
        train_l_sum += train_ls[-1]
        valid_l_sum += valid_ls[-1]
        print("fold %d, train rmse %f, valid rmse %f" % (i, train_ls[-1], valid_ls[-1]))
    return train_l_sum / k, valid_l_sum / k
def main():
    # read the data
    test_data = pd.read_csv("D:\\python-project\\Learn_Pytorch\\house_price\\input\\houseprices2807\\data\\test.csv")
    train_data = pd.read_csv("D:\\python-project\\Learn_Pytorch\\house_price\\input\\houseprices2807\\data\\train.csv")
    all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))
    # preprocessing: standardize continuous features; after standardization the mean is 0, so fillna(0) fills missing values with the mean
    numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
    all_features[numeric_features] = all_features[numeric_features].apply(lambda x: (x - x.mean()) / (x.std()))
    all_features[numeric_features] = all_features[numeric_features].fillna(0)
    # preprocessing: turn discrete features into indicator features
    all_features = pd.get_dummies(all_features, dummy_na=True)
    train_features = torch.tensor(all_features[:train_data.shape[0]].values, dtype=torch.float)
    test_features = torch.tensor(all_features[train_data.shape[0]:].values, dtype=torch.float)
    train_labels = torch.tensor(train_data.SalePrice.values, dtype=torch.float).view(-1, 1)
    k, num_epochs, lr, weight_decay, batch_size = 5, 100, 5, 0, 64
    train_l, valid_l = k_fold(k, train_features, train_labels, num_epochs, lr, weight_decay, batch_size)
    print('%d-fold validation: avg train rmse %f, avg valid rmse %f' % (k, train_l, valid_l))

if __name__ == "__main__":
    main()