Dive into Deep Learning, Lesson 1: Linear Regression


阿光light


Column: Dive into Deep Learning in 19 Days. Tags: python, machine learning, neural networks

Article link: https://blog.csdn.net/weixin_42098609/article/details/104260857


DAY1 Linear Regression

Linear regression model
Code implementation of linear regression

Linear Regression Model

Model Overview

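Take house price prediction as an example, and suppose the price depends on only two factors: area (square meters) and age (years). These factors are called features, and the true price is called the label. Assuming the price is linearly related to the features, we can build the following linear regression model: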
$\text{price}=\omega_{area}\cdot x_{area}+\omega_{age}\cdot x_{age}+b$
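The magnitudes (absolute values) of $\omega_{area}$ and $\omega_{age}$ indicate how strongly area and age influence the price (a fair comparison assumes the features are on comparable scales), and the bias $b$ corrects for a constant offset. $\omega_{area}$, $\omega_{age}$ and $b$ are all parameters to be learned during training.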
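To make the formula concrete, here is a minimal Python sketch that evaluates the model for one house. The parameter values w_area, w_age and b are hypothetical, picked only for illustration rather than learned, and predict_price and predict are illustrative helper names; predict performs the same computation written as the dot product $\textbf{w}^T\textbf{x}+b$ that appears in the loss function.

```python
# Hypothetical parameter values for illustration only; they are not trained.
w_area = 0.05  # contribution per square meter
w_age = -0.2   # contribution per year of age
b = 1.0        # bias term

def predict_price(area, age):
    """price = w_area * area + w_age * age + b for a single house."""
    return w_area * area + w_age * age + b

def predict(w, x, b):
    """The same model in vector form: w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

print(predict_price(area=100.0, age=10.0))
print(predict([w_area, w_age], [100.0, 10.0], b))
```

Both calls compute 0.05 * 100 - 0.2 * 10 + 1, so they agree.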

Dataset

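The collection of all samples used to train the model is called the training set. In this example, each house is one sample: it has two features (area and age), and its true price is known.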

Loss Function

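The loss function is also the objective function that training optimizes; it usually measures the deviation between the true values and the model's predictions, and the smaller the deviation, the better the predictions. In this example the loss takes the mean-squared-error form, where $l^{(i)}$ is the squared error of sample $i$ (scaled by $\frac{1}{2}$) and $L$ averages it over the $n$ training samples:
$l^{(i)}(\textbf{w},b)=\frac{1}{2}(\textbf{w}^T\textbf{x}^{(i)}+b-y^{(i)})^2,\qquad L(\textbf{w},b)=\frac{1}{n}\sum_{i=1}^n l^{(i)}(\textbf{w},b)=\frac{1}{n}\sum_{i=1}^n\frac{1}{2}(\textbf{w}^T\textbf{x}^{(i)}+b-y^{(i)})^2$
Our goal is to minimize the loss function.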
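A small worked example of the loss, assuming the per-sample squared loss with the factor 1/2 as in the formula above. The toy data and the helper names squared_loss_per_sample and mean_loss are made up for illustration: with w = [1, 1] and b = 0, the first sample is predicted exactly and the second is off by 0.5, so the average loss is (0 + 0.5 * 0.5**2) / 2 = 0.0625.

```python
def squared_loss_per_sample(y_hat, y):
    """l^(i) = 0.5 * (y_hat - y)^2 for a single sample."""
    return 0.5 * (y_hat - y) ** 2

def mean_loss(w, b, xs, ys):
    """L(w, b) = (1/n) * sum_i l^(i): the average of the per-sample losses."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b  # w^T x + b
        total += squared_loss_per_sample(y_hat, y)
    return total / len(xs)

# Toy data (made up for illustration): two features per sample.
xs = [[1.0, 2.0], [3.0, 0.5]]
ys = [3.0, 4.0]
print(mean_loss([1.0, 1.0], 0.0, xs, ys))
```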

Optimization: Stochastic Gradient Descent

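The optimization function here simply means the method used to update the parameters (in this example, $\textbf{w}$ and $b$). Initial parameter values will almost certainly produce predictions far from the true values (that is, a large loss), so the parameters must be updated to make the loss smaller. This example uses mini-batch stochastic gradient descent, which relies on the fact that, locally, a function decreases fastest along its negative gradient. Concretely: first initialize the model parameters randomly, then update them over many iterations. In each iteration, uniformly sample a fixed number of training examples at random to form a mini-batch $\mathcal{B}$, compute the derivative (gradient) of the mini-batch's average loss with respect to the model parameters, and move the parameters against the gradient. Rather than subtracting the raw gradient, we usually multiply it by a small positive number, the step size, also called the learning rate $\eta$: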
$(\textbf{w},b)\leftarrow(\textbf{w},b)-\frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}}\partial_{(\textbf{w},b)}l^{(i)}(\textbf{w},b)$
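The update rule can be sketched in plain Python without PyTorch, which makes the gradient explicit: for $l^{(i)}=\frac{1}{2}(\textbf{w}^T\textbf{x}^{(i)}+b-y^{(i)})^2$, the partial derivative is $(\textbf{w}^T\textbf{x}^{(i)}+b-y^{(i)})\textbf{x}^{(i)}$ for $\textbf{w}$ and $(\textbf{w}^T\textbf{x}^{(i)}+b-y^{(i)})$ for $b$. This is a minimal sketch under stated assumptions: the function name sgd_linear_regression is hypothetical, parameters start at zero instead of random values, and the synthetic data settings are borrowed from the from-scratch PyTorch example later in the post.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.03, batch_size=10, num_epochs=5, seed=0):
    """Mini-batch SGD for y ~ w.x + b with per-sample loss 0.5 * (w.x + b - y)**2.

    Assumption: parameters start at zero (the post initializes w randomly).
    """
    rng = random.Random(seed)
    num_features = len(xs[0])
    w = [0.0] * num_features
    b = 0.0
    indices = list(range(len(xs)))
    for _ in range(num_epochs):
        rng.shuffle(indices)  # new random mini-batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0] * num_features
            grad_b = 0.0
            for i in batch:
                # residual w.x + b - y is the derivative of l^(i) w.r.t. y_hat
                err = sum(wj * xj for wj, xj in zip(w, xs[i])) + b - ys[i]
                for j in range(num_features):
                    grad_w[j] += err * xs[i][j]
                grad_b += err
            # (w, b) <- (w, b) - eta / |B| * sum of per-sample gradients
            w = [wj - lr * gj / len(batch) for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b

# Synthetic data with the same true parameters as the post: w = [2, -3.4], b = 4.2.
data_rng = random.Random(42)
true_w, true_b = [2.0, -3.4], 4.2
xs = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(1000)]
ys = [true_w[0] * x[0] + true_w[1] * x[1] + true_b + data_rng.gauss(0, 0.01) for x in xs]

w, b = sgd_linear_regression(xs, ys)
print(w, b)
```

Dividing by len(batch) rather than a fixed batch_size keeps the average correct even when the last mini-batch is smaller.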

Code Implementation of Linear Regression

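We implement the model in two ways: once from scratch, and once with PyTorch's built-in linear module. First, let us compare how fast element-wise vector addition runs with a for loop versus a single vectorized addition (though we already know the answer).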

import torch
import time

n = 1000
a = torch.ones(n)
b = torch.ones(n)

# define a timer class to record time
class Timer(object):
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        # start the timer
        self.start_time = time.time()
    def stop(self):
        # stop the timer and record time into a list
        self.times.append(time.time() - self.start_time)
        return self.times[-1]
    def avg(self):
        # calculate the average and return
        return sum(self.times)/len(self.times)
    def sum(self):
        # return the sum of recorded time
        return sum(self.times)

# for loop: add the vectors element by element
timer = Timer()
c = torch.zeros(n)
for i in range(n):
    c[i] = a[i] + b[i]
print('%.5f sec' % timer.stop())

# vectorized addition
timer.start()
c = a + b
print('%.5f sec' % timer.stop())

Output:
0.01325 sec
0.00000 sec

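time.time() returns the number of seconds elapsed since 1970-01-01 00:00:00 UTC (the Unix epoch) as a floating-point number, so the difference between two calls gives the elapsed time. The vectorized addition is far faster than the for loop; its measured time rounds to 0.00000 at five decimal places.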
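Since the vectorized timing rounds to zero, a finer clock can help when measuring very short operations. Python's time.perf_counter() is a monotonic, high-resolution counter intended for this purpose; the sketch below keeps the interface of the Timer class above but swaps in perf_counter (PerfTimer is a hypothetical name). It times a plain-Python list addition only to exercise the timer, so it says nothing about tensor performance.

```python
import time

class PerfTimer:
    """Same interface as the Timer above, but backed by time.perf_counter()."""

    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        # perf_counter() has an arbitrary reference point; only differences matter
        self.start_time = time.perf_counter()

    def stop(self):
        # record the elapsed time since start() and return it
        self.times.append(time.perf_counter() - self.start_time)
        return self.times[-1]

    def avg(self):
        # inside a method, sum refers to the built-in, not this class's sum()
        return sum(self.times) / len(self.times)

    def sum(self):
        return sum(self.times)

# Time a pure-Python element-wise addition of two lists.
n = 1000
a = [1.0] * n
b = [1.0] * n
timer = PerfTimer()
c = [a[i] + b[i] for i in range(n)]
print('%.8f sec' % timer.stop())
```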

Linear Regression from Scratch

import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random

# generate the dataset
# number of input features
num_inputs = 2
# number of examples
num_example = 1000
# set the true weight and bias used to generate the corresponding labels
true_w = [2,-3.4]
true_b = 4.2

features = torch.randn(num_example,num_inputs,dtype=torch.float32)
labels = true_w[0] * features[:,0] + true_w[1] * features[:,1] + true_b
# real labels deviate from the model's predictions, so add Gaussian noise (mean 0, std 0.01) to simulate real-world values
labels += torch.tensor(np.random.normal(0,0.01,size=labels.size()),
                       dtype=torch.float32)

plt.scatter(features[:,1].numpy(),labels.numpy(),1)
plt.show()

# read the dataset in mini-batches
def data_iter(batch_size,features,labels):
    num_example = len(features)
    indices = list(range(num_example))
    random.shuffle(indices) # shuffle the sample order so batches are drawn randomly
    for i in range(0,num_example,batch_size):
        j = torch.LongTensor(indices[i:min(i+batch_size,num_example)])
        yield features.index_select(0,j), labels.index_select(0,j)

batch_size = 10

for X,y in data_iter(batch_size,features,labels):
    print(X,'\n',y)
    break

# initialize model parameters
w = torch.tensor(np.random.normal(0,0.01,(num_inputs,1)),dtype=torch.float32)
b = torch.zeros(1,dtype=torch.float32)
# w and b need gradients for back-propagation
w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

# define the model
def linreg(X,w,b):
    return torch.mm(X,w) + b

# define the loss function
def squared_loss(y_hat,y):
    # y.view reshapes y to the same size as y_hat
    return (y_hat - y.view(y_hat.size())) ** 2 / 2

# define the optimization function
def sgd(params,lr,batch_size):
    for param in params:
        # use .data to update param without gradient tracking
        param.data -= lr * param.grad / batch_size

# training
lr = 0.03
num_epochs = 5
net = linreg
loss = squared_loss

# training loop
for epoch in range(num_epochs):
    '''
    training repeats num_epochs times
    in each epoch, all the samples in dataset will be used once
    X is the feature and y is the label of a batch sample
    '''
    for X,y in data_iter(batch_size,features,labels):
        l = loss(net(X,w,b),y).sum()
        # calculate the gradient of the mini-batch loss
        l.backward()
        # update the model parameters with mini-batch stochastic gradient descent
        sgd([w,b],lr,batch_size)
        #print(w,b)
        # reset the parameter gradients so they do not accumulate
        w.grad.data.zero_()
        b.grad.data.zero_()
        #print(w,b)
    # evaluate on the full training set without building a gradient graph
    with torch.no_grad():
        train_l = loss(net(features,w,b),labels)
    print('epoch %d, loss %f' % (epoch+1,train_l.mean().item()))

print(w,true_w,b,true_b)

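numpy.random.normal(loc, scale, size) draws samples from a normal (Gaussian) distribution:
loc: float or array_like of floats; the mean of the distribution, i.e. its center.
scale: float or array_like of floats; the standard deviation of the distribution, which sets its width: the larger scale is, the flatter and wider the curve; the smaller it is, the taller and narrower.
size: int or tuple of ints, optional; the output shape. The default is None, in which case a single value is returned (when loc and scale are scalars).
Standard normal distribution: numpy.random.normal(loc=0.0, scale=1.0, size=None), i.e. the default arguments.
torch.FloatTensor (PyTorch's default tensor type) stores 32-bit floating-point values, while torch.LongTensor stores 64-bit signed integers; torch.tensor(...) itself infers the dtype from its data unless dtype is given.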
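To see what loc and scale mean in practice, here is a sketch that uses the standard library's random.gauss(mu, sigma) as a stand-in for numpy.random.normal (the helper sample_normal is hypothetical). The sample mean should land near loc and the sample standard deviation near scale; the settings (0, 0.01) match the label noise in the code above.

```python
import random
import statistics

def sample_normal(loc, scale, size, seed=0):
    """Draw `size` samples from N(loc, scale^2) with the standard library."""
    rng = random.Random(seed)
    return [rng.gauss(loc, scale) for _ in range(size)]

samples = sample_normal(loc=0.0, scale=0.01, size=10000)
print(statistics.mean(samples), statistics.stdev(samples))
```

A larger scale spreads the samples more widely around loc, which is the "flatter and wider" curve described above.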
The yield keyword in Python, with a Fibonacci example

def fab(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b  
        a, b = b, a + b
        n = n + 1
for n in fab(5):
    print(n)

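Simply put, yield turns a function into a generator. A function containing yield is no longer an ordinary function: the Python interpreter treats it as a generator function, so calling fab(5) does not run the body of fab but returns a generator object, which is iterable. As the for loop runs, each iteration executes the code inside fab until it reaches yield b, at which point fab hands back one value. On the next iteration, execution resumes from the statement right after yield b, with the function's local variables exactly as they were before it paused, and the function continues until it reaches yield again. (Adapted from the Runoob tutorial (菜鸟教程); the blog post https://blog.csdn.net/mieleizhi0522/article/details/82142856 also explains this well.)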
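The data_iter function above relies on this behavior. A tensor-free sketch of the same idea (batch_indices is a hypothetical name) shows that calling it returns a generator object, that next() resumes it one batch at a time, and that the last batch is shorter when the number of examples is not a multiple of the batch size:

```python
import random

def batch_indices(num_examples, batch_size, seed=None):
    """Yield shuffled index batches, like data_iter above but without tensors."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for i in range(0, num_examples, batch_size):
        # The last batch may be smaller when num_examples % batch_size != 0.
        yield indices[i:min(i + batch_size, num_examples)]

gen = batch_indices(7, 3, seed=0)
print(type(gen).__name__)  # generator
print(next(gen))           # first batch of 3 shuffled indices
for batch in gen:          # resumes after the first yield
    print(batch)
```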

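torch.index_select(input, dim, index), or equivalently input.index_select(dim, index): dim is the dimension to select along (for example, selecting rows versus columns), and index holds the indices to pick along that dimension.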

import torch

a = torch.linspace(1,16,steps=16).view(4,4)
print(a,a.shape)
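# indices to select (rows 1 and 3, matching the output below)
ind = torch.tensor([1,3])
# dim=0 selects rows, dim=1 selects columns
b = torch.index_select(a,0,ind)
print(b)
# equivalent method-call form
c = a.index_select(0,ind)
print(c)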
Output:
tensor([[ 1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.],
        [ 9., 10., 11., 12.],
        [13., 14., 15., 16.]]) torch.Size([4, 4])
tensor([[ 5.,  6.,  7.,  8.],
        [13., 14., 15., 16.]])
tensor([[ 5.,  6.,  7.,  8.],
        [13., 14., 15., 16.]])

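The view function: neural-network code often contains x.view(x.size()[0], -1). Here x.size()[0] is usually the batch size, and -1 means that dimension is inferred automatically from the total number of elements (a common use, but not the only one).
In PyTorch, x.item() returns the value stored in a one-element tensor x as a plain Python number rather than a tensor.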
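The rule behind -1 is simple: the inferred size is whatever makes the product of all dimensions equal the number of elements. The hypothetical helper infer_view_shape below mimics that rule in plain Python for illustration; it is not PyTorch's implementation and skips edge cases such as tensors with zero elements.

```python
from math import prod

def infer_view_shape(numel, shape):
    """Resolve at most one -1 in `shape` so the sizes multiply to numel."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(s for s in shape if s != -1)  # product of the given sizes
    if -1 not in shape:
        if known != numel:
            raise ValueError("shape does not match the number of elements")
        return tuple(shape)
    if known == 0 or numel % known != 0:
        raise ValueError("shape is invalid for input of this size")
    return tuple(numel // known if s == -1 else s for s in shape)

# A batch of 10 samples of shape 3x4x4 has 10 * 48 = 480 elements;
# x.view(x.size()[0], -1) would flatten each sample to 48 values.
print(infer_view_shape(480, [10, -1]))  # (10, 48)
```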

import torch

x = torch.tensor([1])
y = x.item()
print(x,x.type())
print(y)

output:
tensor([1]) torch.LongTensor
1

PyTorch Implementation of Linear Regression

import torch
from torch import nn
import numpy as np
import torch.utils.data as Data
from torch.nn import init
import torch.optim as optim
torch.manual_seed(1)

torch.set_default_dtype(torch.float32)  # replaces the deprecated torch.set_default_tensor_type('torch.FloatTensor')

# generate the dataset
num_inputs = 2
num_examples = 1000

true_w = [2,-3.4]
true_b = 4.2
features = torch.tensor(np.random.normal(0,1,(num_examples,num_inputs)),dtype=torch.float32)
labels = true_w[0] * features[:,0] + true_w[1] * features[:,1] + true_b
labels += torch.tensor(np.random.normal(0,0.01,size=labels.size()),dtype=torch.float32)

# read the dataset
batch_size = 10
# combine features and labels of dataset
dataset = Data.TensorDataset(features,labels)
# put dataset into DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,         # torch TensorDataset format
    batch_size=batch_size,   # mini batch size
    shuffle=True,            # whether shuffle the data or not
    #num_workers=2,           # load data in parallel worker subprocesses
)

for X,y in data_iter:
    print(X,'\n',y)
    break

# define the model
class LinearNet(nn.Module):
    def __init__(self,n_feature):
        super(LinearNet,self).__init__()  # call the parent class constructor
        self.linear = nn.Linear(n_feature,1)  # function prototype:'torch.nn.Linear(in_features,out_features,bias=True)'

    def forward(self,x):
        y = self.linear(x)
        return y
net = LinearNet(num_inputs)
print(net)

net = nn.Sequential(
    nn.Linear(num_inputs,1)
)
# initialize model parameters
init.normal_(net[0].weight,mean=0.0,std=0.01)
init.constant_(net[0].bias,val=0.0)  # or you can use 'net[0].bias.data.fill_(0)' to modify it directly

for param in net.parameters():
    print(param)

# define the loss function
loss = nn.MSELoss() # nn built-in squared loss function
                    # function prototype: `torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`

# define the optimization function
optimizer = optim.SGD(net.parameters(),lr=0.03)
print(optimizer)

# training
num_epochs = 3
for epoch in range(1,num_epochs+1):
    for X,y in data_iter:
        output = net(X)
        l = loss(output,y.view(-1,1))
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch,l.item()))

dense = net[0]
print(true_w, dense.weight.data)
print(true_b,dense.bias.data)

Three ways to build a multi-layer network with nn.Sequential

from torch import nn

# method 1
net = nn.Sequential(
    nn.Linear(num_inputs,1)
    # other layers can be added here
)

# method 2
net = nn.Sequential()
net.add_module('Linear',nn.Linear(num_inputs,1))
# net.add_module ......

# method 3
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
          ('linear', nn.Linear(num_inputs, 1))
          # ......
        ]))