week2_lab2 Gradient Descent for Linear Regression (Boston Housing dataset) (Study Notes)
In this exercise, you will learn the following:
- implement the gradient descent method
- implement the minibatch gradient descent method
We will use the Boston Housing data, similar to Week 1. We can import the dataset and preprocess it as follows. Note we add a feature of 1 to x_input to get an n x (d+1) matrix x_in.
import pandas as pd
import numpy as np
from sklearn import preprocessing

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
boston_data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

data = boston_data
x_input = data  # a data matrix
y_target = target  # a vector for all outputs
# add a feature 1 to the dataset, so that we do not need to treat the bias and the weights separately
x_in = np.concatenate([np.ones([np.shape(x_input)[0], 1]), x_input], axis=1)
# normalize each row of the data to unit L2 norm
x_in = preprocessing.normalize(x_in)
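A quick sanity check (a sketch, assuming the cell above ran successfully): the Boston data has 506 examples and 13 features, so after prepending the constant column, x_in should be 506 x 14.

print(x_in.shape)  # expected: (506, 14), i.e. 13 features plus the constant column
print(y_target.shape)  # expected: (506,)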
x_in = np.concatenate([np.ones([np.shape(x_input)[0], 1]), x_input], axis=1): this line prepends an extra feature (the constant 1) to the input feature matrix x_input, so that the bias and the weights do not need to be handled separately. In detail (a toy sketch follows this list):
- np.ones([np.shape(x_input)[0], 1]) creates a matrix of shape (n, 1), where n is the number of examples; the entry in this column is 1 for every example.
- x_input is the original input feature matrix of shape (n, d), where n is the number of examples and d is the number of features.
- np.concatenate(…, axis=1) joins the two matrices column-wise (axis=1) into a new feature matrix x_in, whose first column is all ones (the constant term), followed by the original features.
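A minimal sketch of the same operation on a toy matrix (the array below is made up purely for illustration):

toy = np.array([[2.0, 3.0],
                [4.0, 5.0],
                [6.0, 7.0]])  # shape (3, 2): n = 3 examples, d = 2 features
toy_in = np.concatenate([np.ones([toy.shape[0], 1]), toy], axis=1)
print(toy_in)  # first column is all ones
print(toy_in.shape)  # (3, 3), i.e. n x (d + 1)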
Linear Model & Cost Function
def linearmat_2(w, X):
    '''
    A vectorization of linearmat_1 in the Week 1 lab.
    Input: w is a weight parameter (including the bias), and X is a data matrix (n x (d+1)) (including the feature 1)
    Output: a vector containing the predictions of linear models
    '''
    return np.dot(X, w)

def cost(w, X, y):
    '''
    Evaluate the cost function in a vectorized manner for
    inputs `X` and outputs `y`, at weights `w`.
    '''
    residual = y - linearmat_2(w, X)  # get the residual
    err = np.dot(residual, residual) / (2 * len(y))  # compute the halved mean squared error
    return err
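To convince ourselves the vectorized cost is right, we can compare it against a naive loop on small random data (a sketch; all names below are made up for the check):

rng = np.random.default_rng(0)
X_chk = rng.standard_normal((5, 3))
y_chk = rng.standard_normal(5)
w_chk = rng.standard_normal(3)
# cost computed example by example, then halved and averaged
loop_err = sum((y_chk[i] - np.dot(X_chk[i], w_chk)) ** 2 for i in range(5)) / (2 * 5)
print(np.isclose(cost(w_chk, X_chk, y_chk), loop_err))  # expected: True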
Gradient Computation
We now compute the gradient of the cost function.
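With the cost $J(w) = \frac{1}{2n}\|Xw - y\|^2$, the gradient is $\nabla J(w) = \frac{1}{n} X^\top (Xw - y)$, which is exactly what the vectorized function below returns.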
# Vectorized gradient function
def gradfn(weights, X, y):
    '''
    Given `weights` - a current "guess" of what our weights should be
          `X` - matrix of shape (N, d+1) of input features including the feature $1$
          `y` - target y values
    Return the gradient with respect to each weight, evaluated at the current weights
    '''
    y_pred = np.dot(X, weights)  # predictions of the linear model
    error = y_pred - y           # residual vector
    return np.dot(X.T, error) / len(y)
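A standard way to validate gradfn is a central finite-difference check against cost (a sketch on random data; the names are illustrative):

rng = np.random.default_rng(1)
X_chk = rng.standard_normal((10, 4))
y_chk = rng.standard_normal(10)
w_chk = rng.standard_normal(4)
eps = 1e-6
num_grad = np.zeros_like(w_chk)
for j in range(len(w_chk)):
    e = np.zeros_like(w_chk)
    e[j] = eps
    # central difference approximation of the j-th partial derivative
    num_grad[j] = (cost(w_chk + e, X_chk, y_chk) - cost(w_chk - e, X_chk, y_chk)) / (2 * eps)
print(np.allclose(gradfn(w_chk, X_chk, y_chk), num_grad))  # expected: True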
Gradient Descent
We now use the computed gradient to run gradient descent.
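Each iteration applies the standard update $w_{k+1} = w_k - \eta \nabla J(w_k)$, where $\eta$ is the learning rate eta below.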
def solve_via_gradient_descent(X, y, print_every=100,
                               niter=5000, eta=1):
    '''
    Given `X` - matrix of shape (N, D) of input features
          `y` - target y values
          `print_every` - we report performance every `print_every` iterations
          `niter` - the number of iterations allowed
          `eta` - learning rate
    Solves for linear regression weights with gradient descent.
    Return
        `w` - weights after `niter` iterations
        `idx_res` - the indices of iterations where we compute the cost
        `err_res` - the cost at the iterations indicated by idx_res
    '''
    N, D = np.shape(X)
    # initialize all the weights to zeros
    w = np.zeros([D])
    idx_res = []
    err_res = []
    for k in range(niter):
        # compute the gradient
        dw = gradfn(w, X, y)
        # gradient descent update
        w = w - eta * dw
        # we report the progress every print_every iterations
        if k % print_every == print_every - 1:
            t_cost = cost(w, X, y)
            print('error after iteration %d: %s' % (k, t_cost))
            idx_res.append(k)
            err_res.append(t_cost)
    return w, idx_res, err_res
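A typical call on the preprocessed data (assuming x_in and y_target from the cells above; the hyperparameters are simply the function's defaults, not tuned values):

w_gd, idx_gd, err_gd = solve_via_gradient_descent(x_in, y_target)
print(w_gd[:3])  # first few learned weights (the bias comes first)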
Minibatch Gradient Descent
import random

def solve_via_minibatch(X, y, print_every=100,
                        niter=5000, eta=1, batch_size=50):
    '''
    Solves for linear regression weights with minibatch gradient descent.
    Given `X` - matrix of shape (N, D) of input features
          `y` - target y values
          `print_every` - we report performance every `print_every` iterations
          `niter` - the number of iterations allowed
          `eta` - learning rate
          `batch_size` - the size of the minibatch
    Return
        `w` - weights after `niter` iterations
        `idx_res` - the indices of iterations where we compute the cost
        `err_res` - the cost at the iterations indicated by idx_res
    '''
    N, D = np.shape(X)
    # initialize all the weights to zeros
    w = np.zeros([D])
    idx_res = []
    err_res = []
    tset = list(range(N))
    for k in range(niter):
        # update w by minibatch gradient descent:
        # sample a batch of data without replacement
        idx = random.sample(tset, batch_size)
        sample_X = X[idx, :]
        sample_y = y[idx]
        # gradient on the minibatch only
        dw = gradfn(w, sample_X, sample_y)
        w = w - eta * dw
        if k % print_every == print_every - 1:
            t_cost = cost(w, X, y)
            print('error after iteration %d: %s' % (k, t_cost))
            idx_res.append(k)
            err_res.append(t_cost)
    return w, idx_res, err_res
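As with the full-batch version, we can run the minibatch solver on the same data and compare the two cost curves (a sketch; matplotlib is assumed to be available, and idx_gd/err_gd come from the gradient descent call above):

import matplotlib.pyplot as plt

w_mb, idx_mb, err_mb = solve_via_minibatch(x_in, y_target, batch_size=50)
plt.plot(idx_gd, err_gd, label='gradient descent')
plt.plot(idx_mb, err_mb, label='minibatch gradient descent')
plt.xlabel('iteration')
plt.ylabel('cost')
plt.legend()
plt.show()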