This post recommends what is probably the most accessible explanation of gradient descent I have seen; the code below is taken from that article. It covers:
- The scenario behind gradient descent (the descending-a-mountain analogy)
- Gradients
- A mathematical explanation of the gradient descent algorithm
- A worked example of gradient descent
- An implementation of gradient descent
- Further reading
The cost function:
$$
J(\theta) = \frac{1}{2m}\,(X\theta-\overrightarrow{y})^T(X\theta-\overrightarrow{y})
$$
Differentiating with respect to θ gives:
$$
\nabla J(\theta) = \frac{1}{m}\,X^T(X\theta-\overrightarrow{y})
$$
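If the differentiation step is not obvious: expand the quadratic form, note that the two cross terms are equal scalars, and apply the identities $\nabla_\theta(\theta^T A \theta) = 2A\theta$ (for symmetric $A$) and $\nabla_\theta(b^T\theta) = b$:

$$
J(\theta) = \frac{1}{2m}\left(\theta^T X^T X \theta - 2\,\overrightarrow{y}^{T} X \theta + \overrightarrow{y}^{T}\overrightarrow{y}\right)
\;\Longrightarrow\;
\nabla J(\theta) = \frac{1}{2m}\left(2\,X^T X \theta - 2\,X^T \overrightarrow{y}\right) = \frac{1}{m}\,X^T(X\theta-\overrightarrow{y})
$$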
First, we define the dataset and the learning rate:
```python
import numpy as np

# Size of the points dataset.
m = 20

# Points' x-coordinates and a dummy intercept feature (x0, x1).
X0 = np.ones((m, 1))
X1 = np.arange(1, m + 1).reshape(m, 1)
X = np.hstack((X0, X1))

# Points' y-coordinates.
y = np.array([
    3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12,
    11, 13, 13, 16, 17, 18, 17, 19, 21
]).reshape(m, 1)

# The learning rate alpha.
alpha = 0.01
```
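As a quick sanity check on the design matrix (assuming the snippet above has run), the first column is the all-ones intercept feature x0 and the second holds the x-coordinates:

```python
print(X.shape)  # (20, 2): an intercept column plus one feature column
print(X[:3])    # [[1. 1.]
                #  [1. 2.]
                #  [1. 3.]]
```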
Next, we define the cost function and its gradient in matrix-vector form:
```python
def error_function(theta, X, y):
    '''Error function J definition.'''
    diff = np.dot(X, theta) - y
    return (1. / (2 * m)) * np.dot(np.transpose(diff), diff)

def gradient_function(theta, X, y):
    '''Gradient of the function J definition.'''
    diff = np.dot(X, theta) - y
    return (1. / m) * np.dot(np.transpose(X), diff)
```

Note the parenthesization in `error_function`: `1./2*m` would evaluate to `m/2`, not the `1/(2m)` factor in the formula above.
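To verify that `gradient_function` really is the gradient of `error_function`, one can compare it with a central finite-difference approximation. A minimal sketch; `numerical_gradient` is a helper introduced here, not part of the original post:

```python
def numerical_gradient(theta, X, y, eps=1e-6):
    '''Approximate the gradient of J by central finite differences.'''
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.shape[0]):
        step = np.zeros_like(theta, dtype=float)
        step[i] = eps  # perturb one component of theta at a time
        grad[i] = (error_function(theta + step, X, y)[0, 0]
                   - error_function(theta - step, X, y)[0, 0]) / (2 * eps)
    return grad

theta0 = np.array([1., 1.]).reshape(2, 1)
print(np.allclose(gradient_function(theta0, X, y),
                  numerical_gradient(theta0, X, y)))  # should print True
```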
Finally, the core of the algorithm: the iterative gradient descent update.
```python
def gradient_descent(X, y, alpha):
    '''Perform gradient descent.'''
    theta = np.array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, y)
    while not np.all(np.absolute(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, y)
    return theta
```
Once every component of the gradient falls below 1e-5, θ has reached a fairly flat region, like the floor of a valley, where further iterations change very little, so we can exit the loop.
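One caveat: if alpha is set too large, the iterates can diverge and the while loop above will never terminate. A common safeguard, sketched here as a variant (the iteration cap is an addition, not part of the original code), is to bound the number of updates:

```python
def gradient_descent_capped(X, y, alpha, max_iter=100000, tol=1e-5):
    '''Gradient descent with an iteration cap to guard against divergence.'''
    theta = np.array([1, 1]).reshape(2, 1)
    for _ in range(max_iter):
        gradient = gradient_function(theta, X, y)
        if np.all(np.absolute(gradient) <= tol):  # flat enough: stop early
            break
        theta = theta - alpha * gradient
    return theta
```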
The complete code:
```python
import numpy as np

# Size of the points dataset.
m = 20

# Points' x-coordinates and a dummy intercept feature (x0, x1).
X0 = np.ones((m, 1))
X1 = np.arange(1, m + 1).reshape(m, 1)
X = np.hstack((X0, X1))

# Points' y-coordinates.
y = np.array([
    3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12,
    11, 13, 13, 16, 17, 18, 17, 19, 21
]).reshape(m, 1)

# The learning rate alpha.
alpha = 0.01

def error_function(theta, X, y):
    '''Error function J definition.'''
    diff = np.dot(X, theta) - y
    return (1. / (2 * m)) * np.dot(np.transpose(diff), diff)

def gradient_function(theta, X, y):
    '''Gradient of the function J definition.'''
    diff = np.dot(X, theta) - y
    return (1. / m) * np.dot(np.transpose(X), diff)

def gradient_descent(X, y, alpha):
    '''Perform gradient descent.'''
    theta = np.array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, y)
    while not np.all(np.absolute(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, y)
    return theta

optimal = gradient_descent(X, y, alpha)
print('optimal:', optimal)
print('error function:', error_function(optimal, X, y)[0, 0])
```
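Since J(θ) is ordinary least squares, the result can be cross-checked against the closed-form normal-equation solution θ = (XᵀX)⁻¹Xᵀy; gradient descent should land very close to it. A sketch reusing the X and y above:

```python
# Closed-form least-squares solution; assumes X^T X is invertible.
theta_exact = np.linalg.solve(X.T.dot(X), X.T.dot(y))
print('normal equation:', theta_exact.ravel())
```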
Stochastic gradient descent in scikit-learn
Note that, unlike the batch method above, stochastic gradient descent does not compute the gradient over the whole dataset at every step; each update uses only a randomly chosen sample (or a small random subset), as sketched below.
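A minimal numpy sketch of this idea, reusing the X and y defined above (`sgd_sketch` and its hyperparameters are illustrative choices made here, not from the original post):

```python
def sgd_sketch(X, y, alpha=0.001, epochs=100, seed=0):
    '''Stochastic gradient descent: one randomly chosen sample per update.'''
    rng = np.random.default_rng(seed)
    theta = np.ones((X.shape[1], 1))
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):  # visit samples in random order
            xi = X[i:i + 1]                    # a single row, shape (1, 2)
            gradient = xi.T.dot(xi.dot(theta) - y[i:i + 1])  # per-sample gradient
            theta = theta - alpha * gradient
    return theta
```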
In scikit-learn this is available as `SGDRegressor`. Here is a runnable version, with the train/test split and standardization filled in (the original snippet left them implicit, and the parameter it called `n_inter` is spelled `max_iter` in current scikit-learn):

```python
# Stochastic gradient descent with scikit-learn.
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Note: SGDRegressor only fits linear models, which is why it lives in sklearn.linear_model.
X_train, X_test, y_train, y_test = train_test_split(X1, y.ravel(), random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_standard = scaler.transform(X_train)  # the features need to be standardized
X_test_standard = scaler.transform(X_test)

sgd_reg = SGDRegressor(max_iter=100)  # pass the number of iterations here; 100 is only an example
sgd_reg.fit(X_train_standard, y_train)
sgd_reg.score(X_test_standard, y_test)
```
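Here `score` returns the R² coefficient of determination on the test set. Standardization matters because SGD shares one learning rate across all features: if their scales differ widely, the same step size is too aggressive in some directions and too timid in others.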
Finally, I recommend a linear algebra book, Immersive Math, which you can read online directly; it teaches linear algebra through interactive, animated illustrations.
Link: Immersive Math