Gradient descent: for a function (the loss function), find a value x at which the derivative of y equals 0, i.e. an x where the original function y has an extremum (a maximum or a minimum; here we want the minimum).
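Concretely, the search is iterative: start from some initial value and repeatedly step against the derivative, scaled by a learning rate eta (the same symbols used in the code below). The standard update rule is:

$$\theta_{t+1} = \theta_{t} - \eta \, \frac{dJ}{d\theta}\bigg|_{\theta_{t}}$$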
- Make up an example function:
  $f(x) = (x - 2.5)^2 - 1$
- Step 1: take the derivative:
  $f'(x) = 2(x - 2.5)$
  In code:
```python
def dJ(theta):
    """Derivative of J with respect to theta."""
    return 2 * (theta - 2.5)
```
Write the original function in code as well:
```python
def J(theta):
    try:
        return (theta - 2.5) ** 2 - 1.
    except OverflowError:
        # if theta has blown up (e.g. the learning rate is too large),
        # report an infinite loss instead of raising
        return float('inf')
```
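A quick sanity check of the two functions: the derivative vanishes exactly at the minimum x = 2.5, and its sign elsewhere tells us which direction to move.

```python
print(dJ(2.5))   # 0.0  -> the gradient is zero at the minimum
print(J(2.5))    # -1.0 -> the minimum value of f
print(dJ(0))     # -5.0 -> negative, so theta should move to the right
```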
- Step 2: loop until we find such a value x (theta)
```python
import numpy as np
import matplotlib.pyplot as plt

plot_x = np.linspace(-1., 6., 141)   # x values for plotting (range chosen to show the minimum at 2.5)
theta_history = []                   # record of every theta visited

def gradient_descent(initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    """
    :param initial_theta: starting value of theta
    :param eta: learning rate (eta * gradient is the distance moved each step)
    :param n_iters: maximum number of iterations
    :param epsilon: minimum change in J before stopping
    :return: the final theta
    """
    theta = initial_theta
    i_iter = 0
    theta_history.clear()            # start a fresh history for this run
    theta_history.append(initial_theta)
    while i_iter < n_iters:
        # compute the gradient (the derivative) at the current theta
        gradient = dJ(theta)
        # remember the previous theta
        last_theta = theta
        # move theta by eta * gradient in the downhill direction
        theta = theta - eta * gradient
        theta_history.append(theta)
        # stop once the loss barely changes between steps
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
        i_iter += 1
    return theta
```
```python
def plot_theta_history():
    """Plot the curve of J and the path theta followed during descent."""
    plt.plot(plot_x, J(plot_x))
    plt.plot(np.array(theta_history), J(np.array(theta_history)),
             color="r", marker='+')
    plt.show()

eta = 0.01
theta = gradient_descent(0., eta, n_iters=10)
plot_theta_history()
print(theta)
print(J(theta))
```
Result: the red '+' markers plotted by plot_theta_history show the path theta takes down the curve.
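One caveat before moving on: the learning rate matters. If eta is too large, each step overshoots the minimum and theta diverges, which is exactly why J above guards against overflow by returning float('inf'). A small sketch of both behaviours, reusing the functions defined above (the specific eta values are only illustrative):

```python
# a conservative learning rate walks steadily toward the minimum at 2.5
theta = gradient_descent(0., eta=0.1)
print(theta, J(theta))        # approximately 2.5 and -1.0

# too large a learning rate overshoots further with every step: the loss grows
theta = gradient_descent(0., eta=1.1, n_iters=10)
print(J(theta))               # roughly 239, already far worse than J(0) = 5.25
```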
Defining a gradient-descent fit for linear regression
- In essence, we use gradient descent to make the value of the objective (cost) function as small as possible:
$$J(\theta) = MSE(y, \hat{y}) = \frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^{2}, \qquad \hat{y}^{(i)} = X_{b}^{(i)}\theta$$
- Taking the partial derivative of J with respect to each component of theta gives the gradient:
$$\nabla J(\theta) = \begin{pmatrix} \partial J / \partial \theta_{0} \\ \partial J / \partial \theta_{1} \\ \partial J / \partial \theta_{2} \\ \vdots \\ \partial J / \partial \theta_{n} \end{pmatrix} = \frac{2}{m}\begin{pmatrix} \sum_{i=1}^{m}\left(X_{b}^{(i)}\theta - y^{(i)}\right) \\ \sum_{i=1}^{m}\left(X_{b}^{(i)}\theta - y^{(i)}\right)\cdot X_{1}^{(i)} \\ \sum_{i=1}^{m}\left(X_{b}^{(i)}\theta - y^{(i)}\right)\cdot X_{2}^{(i)} \\ \vdots \\ \sum_{i=1}^{m}\left(X_{b}^{(i)}\theta - y^{(i)}\right)\cdot X_{n}^{(i)} \end{pmatrix}$$
- Define functions that compute these two quantities (the cost J and its gradient dJ):
```python
def J(theta, X_b, y):
    """Mean squared error of the linear model X_b.dot(theta) against y."""
    try:
        return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
    except OverflowError:
        return float('inf')
```
```python
def dJ(theta, X_b, y):
    """Gradient of J: one partial derivative per component of theta."""
    res = np.empty(len(theta))
    # theta_0 (the intercept) multiplies the constant column of ones
    res[0] = np.sum(X_b.dot(theta) - y)
    for i in range(1, len(theta)):
        res[i] = (X_b.dot(theta) - y).dot(X_b[:, i])
    return res * 2 / len(X_b)
```
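With J and dJ in place, the fit itself is the same loop as before, just with theta as a vector (this version shadows the one-dimensional gradient_descent above). A minimal sketch; the data here is made up purely for illustration, and X_b is built by stacking a column of ones onto x:

```python
import numpy as np

def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    # same loop as the one-dimensional version, but theta is now a vector
    theta = initial_theta
    i_iter = 0
    while i_iter < n_iters:
        gradient = dJ(theta, X_b, y)
        last_theta = theta
        theta = theta - eta * gradient
        if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
            break
        i_iter += 1
    return theta

# made-up data: y = 3x + 4 plus noise
np.random.seed(666)
x = 2 * np.random.random(size=100)
y = x * 3. + 4. + np.random.normal(size=100)
X_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])

theta = gradient_descent(X_b, y, np.zeros(X_b.shape[1]), eta=0.01)
print(theta)   # roughly [4., 3.]
```

As a side note, the loop inside dJ can also be written in one vectorized line, `X_b.T.dot(X_b.dot(theta) - y) * 2 / len(X_b)`, which computes the same gradient without the Python-level for loop.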