Machine Learning Notes 11: Stochastic Gradient Descent
1. The Idea of Stochastic Gradient Descent
1.1 Batch Gradient Descent
In batch gradient descent, every update uses all $m$ samples to compute the gradient of the loss $J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - X_b^{(i)}\theta\right)^2$:

$$\frac{\partial J}{\partial \theta_j} = \frac{2}{m}\sum_{i=1}^{m}\left(X_b^{(i)}\theta - y^{(i)}\right)X_j^{(i)}$$

That is, in vector form:

$$\nabla J(\theta) = \frac{2}{m}\,X_b^{T}\left(X_b\theta - y\right)$$
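For contrast with the stochastic version implemented in section 2, a minimal batch gradient descent sketch could look like the following (the names dJ_batch and batch_gd and the fixed learning rate eta are illustrative, not part of the original notes):

import numpy as np

def dJ_batch(theta, X_b, y):
    # Gradient over ALL m samples -- this full pass over the data is
    # what makes batch gradient descent expensive on large data sets.
    return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(X_b)

def batch_gd(X_b, y, initial_theta, eta=0.01, n_iters=1000):
    theta = initial_theta
    for _ in range(n_iters):
        theta = theta - eta * dJ_batch(theta, X_b, y)
    return theta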
1.2 Stochastic Gradient Descent
Each iteration uses only one randomly chosen sample to take a gradient step. Batch gradient descent is expensive because every step has to touch all of the data; stochastic gradient descent does far less work per step, so its time cost is much smaller.
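Concretely, only the term contributed by the sampled index $i$ is used as the gradient estimate (this is exactly what the dJ_sgd function below computes):

$$\nabla J^{(i)}(\theta) = 2\left(X_b^{(i)}\right)^{T}\left(X_b^{(i)}\theta - y^{(i)}\right)$$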
The step size $\eta$ is decreased as the iterations proceed, borrowing the idea of simulated annealing:

$$\eta = \frac{a}{i_{\text{iters}} + b}$$

where $a$ and $b$ are hyperparameters (in the implementation below, $a = t_0 = 5$ and $b = t_1 = 50$).
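A quick illustration of this annealing schedule, using the same $t_0$ and $t_1$ as the sgd() implementation below (the loop is only a sketch for printing a few values):

t0, t1 = 5, 50
for t in (0, 10, 100, 1000):
    print(round(t0 / (t + t1), 3))   # 0.1, 0.083, 0.033, 0.005 -- steps shrink over time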
2. Implementing Stochastic Gradient Descent
import numpy as np
import matplotlib.pyplot as plt
# np.random.seed(665)
m = 10000                                    # number of samples
x = 2 * np.random.normal(size=m)
X = x.reshape(-1, 1)
y = x * 3. + 4. + np.random.normal(size=m)   # true slope 3, true intercept 4
def J(theta, X_b, y):
    '''MSE loss function'''
    try:
        return np.sum((y - X_b.dot(theta)) ** 2) / len(X_b)
    except:
        return float('inf')   # guard against numerical overflow
def dJ_sgd(theta, X_b_i, y_i):
    # Gradient estimate from a single sample (X_b_i, y_i)
    return X_b_i.T.dot(X_b_i.dot(theta) - y_i) * 2.
def sgd(X_b, y, initial_theta, n_iters):
    t0 = 5
    t1 = 50

    def learning_rate(t):
        # Decaying step size: eta = t0 / (t + t1)
        return t0 / (t + t1)

    # The loss is not guaranteed to decrease at every step,
    # so we only cap the number of iterations.
    theta = initial_theta
    for cur_iter in range(n_iters):
        rand_i = np.random.randint(len(X_b))               # pick one sample at random
        gradient = dJ_sgd(theta, X_b[rand_i], y[rand_i])
        theta = theta - learning_rate(cur_iter) * gradient
    return theta
%%time
X_b = np.hstack([np.ones((len(X), 1)), X])    # prepend a column of ones for the intercept
initial_theta = np.zeros(X_b.shape[1])
theta = sgd(X_b, y, initial_theta, n_iters=len(X_b) // 3)   # only m/3 single-sample steps
Wall time: 24 ms
theta
array([3.96962099, 2.91398727])
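As a sanity check, the SGD estimate could be compared against the closed-form least-squares solution on the same X_b and y; np.linalg.lstsq is a standard NumPy routine, and the snippet below is a sketch rather than output from the original run:

theta_exact, *_ = np.linalg.lstsq(X_b, y, rcond=None)   # closed-form fit for comparison
print(theta_exact)   # should be close to [4, 3] (intercept, slope)
print(theta)         # SGD estimate after m/3 single-sample steps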
3. SGD in scikit-learn
from sklearn.linear_model import SGDRegressor
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
boston = datasets.load_boston()
X = boston.data
y = boston.target
X = X[y < 50.0]  # the target is capped at 50.0, so samples sitting at the cap are dropped
y = y[y < 50.0]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=1)
standard = StandardScaler()
standard.fit(X_train)                            # fit the scaler on the training set only
X_train_standard = standard.transform(X_train)
X_test_standard = standard.transform(X_test)     # reuse the training-set statistics for the test set
sgd_reg = SGDRegressor()
%time sgd_reg.fit(X_train_standard, y_train)
sgd_reg.score(X_test_standard, y_test)
Wall time: 3.99 ms
0.7775560898753987
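SGDRegressor exposes knobs for the same ideas implemented by hand above. The parameters below are real scikit-learn arguments (learning_rate='invscaling' decays the step size as eta0 / t^power_t, playing the same role as the t0/(t + t1) schedule); the specific values are only illustrative, not the ones used for the score above:

sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3,
                       eta0=0.01, learning_rate='invscaling', power_t=0.25)
sgd_reg.fit(X_train_standard, y_train)
sgd_reg.score(X_test_standard, y_test)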
4. Summary
- Batch Gradient Descent
- Stochastic Gradient Descent
- Mini-Batch Gradient Descent (see the sketch at the end of these notes)

Randomness:
- helps jump out of local optima
- faster running speed
- the same idea appears elsewhere: random search, random forests
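Mini-batch gradient descent sits between the two extremes: each step averages the gradient over a small random batch instead of one sample or the full data set. A minimal sketch, where batch_size and the helper name dJ_mini_batch are illustrative choices rather than code from the original notes:

def dJ_mini_batch(theta, X_b_batch, y_batch):
    # Average gradient over a small random batch of samples
    return X_b_batch.T.dot(X_b_batch.dot(theta) - y_batch) * 2. / len(X_b_batch)

def mini_batch_gd(X_b, y, initial_theta, n_iters=1000, batch_size=16, t0=5, t1=50):
    theta = initial_theta
    for cur_iter in range(n_iters):
        idx = np.random.randint(0, len(X_b), batch_size)    # sample a batch (with replacement)
        gradient = dJ_mini_batch(theta, X_b[idx], y[idx])
        theta = theta - t0 / (cur_iter + t1) * gradient     # same decaying step size as sgd()
    return theta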