Regression Problems
1. Continuous value prediction
The model outputs a value in the real numbers:
f_{\theta}: x \rightarrow y
where f(x) is the predicted value, x is the input data, and y is the ground truth.
Continuous Value Prediction: An Example
Generating the data
First, use the function below to generate some data, then use that data to estimate the function's parameters, 1.477 and 0.089. The noise term \varepsilon makes the data less regular, so it better resembles real-world data:
y = 1.477 \times x + 0.089 + \varepsilon
CSV data file: https://pan.baidu.com/s/17rrRnvznVhoqtwecfFfnhQ (extraction code: wjl0)
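If the download is unavailable, equivalent data can be generated directly. A minimal sketch, assuming 100 sample points, x drawn uniformly from [0, 100], and unit Gaussian noise (the sample count, x range, and noise scale are all assumptions, not properties of the original file):

import numpy as np

np.random.seed(0)
n = 100
x = np.random.uniform(0, 100, n)   # assumed input range
eps = np.random.normal(0, 1, n)    # assumed noise scale for epsilon
y = 1.477 * x + 0.089 + eps        # the generating function from above
np.savetxt("data.csv", np.column_stack((x, y)), delimiter=",")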
Predicting the parameters
From here on, we use the data and its plot to infer the relationship between x and y.
Based on the shape of the plotted data, assume that x and y are related by a linear function:
y = wx + b
where w is the slope and b is the bias.
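(The scatter plot itself is omitted here. A minimal sketch to reproduce it, assuming matplotlib is available and data.csv contains two comma-separated columns as above:)

import numpy as np
import matplotlib.pyplot as plt

points = np.genfromtxt("data.csv", delimiter=",")
plt.scatter(points[:, 0], points[:, 1], s=10)  # x against y
plt.xlabel("x")
plt.ylabel("y")
plt.show()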
To solve for w and b, construct a loss function:
loss = \sum_{i=1}^{n} (w x_i + b - y_i)^2
The smaller the loss, the more accurate the predictions.
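In code, this loss can be evaluated directly. A minimal vectorized sketch, where x and y are assumed to be NumPy arrays holding the data and w, b are candidate parameters:

import numpy as np

def sum_squared_loss(w, b, x, y):
    # loss = sum_i (w*x_i + b - y_i)^2
    return np.sum((w * x + b - y) ** 2)

The full script later in this post averages the squared errors instead of summing them; both versions are minimized by the same w and b.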
The problem is now reduced to minimizing the loss. For this we use gradient descent; for now we simply apply it and fill in the details later. Here lr denotes the learning rate:
w' = w - lr \times \frac{\partial loss}{\partial w}
b' = b - lr \times \frac{\partial loss}{\partial b}
With the updated parameters, compute the predicted value y:
w' \times x + b' \rightarrow y
Iterating this update repeatedly eventually yields optimized values of w' and b'. This procedure resembles the original form of the perceptron learning algorithm described in Statistical Learning Methods, 2nd ed. (《统计学习方法(第二版)》).
Now consider the solution process.
- Compute the loss:
loss = \sum_{i=1}^{n} (w \cdot x_i + b - y_i)^2
- Compute the gradients and the updated parameter values. Expanding the loss:
loss = (wx_1+b-y_1)^2 + (wx_2+b-y_2)^2 + \cdots + (wx_n+b-y_n)^2
Taking the partial derivative of the loss with respect to w:
\frac{\partial loss}{\partial w} = 2(wx_1+b-y_1)x_1 + 2(wx_2+b-y_2)x_2 + \cdots + 2(wx_n+b-y_n)x_n
\frac{\partial loss}{\partial w} = 2\sum_{i=1}^{n}(wx_i+b-y_i)x_i
Taking the partial derivative of the loss with respect to b:
\frac{\partial loss}{\partial b} = 2(wx_1+b-y_1) + 2(wx_2+b-y_2) + \cdots + 2(wx_n+b-y_n)
\frac{\partial loss}{\partial b} = 2\sum_{i=1}^{n}(wx_i+b-y_i)
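As a sanity check, these analytic gradients can be compared against numerical finite-difference gradients. A minimal sketch on a few hypothetical points (the data values, w, b, and the step size h are all assumptions for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0])   # hypothetical data points
y = np.array([1.5, 3.1, 4.4])
w, b, h = 0.5, 0.1, 1e-6

def loss(w, b):
    return np.sum((w * x + b - y) ** 2)

# analytic gradients from the formulas above
grad_w = 2 * np.sum((w * x + b - y) * x)
grad_b = 2 * np.sum(w * x + b - y)

# central finite-difference approximations
num_w = (loss(w + h, b) - loss(w - h, b)) / (2 * h)
num_b = (loss(w, b + h) - loss(w, b - h)) / (2 * h)
print(grad_w, num_w)  # the two values should agree closely
print(grad_b, num_b)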
The update for w:
w' = w - lr \times \frac{\partial loss}{\partial w}
The update for b:
b' = b - lr \times \frac{\partial loss}{\partial b}
import numpy as np

# y = wx + b
def compute_error_for_line_given_points(b, w, points):
    """Mean squared error of the line y = wx + b over all points."""
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # accumulate the squared error of this point
        totalError += (y - (w * x + b)) ** 2
    # average loss over all points
    return totalError / float(len(points))

def step_gradient(b_current, w_current, points, learningRate):
    """One gradient-descent step updating w and b."""
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    # Dividing by N averages the gradient over the points, i.e. this is
    # the gradient of the *mean* squared error rather than the sum above;
    # both are minimized by the same w and b.
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # grad_b = 2(wx + b - y)
        b_gradient += (2 / N) * ((w_current * x + b_current) - y)
        # grad_w = 2(wx + b - y) * x
        w_gradient += (2 / N) * x * ((w_current * x + b_current) - y)
    # update w' and b'
    new_b = b_current - (learningRate * b_gradient)
    new_w = w_current - (learningRate * w_gradient)
    return new_b, new_w

def gradient_descent_runner(points, starting_b, starting_w, learning_rate, num_iterations):
    b = starting_b
    w = starting_w
    # run the update for num_iterations steps
    for i in range(num_iterations):
        b, w = step_gradient(b, w, np.array(points), learning_rate)
    return [b, w]

def run():
    points = np.genfromtxt("data.csv", delimiter=",")
    print(type(points), points.shape)
    learning_rate = 0.0001
    initial_b = 0  # initial y-intercept guess
    initial_w = 0  # initial slope guess
    num_iterations = 1000
    print("Starting gradient descent at b = {0}, w = {1}, error = {2}"
          .format(initial_b, initial_w, compute_error_for_line_given_points(initial_b, initial_w, points)))
    print("Running...")
    [b, w] = gradient_descent_runner(points, initial_b, initial_w, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, w = {2}, error = {3}"
          .format(num_iterations, b, w, compute_error_for_line_given_points(b, w, points)))

if __name__ == '__main__':
    run()
Output:
Starting gradient descent at b = 0, w = 0, error = 5565.107834483211
Running...
After 1000 iterations b = 0.08893651993741346, w = 1.4777440851894448, error = 112.61481011613473
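For comparison, the same training loop can be written in vectorized NumPy without the inner Python loop. A minimal sketch under the same settings; it should recover essentially the same w and b:

import numpy as np

points = np.genfromtxt("data.csv", delimiter=",")
x, y = points[:, 0], points[:, 1]
w, b, lr = 0.0, 0.0, 0.0001

for _ in range(1000):
    err = w * x + b - y
    # gradients of the mean squared error, as in step_gradient above
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(w, b)  # expected to be close to 1.477 and 0.089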