Deep Learning (2): Regression
I. Problem Statement and Analysis
1. Machine Learning
- make decisions
- going left/right → discrete
- increase/decrease → continuous
2. Continuous Prediction
- $f_\theta: x \to y$
- $x$: input data
- $f(x)$: prediction
- $y$: real data, ground truth
3. Linear Equation
- y=w*x+b
- 1.567=w*1+b
- 3.043=w*2+b
→ Closed Form Solution (see the sketch below)
- w=1.477
- b=0.089
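These two equations can be solved exactly; here is a minimal NumPy sketch of the closed-form solve (the matrix setup is an illustrative assumption, not part of the original notes):

import numpy as np

# Two exact equations: 1.567 = w*1 + b and 3.043 = w*2 + b,
# written as A @ [w, b] = y with rows [x_i, 1]
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
y = np.array([1.567, 3.043])
w, b = np.linalg.solve(A, y)
print(w, b)  # w ≈ 1.48, b ≈ 0.09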
4. With Noise?
- y=w*x+b+ϵ
- ϵ ~ N(0,1)
- 1.567 = w*1 + b + eps
- 3.043 = w*2 + b + eps
- 4.519 = w*3 + b + eps
- …
→ Y = (WX + b)
For Example
- w?
- b?
5. Find $w'$, $b'$
- $(WX + b - Y)^2$
- $loss = \sum_i (w \cdot x_i + b - y_i)^2$
- Minimize loss
- $w' \cdot x + b' \to y$
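With many noisy samples, this minimization also has a closed-form answer via least squares; a sketch (the synthetic inputs, seed, and true parameters are assumptions chosen for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1.0, 101.0)                  # assumed sample inputs
eps = rng.normal(0.0, 1.0, size=x.shape)   # ε ~ N(0, 1)
y = 1.477 * x + 0.089 + eps                # noisy observations

# minimize Σ (w*x_i + b - y_i)^2 via ordinary least squares
A = np.stack([x, np.ones_like(x)], axis=1)
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w, b)  # close to 1.477 and 0.089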
6. Gradient Descent
(1) 1-D
$$w' = w - lr \cdot \frac{dy}{dw}$$
For example, with $lr = 0.005$:
$$x' = x - 0.005 \cdot \frac{dy}{dx}$$
As can be seen, the derivative of a function always points in the direction in which the function value increases. To find the minimum of the loss function, we therefore step in the direction opposite to the derivative, i.e. $-lr \cdot \frac{dy}{dw}$. The learning-rate factor $lr$ is introduced to keep each step from being too large.
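A minimal 1-D sketch of this update rule (the target function $y = (w-2)^2$ and the step count are assumptions chosen for illustration):

def gradient_descent_1d(df, w=0.0, lr=0.005, steps=1000):
    # repeatedly step against the derivative: w' = w - lr * dy/dw
    for _ in range(steps):
        w = w - lr * df(w)
    return w

# example: y = (w - 2)^2, so dy/dw = 2*(w - 2); the minimum is at w = 2
print(gradient_descent_1d(lambda w: 2 * (w - 2)))  # ≈ 2.0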
(2) 2-D
Find $w'$, $b'$:
- $loss = \sum_i (w \cdot x_i + b - y_i)^2$
- Take the partial derivatives of the loss with respect to $w$ and $b$, then step against each partial derivative:
- $w' = w - lr \cdot \frac{\partial loss}{\partial w}$
- $b' = b - lr \cdot \frac{\partial loss}{\partial b}$
- $w' \cdot x + b' \to y$
(Figures: learning process and loss surface.)
II. Regression in Practice
1. Steps
(1) Compute the loss function from the randomly initialized $w$, $b$ together with the data $x$, $y$;
(2) Compute the gradients from the current values of $w$, $b$, $x$, $y$;
(3) Apply the update, assigning $w'$ to $w$ (and $b'$ to $b$), and repeat;
(4) The final $w'$ and $b'$ serve as the model's parameters.
2. Step1: Compute Loss
There are 100 points, each with two dimensions, so the dataset has shape $[100, 2]$, arranged as $[(x_0, y_0), (x_1, y_1), \dots, (x_{99}, y_{99})]$. The loss function is then:
$$loss = (w_0 x_0 + b_0 - y_0)^2 + (w_0 x_1 + b_0 - y_1)^2 + \dots + (w_0 x_{99} + b_0 - y_{99})^2$$
That is:
$$loss = \sum_i (w \cdot x_i + b - y_i)^2$$
The initial values are set to $w_0 = b_0 = 0$.
(1) The initial values of b and w are both 0; points holds the 100 data points read in from data.csv;
(2) len(points) is the number of data points, i.e. 100, so range(0, len(points)) iterates over i = 0, 1, …, 99;
(3) x = points[i, 0] takes element 0 of the i-th point (its first element), equivalent to p[i][0]; likewise, y = points[i, 1] takes element 1 (its second element), equivalent to p[i][1];
(4) totalError accumulates the total loss; dividing by len(points) gives the average loss.
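Since points is a NumPy array, the same average loss can also be written without the explicit loop; a vectorized sketch that should match the loop-based compute_error_for_line_given_points in the code below:

import numpy as np

def compute_error_vectorized(b, w, points):
    x, y = points[:, 0], points[:, 1]   # all x values, all y values
    return np.mean((y - (w * x + b)) ** 2)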
3. Step2: Compute Gradient and update
$$loss_0 = (w x_0 + b - y_0)^2$$
$$\frac{\partial loss_0}{\partial w} = 2 (w x_0 + b - y_0) \, x_0$$
$$\frac{\partial loss}{\partial w} = 2 \sum_i (w x_i + b - y_i) \, x_i$$
$$\frac{\partial loss}{\partial b} = 2 \sum_i (w x_i + b - y_i)$$
$$w' = w - lr \cdot \frac{\partial loss}{\partial w}$$
$$b' = b - lr \cdot \frac{\partial loss}{\partial b}$$
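These formulas map directly onto a vectorized gradient step; a sketch of one update using the averaged gradients, which is what the loop-based step_gradient in the code below also computes:

import numpy as np

def step_gradient_vectorized(b, w, points, lr):
    x, y = points[:, 0], points[:, 1]
    err = w * x + b - y
    # gradients of the average loss:
    # (2/N) Σ (w x_i + b - y_i) x_i  and  (2/N) Σ (w x_i + b - y_i)
    grad_w = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    return b - lr * grad_b, w - lr * grad_w   # (new_b, new_w)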
4. Step3: Set $w = w'$ and loop
$$w \leftarrow w', \qquad b \leftarrow b'$$
Once the final values of $w$ and $b$ have been computed, they can be substituted into the model to make predictions:
$$w' \cdot x + b' \to predict$$
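Prediction is then just evaluating the fitted line; a trivial sketch:

def predict(x, w, b):
    # w' * x + b' -> predicted y
    return w * x + b

# e.g. with the values learned below: predict(3.0, 1.48, 0.09) ≈ 4.53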
5. Code
import numpy as np

# y = wx + b
def compute_error_for_line_given_points(b, w, points):
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # accumulate the squared error for this point
        totalError += (y - (w * x + b)) ** 2
    # average loss over all points
    return totalError / float(len(points))

def step_gradient(b_current, w_current, points, learningRate):
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # grad_b = (2/N) * (wx+b-y)
        b_gradient += (2 / N) * ((w_current * x + b_current) - y)
        # grad_w = (2/N) * (wx+b-y) * x
        w_gradient += (2 / N) * x * ((w_current * x + b_current) - y)
    # update w' and b'
    new_b = b_current - (learningRate * b_gradient)
    new_w = w_current - (learningRate * w_gradient)
    return [new_b, new_w]

def gradient_descent_runner(points, starting_b, starting_w, learning_rate, num_iterations):
    b = starting_b
    w = starting_w
    # update for num_iterations steps
    for i in range(num_iterations):
        b, w = step_gradient(b, w, np.array(points), learning_rate)
    return [b, w]

def run():
    points = np.genfromtxt("data.csv", delimiter=",")
    learning_rate = 0.0001
    initial_b = 0  # initial y-intercept guess
    initial_w = 0  # initial slope guess
    num_iterations = 1000
    print("Starting gradient descent at b = {0}, w = {1}, error = {2}"
          .format(initial_b, initial_w,
                  compute_error_for_line_given_points(initial_b, initial_w, points)))
    print("Running...")
    [b, w] = gradient_descent_runner(points, initial_b, initial_w, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, w = {2}, error = {3}"
          .format(num_iterations, b, w,
                  compute_error_for_line_given_points(b, w, points)))

if __name__ == '__main__':
    run()
The output of a run is as follows:
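Based on the numbers discussed below, the printed output is approximately:

Starting gradient descent at b = 0, w = 0, error ≈ 5565.11
Running...
After 1000 iterations b ≈ 0.09, w ≈ 1.48, error ≈ 112.61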
As can be seen, at $w = 0, b = 0$ the loss is $error \approx 5565.11$; after 1000 iterations, $w \approx 1.48$ and $b \approx 0.09$, with a loss of $error \approx 112.61$, far smaller than the initial loss.
References:
[1] 龙良曲, 《深度学习与TensorFlow2入门实战》 (Deep Learning and TensorFlow 2 in Practice).