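1. Linear Regression
Linear regression assumes that the features and the result are linearly related. This means the output can be obtained by multiplying each input by a constant and adding up the results.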
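As a minimal sketch of this idea, the snippet below computes a prediction as a weighted sum of the inputs. The function name `predict`, the optional `bias` term, and all numeric values are made up for illustration and do not come from the original text.

```python
import numpy as np

def predict(x, weights, bias=0.0):
    """Return the linear combination bias + sum_i weights[i] * x[i]."""
    return bias + float(np.dot(weights, x))

# Hypothetical example: two input features with made-up constants.
x = np.array([2.0, 3.0])
weights = np.array([0.5, -1.0])
print(predict(x, weights, bias=1.0))  # 1.0 + 0.5*2.0 - 1.0*3.0 = -1.0
```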
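2. Least Squares
Linear fitting. Taking the fitting function to be a linear or polynomial function is a simple way to fit data. Determining a linear fitting function φ(x) = a + bx is called a linear fit of the data. Given m data points (x_k, y_k), the linear fitting problem asks for the values of a and b at which the function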
$$S(a,b)=\sum_{k=1}^{m}[(a+bx_{k})-y_{k}]^2$$
attains its minimum.
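To make the objective concrete, here is a small sketch that evaluates S(a, b) for a candidate line. The helper name `squared_error` is hypothetical; the sample points are the ones used in the Python example of section 2.1.

```python
import numpy as np

def squared_error(a, b, x, y):
    """S(a, b) = sum_k [(a + b*x_k) - y_k]^2."""
    residuals = (a + b * x) - y
    return float(np.sum(residuals ** 2))

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 1, 2, 3, 4, 5.1])

# The line y = x misses only the last point, by 0.1, so S is about 0.01.
print(squared_error(0.0, 1.0, x, y))
# A worse candidate, y = 0, gives a much larger value (about 56.01).
print(squared_error(0.0, 0.0, x, y))
```

Least squares picks the pair (a, b) that makes this value as small as possible.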
Taking the partial derivatives of S with respect to the two variables a and b gives
$$\frac{\partial S}{\partial a} = 2\sum_{k=1}^{m}[(a+bx_{k})-y_{k}], \qquad \frac{\partial S}{\partial b} = 2\sum_{k=1}^{m}[(a+bx_{k})-y_{k}]x_{k}$$
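The partial derivatives of S can be sanity-checked numerically. The sketch below takes ∂S/∂a = 2∑[(a+bx_k)−y_k] and ∂S/∂b = 2∑[(a+bx_k)−y_k]x_k (the extra factor x_k comes from the chain rule) and compares them against central finite differences. The function names and the test point (a, b) = (0.5, 0.8) are assumptions chosen for illustration.

```python
import numpy as np

def S(a, b, x, y):
    """Sum of squared residuals for the line a + b*x."""
    return np.sum(((a + b * x) - y) ** 2)

def grad_S(a, b, x, y):
    """Analytic partial derivatives (dS/da, dS/db)."""
    r = (a + b * x) - y
    return 2.0 * np.sum(r), 2.0 * np.sum(r * x)

def numeric_grad(a, b, x, y, h=1e-6):
    """Central finite differences, used only as a sanity check."""
    dS_da = (S(a + h, b, x, y) - S(a - h, b, x, y)) / (2 * h)
    dS_db = (S(a, b + h, x, y) - S(a, b - h, x, y)) / (2 * h)
    return dS_da, dS_db

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 1, 2, 3, 4, 5.1])
print(grad_S(0.5, 0.8, x, y))        # analytic values
print(numeric_grad(0.5, 0.8, x, y))  # should agree to several decimal places
```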
Setting both derivatives to zero yields the normal equations
$$\begin{cases} ma + \left(\sum_{k=1}^{m}x_k\right)b = \sum_{k=1}^{m}y_k \\[2ex] \left(\sum_{k=1}^{m}x_k\right)a + \left(\sum_{k=1}^{m}x_k^2\right)b = \sum_{k=1}^{m}x_ky_k \end{cases}$$
In matrix form:
$$\begin{bmatrix} m & \sum_{k=1}^{m}x_k \\ \sum_{k=1}^{m}x_k & \sum_{k=1}^{m}x_k^2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^{m}y_k \\ \sum_{k=1}^{m}x_ky_k \end{bmatrix}$$
Solving this system gives a and b.
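For reference, this 2×2 system can be solved in closed form with Cramer's rule. The symbol D for the determinant is introduced here only for convenience, and the formulas assume D ≠ 0, which holds whenever the x_k are not all equal:

$$D = m\sum_{k=1}^{m}x_k^2 - \Big(\sum_{k=1}^{m}x_k\Big)^2, \qquad a = \frac{\sum_{k=1}^{m}x_k^2\sum_{k=1}^{m}y_k - \sum_{k=1}^{m}x_k\sum_{k=1}^{m}x_ky_k}{D}, \qquad b = \frac{m\sum_{k=1}^{m}x_ky_k - \sum_{k=1}^{m}x_k\sum_{k=1}^{m}y_k}{D}$$

As a worked check with the sample data from section 2.1 (x = 0, 1, 2, 3, 4, 5 and y = 0, 1, 2, 3, 4, 5.1), we have m = 6, ∑x_k = 15, ∑x_k² = 55, ∑y_k = 15.1 and ∑x_k y_k = 55.5, so

$$D = 6\cdot 55 - 15^2 = 105, \qquad a = \frac{55\cdot 15.1 - 15\cdot 55.5}{105} = \frac{-2}{105} \approx -0.0190, \qquad b = \frac{6\cdot 55.5 - 15\cdot 15.1}{105} = \frac{106.5}{105} \approx 1.0143$$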
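By a derivation similar to the one above, in the polynomial fitting of data, determining the coefficients of the fitting function likewise requires solving a system of normal equations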
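A minimal sketch of the polynomial case, assuming the standard formulation in which the design matrix V has columns 1, x, x², ..., xⁿ and the normal equations read (VᵀV)c = Vᵀy. The function name `polyfit_normal_equations` and the sample data are hypothetical and not taken from the original text.

```python
import numpy as np

def polyfit_normal_equations(x, y, degree):
    """Least-squares polynomial coefficients c_0..c_n via the normal equations."""
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    V = np.vander(x, degree + 1, increasing=True)
    # Normal equations: (V^T V) c = V^T y.
    return np.linalg.solve(V.T @ V, V.T @ y)

# Made-up sample data lying exactly on y = 1 + 2x + 3x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x ** 2
coeffs = polyfit_normal_equations(x, y, degree=2)
print(coeffs)  # approximately [1. 2. 3.]
```

For degree 1 this reduces to the 2×2 system for a and b. For higher degrees the matrix VᵀV can become ill-conditioned, so in practice a routine such as `np.polyfit` or `np.linalg.lstsq` is usually the safer choice.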
2.1 Python example
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt


def train(train_x, train_y, train_mode='basic'):
    """Fit y = a + b*x and return [slope b, intercept a]."""
    weight = []
    if train_mode == 'basic':
        # Direct solution: build the normal equations A w = rhs, with w = [a, b]
        m = train_x.size  # number of samples, the top-left entry of A
        A = np.array([[m, np.sum(train_x)],
                      [np.sum(train_x), np.sum(train_x * train_x)]])
        rhs = np.array([np.sum(train_y), np.sum(train_x * train_y)])
        # Solve the 2x2 system directly instead of forming the inverse
        a, b = np.linalg.solve(A, rhs)
        weight.extend([b, a])
        print(weight)
    elif train_mode == 'scikit-learn':
        # scikit-learn solution: fit on the arguments, not on global variables
        reg = linear_model.LinearRegression()
        reg.fit(train_x, train_y)
        # With 2-D targets, coef_ has shape (1, 1) and intercept_ has shape (1,)
        weight.extend([reg.coef_[0][0], reg.intercept_[0]])
    return weight


if __name__ == '__main__':
    X = np.array([0, 1, 2, 3, 4, 5]).reshape(-1, 1)
    Y = np.array([0, 1, 2, 3, 4, 5.1]).reshape(-1, 1)
    weight = train(X, Y, 'scikit-learn')
    # Plot the data points and the fitted line
    plt.scatter(X, Y, color='black')
    plt.plot(X, weight[0] * X + weight[1])
    plt.show()