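Simple linear regression:
There is exactly one independent variable $x$ and one dependent variable $y$. The relationship between $x$ and $y$ is linear, and $y$ is a continuous variable.
Assume the model equation is: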
$$y = b_0 + b_1 x$$
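The formulas for $b_0$ and $b_1$ in the algorithm steps are the ordinary least squares estimates. Here is a short sketch of where they come from. It assumes the criterion is to choose $b_0, b_1$ so that the sum of squared residuals over $n$ samples $(x_i, y_i)$ is as small as possible:

$$
\begin{aligned}
S(b_0,b_1) &= \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2\sum_{i=1}^n (y_i - b_0 - b_1 x_i) = 0 \;\Rightarrow\; b_0 = \overline{y} - b_1\overline{x} \\
\frac{\partial S}{\partial b_1} &= -2\sum_{i=1}^n x_i\,(y_i - b_0 - b_1 x_i) = 0
\end{aligned}
$$

Substituting $b_0 = \overline{y} - b_1\overline{x}$ turns each residual into $(y_i-\overline{y}) - b_1(x_i-\overline{x})$. These residuals sum to zero, so $x_i$ in the second condition can be replaced by $x_i-\overline{x}$:

$$\sum_{i=1}^n (x_i-\overline{x})\big[(y_i-\overline{y}) - b_1(x_i-\overline{x})\big] = 0 \;\Rightarrow\; b_1 = \frac{\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^n (x_i-\overline{x})^2}$$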
Suppose there are $n$ samples, i.e. $n$ pairs $(x_i, y_i)$.
Algorithm steps:
- Compute $\overline{x}$: $\overline{x}=\dfrac{1}{n}\sum_{i=1}^n x_i$
- Compute $\overline{y}$: $\overline{y}=\dfrac{1}{n}\sum_{i=1}^n y_i$
- Compute $b_1$: $b_1=\dfrac{\sum_{i=1}^n(x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^n(x_i-\overline{x})^2}$ (the denominator is nonzero as long as not all $x_i$ are equal)
- Compute $b_0$: $b_0=\overline{y}-b_1\overline{x}$
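As a worked check of these steps, here is the arithmetic for the sample data used in the code ($x = [1,3,2,1,3]$, $y = [14,24,18,17,27]$, so $n = 5$):

$$
\begin{aligned}
\overline{x} &= \tfrac{1+3+2+1+3}{5} = 2, \qquad \overline{y} = \tfrac{14+24+18+17+27}{5} = 20 \\
x_i-\overline{x} &= (-1,\ 1,\ 0,\ -1,\ 1), \qquad y_i-\overline{y} = (-6,\ 4,\ -2,\ -3,\ 7) \\
b_1 &= \frac{6+4+0+3+7}{1+1+0+1+1} = \frac{20}{4} = 5 \\
b_0 &= 20 - 5 \cdot 2 = 10
\end{aligned}
$$

The fitted line is therefore $y = 10 + 5x$. At $x = 6$ it predicts $y = 40$.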
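Code (Python):
```python
import numpy as np

def fitSLR(x, y):
    """Estimate intercept b0 and slope b1 of y = b0 + b1*x by least squares."""
    n = len(x)
    # np.mean() computes the mean; compute it once instead of on every loop iteration
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        numerator += (x[i] - x_mean) * (y[i] - y_mean)
        denominator += (x[i] - x_mean) ** 2
    b1 = numerator / denominator
    b0 = y_mean - b1 * x_mean
    return b0, b1

def predict(x, b0, b1):
    """Predict y for a given x using the fitted line."""
    return b0 + b1 * x

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = fitSLR(x, y)
x_test = 6
y_test = predict(x_test, b0, b1)
print(y_test)  # prints 40.0
```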
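The loop can also be written with NumPy array operations, and the result can be cross-checked against `np.polyfit`. This is a minimal sketch, not the original author's code. The function name `fit_slr_vectorized` is hypothetical, and the check assumes a degree-1 `np.polyfit` returns `[slope, intercept]`, highest degree first.

```python
import numpy as np

def fit_slr_vectorized(x, y):
    """Hypothetical helper: closed-form OLS estimates (b0, b1) for y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    denominator = np.sum(dx ** 2)
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = np.sum(dx * dy) / denominator
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = fit_slr_vectorized(x, y)
print(b0, b1)  # prints 10.0 5.0

# Cross-check: a degree-1 least-squares polynomial fit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(np.isclose(b1, slope), np.isclose(b0, intercept))  # prints True True
```

The vectorized form avoids the Python-level loop, which matters for large $n$. The explicit zero-denominator check turns a silent `inf`/`nan` into a clear error when every $x_i$ is the same.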