import numpy as np
import matplotlib.pyplot as plt
x=np.array([1.,2.,3.,4.,5.])
y=np.array([1.,3.,2.,3.,5.])
plt.scatter(x,y)
plt.axis([0,6,0,6])
plt.show()
$$a=\frac{\sum_{i=1}^{m}(x^{(i)}-\bar{x})(y^{(i)}-\bar{y})}{\sum_{i=1}^{m}(x^{(i)}-\bar{x})^{2}}$$

$$b=\bar{y}-a\bar{x}$$
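As a brief sketch of where these two formulas come from, assume the usual least-squares objective (the post does not state it explicitly): choose a and b to minimize the sum of squared errors, then set both partial derivatives to zero.

$$J(a,b)=\sum_{i=1}^{m}\left(y^{(i)}-a x^{(i)}-b\right)^{2}$$

$$\frac{\partial J}{\partial b}=-2\sum_{i=1}^{m}\left(y^{(i)}-a x^{(i)}-b\right)=0\;\Rightarrow\;b=\bar{y}-a\bar{x}$$

$$\frac{\partial J}{\partial a}=-2\sum_{i=1}^{m}x^{(i)}\left(y^{(i)}-a x^{(i)}-b\right)=0\;\Rightarrow\;a=\frac{\sum_{i=1}^{m}x^{(i)}(y^{(i)}-\bar{y})}{\sum_{i=1}^{m}x^{(i)}(x^{(i)}-\bar{x})}$$

Because $\sum_{i=1}^{m}\bar{x}\,(y^{(i)}-\bar{y})=0$ and $\sum_{i=1}^{m}\bar{x}\,(x^{(i)}-\bar{x})=0$, subtracting these zero terms from the numerator and denominator gives the centered form of a used here.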
x_mean = np.mean(x)
y_mean = np.mean(y)
num = 0.0
d = 0.0
for x_i,y_i in zip(x,y):
    num += (x_i - x_mean) * (y_i - y_mean)
    d += (x_i - x_mean) ** 2
a = num / d
b = y_mean - a * x_mean
a
Output: 0.8
b
Output: 0.39999999999999947
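One way to sanity-check these values is to compare them with NumPy's built-in least-squares polynomial fit. The sketch below is an independent cross-check, not part of the original steps. It assumes the data are x = 1..5 and y = [1, 3, 2, 3, 5], which are the values that reproduce the outputs 0.8 and 0.4 reported above. The names `slope` and `intercept` are introduced here for illustration.

```python
import numpy as np

# Assumed data: the y values that reproduce a = 0.8 and b ≈ 0.4
x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 2., 3., 5.])

# Closed-form simple linear regression, as in the steps above
x_mean, y_mean = np.mean(x), np.mean(y)
a = (x - x_mean).dot(y - y_mean) / (x - x_mean).dot(x - x_mean)
b = y_mean - a * x_mean

# np.polyfit with degree 1 returns coefficients highest power first:
# [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

print(np.isclose(a, slope), np.isclose(b, intercept))  # True True
```

The two results should agree up to floating-point rounding, which is why the comparison uses `np.isclose` rather than `==`.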
y_hat = a * x + b
plt.scatter(x,y)  # scatter plot of the data
plt.plot(x,y_hat,color='r')  # fitted line
plt.axis([0,6,0,6])
plt.show()
x_predict = 6
y_predict = a * x_predict + b
y_predict
Output: 5.2
Vectorization
Only one change to the simple linear regression steps above is needed. Replace
num = 0.0
d = 0.0
for x_i,y_i in zip(x,y):
    num += (x_i - x_mean) * (y_i - y_mean)
    d += (x_i - x_mean) ** 2
with
num = (x - x_mean).dot(y - y_mean)  # treat num and d each as a dot product of two vectors
d = (x - x_mean).dot(x - x_mean)
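The result matches the earlier one, and the vectorized version runs faster, because the dot products execute in NumPy's compiled code instead of a Python-level loop.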
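To see the difference on a larger input, the following sketch times both versions with the standard-library `timeit` module. The function names `slope_loop` and `slope_dot`, the array size, and the synthetic data are all assumptions made for illustration. The actual timings depend on the machine, so no specific numbers are claimed here.

```python
import timeit
import numpy as np

def slope_loop(x, y):
    """Slope a computed with an explicit Python loop."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    num = 0.0
    d = 0.0
    for x_i, y_i in zip(x, y):
        num += (x_i - x_mean) * (y_i - y_mean)
        d += (x_i - x_mean) ** 2
    return num / d

def slope_dot(x, y):
    """Slope a computed with vectorized dot products."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    num = (x - x_mean).dot(y - y_mean)
    d = (x - x_mean).dot(x - x_mean)
    return num / d

# Synthetic data (hypothetical): y = 2x + 3 plus Gaussian noise
rng = np.random.default_rng(0)
big_x = rng.random(100_000)
big_y = 2.0 * big_x + 3.0 + rng.normal(size=big_x.size)

# Run each version a few times and report the total elapsed time
t_loop = timeit.timeit(lambda: slope_loop(big_x, big_y), number=3)
t_dot = timeit.timeit(lambda: slope_dot(big_x, big_y), number=3)
print(f"loop: {t_loop:.4f} s, dot: {t_dot:.4f} s")
```

Both functions return the same slope up to floating-point rounding. Only the way the two sums are accumulated differs.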