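Univariate Linear Regression
I: Machine Learning Basics
1. What is machine learning?
Machine learning means learning from data:
- Build a model
- Train the model
- Use the model to predict, for example house prices
2. Types of machine learning
Supervised learning
- Learns from the data to find the mapping between attributes (features) and labels
- Regression: predicts continuous values
- Classification: predicts discrete values
Unsupervised learning
- Uncovers the relationships hidden inside the data when the samples have no labels
- Clustering: groups highly similar samples together
Semi-supervised learning
- Combines supervised and unsupervised learning, training on a large amount of unlabeled data together with a small amount of labeled data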
II: Univariate Linear Regression
1. Model
Model: y = wx + b
Model variable: x
w: weight (the slope of the line)
b: bias (the intercept of the line)
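As a minimal sketch of the model above, a prediction is a single multiply-add. The function name predict and the parameter values are illustrative assumptions, not part of the original notes.

```python
def predict(x, w, b):
    """Return the model output y = w * x + b for one input x."""
    return w * x + b


# Hypothetical parameters chosen only for illustration.
example_w = 2.0
example_b = 1.0
print(predict(3.0, example_w, example_b))  # 2.0 * 3.0 + 1.0 = 7.0
```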
2. Best-fit line
3. Loss function / cost function
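One common choice of loss for linear regression is the sum of squared errors between the observed y values and the model's predictions. Some texts scale it by 1/2 or 1/n, which does not change where the minimum lies. The sketch below assumes that choice; squared_error_loss and the toy data are hypothetical names and values used only for illustration.

```python
def squared_error_loss(xs, ys, w, b):
    """Sum of squared differences between each y and the prediction w * x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))


# Toy data lying exactly on y = 2x + 1 (illustrative values only).
toy_x = [0.0, 1.0, 2.0, 3.0]
toy_y = [1.0, 3.0, 5.0, 7.0]

loss_exact = squared_error_loss(toy_x, toy_y, 2.0, 1.0)  # the line the data lies on
loss_off = squared_error_loss(toy_x, toy_y, 1.5, 1.0)    # a line with the wrong slope
print(loss_exact, loss_off)
```

The line that passes through every point has zero loss, and any other line has a larger one, which is what makes the loss usable as a measure of fit.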
4. Finding the minimum of the loss
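The closed-form expressions for w and b used in the example code can be obtained by setting the partial derivatives of the loss to zero. The derivation below is a sketch that assumes a sum-of-squared-errors loss over n samples; the notation \bar{x} and \bar{y} for the sample means is introduced here. The expression for w follows after substituting the result for b into the second condition.

```latex
\begin{aligned}
\mathrm{Loss}(w, b) &= \sum_{i=1}^{n} \bigl(y_i - (w x_i + b)\bigr)^2 \\
\frac{\partial\,\mathrm{Loss}}{\partial b} &= -2 \sum_{i=1}^{n} \bigl(y_i - w x_i - b\bigr) = 0
  \;\Longrightarrow\; b = \bar{y} - w \bar{x} \\
\frac{\partial\,\mathrm{Loss}}{\partial w} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - w x_i - b\bigr) = 0
  \;\Longrightarrow\; w = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```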
III: Univariate Linear Regression Example
Problem introduction
1. Pure Python implementation
# House price prediction in pure Python
# Load the sample data
x = [137.97, 104.50, 100.00, 124.32, 79.20, 99.00, 124.00, 114.00,  # floor area (square meters)
     106.69, 138.05, 53.75, 46.91, 68.00, 63.02, 81.26, 86.21]
y = [145.00, 110.00, 93.00, 116.00, 65.32, 104.00, 118.00, 91.00,  # sale price (10,000 yuan)
     62.00, 133.00, 51.00, 45.00, 78.50, 69.65, 75.69, 95.30]
meanX = sum(x) / len(x)  # mean of x
meanY = sum(y) / len(y)  # mean of y
sumXY = 0  # accumulates (x_i - meanX) * (y_i - meanY)
sumX = 0   # accumulates (x_i - meanX) squared
for i in range(len(x)):
    sumXY += (x[i] - meanX) * (y[i] - meanY)
    sumX += (x[i] - meanX) * (x[i] - meanX)
w = sumXY / sumX       # slope
b = meanY - w * meanX  # intercept
print(type(w), type(b), "\n")
print("w = ", w, "\nb = ", b, "\n")
# Predict house prices
x_test = [128.15, 45.00, 141.43, 106.27, 99.00, 53.84, 85.36, 70.00]
print("Area\tPredicted price")
for i in range(len(x_test)):
    print(x_test[i], "\t", round(w * x_test[i] + b, 2))  # round to two decimal places
Output:
<class 'float'> <class 'float'>
w = 0.8945605120044221
b = 5.410840339418002
Area Predicted price
128.15 120.05
45.0 45.67
141.43 131.93
106.27 100.48
99.0 93.97
53.84 53.57
85.36 81.77
70.0 68.03
2. NumPy implementation
# House price prediction with NumPy
import numpy as np
x = np.array([137.97, 104.50, 100.00, 124.32, 79.20, 99.00, 124.00, 114.00,  # floor area (square meters)
              106.69, 138.05, 53.75, 46.91, 68.00, 63.02, 81.26, 86.21])
y = np.array([145.00, 110.00, 93.00, 116.00, 65.32, 104.00, 118.00, 91.00,  # sale price (10,000 yuan)
              62.00, 133.00, 51.00, 45.00, 78.50, 69.65, 75.69, 95.30])
meanX = np.mean(x)
meanY = np.mean(y)
# Array arithmetic replaces the explicit loop of the pure Python version
sumXY = np.sum((x - meanX) * (y - meanY))
sumX = np.sum((x - meanX) * (x - meanX))
w = sumXY / sumX
b = meanY - w * meanX
print(type(w), type(b), "\n")
print("Weight: w = ", w, "\nBias: b = ", b, "\n")
# Predict house prices
x_test = np.array([128.15, 45.00, 141.43, 106.27, 99.00, 53.84, 85.36, 70.00])
y_test = w * x_test + b  # predictions for all test areas at once
print(y_test)
print("\nArea\tPredicted price")
for i in range(len(x_test)):
    print(x_test[i], "\t", round(y_test[i], 2))  # round to two decimal places
Output:
<class 'numpy.float64'> <class 'numpy.float64'>
Weight: w = 0.894560512004422
Bias: b = 5.410840339418002
[120.04876995 45.66606338 131.92853355 100.47578595 93.97233103
53.57397831 81.77052564 68.03007618]
Area Predicted price
128.15 120.05
45.0 45.67
141.43 131.93
106.27 100.48
99.0 93.97
53.84 53.57
85.36 81.77
70.0 68.03
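To see that the closed-form w and b sit at the minimum of the loss, one can evaluate the loss at the fitted parameters and at slightly shifted ones. This sketch assumes a sum-of-squared-errors loss; the helper loss and the shift size 0.05 are illustrative choices, not part of the original notes.

```python
import numpy as np

x = np.array([137.97, 104.50, 100.00, 124.32, 79.20, 99.00, 124.00, 114.00,
              106.69, 138.05, 53.75, 46.91, 68.00, 63.02, 81.26, 86.21])
y = np.array([145.00, 110.00, 93.00, 116.00, 65.32, 104.00, 118.00, 91.00,
              62.00, 133.00, 51.00, 45.00, 78.50, 69.65, 75.69, 95.30])


def loss(w, b):
    """Sum of squared errors of the line y = w * x + b on the sample data."""
    return np.sum((y - (w * x + b)) ** 2)


# Closed-form least-squares parameters.
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()

best = loss(w, b)
shift = 0.05  # arbitrary small perturbation
neighbours = [loss(w + shift, b), loss(w - shift, b),
              loss(w, b + shift), loss(w, b - shift)]
print(best, min(neighbours))
```

Every shifted parameter pair gives a larger loss than the fitted pair, which is consistent with the fitted line being the best-fit line under this loss.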
3. TensorFlow implementation
# House price prediction with TensorFlow
import tensorflow as tf
import numpy as np
# tf.constant stores these Python floats as float32, as the printed dtype below shows
x = tf.constant([137.97, 104.50, 100.00, 124.32, 79.20, 99.00, 124.00, 114.00,  # floor area (square meters)
                 106.69, 138.05, 53.75, 46.91, 68.00, 63.02, 81.26, 86.21])
y = tf.constant([145.00, 110.00, 93.00, 116.00, 65.32, 104.00, 118.00, 91.00,  # sale price (10,000 yuan)
                 62.00, 133.00, 51.00, 45.00, 78.50, 69.65, 75.69, 95.30])
meanX = tf.reduce_mean(x)
meanY = tf.reduce_mean(y)
sumXY = tf.reduce_sum((x - meanX) * (y - meanY))
sumX = tf.reduce_sum((x - meanX) * (x - meanX))
w = sumXY / sumX
b = meanY - w * meanX
print(type(w), type(b))
print("Weight: w = ", w, "\nBias: b = ", b)
print("Linear model: y=", w.numpy(), "x+", b.numpy())
# Predict house prices
x_test = np.array([128.15, 45.00, 141.43, 106.27, 99.00, 53.84, 85.36, 70.00])
y_pred = (w * x_test + b).numpy()  # convert the result tensor to a NumPy array
print("\nArea\tPredicted price")
for i in range(len(x_test)):
    print(x_test[i], "\t", round(y_pred[i], 2))  # round to two decimal places
Output:
<class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'>
Weight: w = tf.Tensor(0.8945604, shape=(), dtype=float32)
Bias: b = tf.Tensor(5.4108505, shape=(), dtype=float32)
Linear model: y= 0.8945604 x+ 5.4108505
Area Predicted price
128.15 120.05
45.0 45.67
141.43 131.93
106.27 100.48
99.0 93.97
53.84 53.57
85.36 81.77
70.0 68.03
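The TensorFlow bias (5.4108505) differs from the NumPy bias (5.4108403) in the fifth decimal place. The printed dtype=float32 suggests the cause: tf.constant stores Python floats in single precision by default, while NumPy uses double precision. The sketch below illustrates the effect with NumPy alone by running the same formula in both precisions. The helper name fit_line is hypothetical, and the exact float32 digits may differ from TensorFlow's because the order of summation can differ.

```python
import numpy as np


def fit_line(x, y):
    """Closed-form least-squares slope and intercept, computed in the dtype of x and y."""
    mean_x, mean_y = x.mean(), y.mean()
    w = np.sum((x - mean_x) * (y - mean_y)) / np.sum((x - mean_x) ** 2)
    return w, mean_y - w * mean_x


area = [137.97, 104.50, 100.00, 124.32, 79.20, 99.00, 124.00, 114.00,
        106.69, 138.05, 53.75, 46.91, 68.00, 63.02, 81.26, 86.21]
price = [145.00, 110.00, 93.00, 116.00, 65.32, 104.00, 118.00, 91.00,
         62.00, 133.00, 51.00, 45.00, 78.50, 69.65, 75.69, 95.30]

# Same data and same formula, once in double and once in single precision.
w64, b64 = fit_line(np.array(area, dtype=np.float64), np.array(price, dtype=np.float64))
w32, b32 = fit_line(np.array(area, dtype=np.float32), np.array(price, dtype=np.float32))
print("float64:", w64, b64)
print("float32:", w32, b32)
```

The rounded predictions are unaffected at two decimal places. If closer agreement with NumPy is wanted, passing dtype=tf.float64 to tf.constant should make TensorFlow compute in double precision as well.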
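4. Data visualization with Matplotlib
# Uses x, y, x_test and y_pred from the TensorFlow example above
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # font with Chinese glyphs, needed only if labels contain Chinese text
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly with that font
plt.figure()
plt.scatter(x, y, color='red', label='Sales records')
plt.scatter(x_test, y_pred, color='blue', label='Predicted prices')
plt.plot(x_test, y_pred, color='green', label='Fitted line', linewidth=2)
plt.xlabel("Area (square meters)", fontsize=14)
plt.ylabel("Price (10,000 yuan)", fontsize=14)
plt.xlim((40, 150))
plt.ylim((40, 150))
plt.suptitle("Home Sale Price Estimation", fontsize=20)
plt.legend(loc="upper left")
plt.show()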