Preface
I am just getting started with machine learning. I began with Andrew Ng's course, which is detailed and well taught, and have picked up some numpy, pandas, and matplotlib along the way. Here, working through reference code from more experienced people, I implement univariate linear regression (I had previously written a cruder implementation from first principles).
Implementation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
'''
Read the data file
'''
path = "../Doc/ex1data1.txt"
data = pd.read_csv(path, names=['Population', 'Profit'])
print("Data:\n{}".format(data.head()))
#print(data.describe())
#data.plot(kind = 'scatter',x = 'Population',y = 'Profit',figsize = (8,6))
#plt.show()
'''
Cost function
'''
def computeCost(X, y, theta):
    inner = np.power((X * theta.T) - y, 2)
    return np.sum(inner) / (2 * len(X))
# len(X) is the number of rows (training examples)
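As a quick sanity check on the cost function (a made-up two-point example, not the exercise data), with theta = [0, 0] the prediction is zero everywhere, so the cost is just the sum of squared targets over 2m:

```python
import numpy as np

def computeCost(X, y, theta):
    inner = np.power((X * theta.T) - y, 2)
    return np.sum(inner) / (2 * len(X))

# Hypothetical tiny dataset: points (1, 1) and (2, 2), theta = [0, 0]
X = np.matrix([[1.0, 1.0], [1.0, 2.0]])
y = np.matrix([[1.0], [2.0]])
theta = np.matrix([0.0, 0.0])
# J = (1^2 + 2^2) / (2 * 2) = 1.25
print(computeCost(X, y, theta))  # 1.25
```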
'''
Initialize the data
'''
data.insert(0, "Ones", 1)  # arguments: position, column name, value
cols = data.shape[1]  # data.shape == (97, 3), so cols == 3
X = data.iloc[:, 0:cols-1]
y = data.iloc[:, cols-1:cols]
print("X matrix:\n{}\ny vector:\n{}".format(X.head(), y.head()))
X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix([0, 0])
print(X.shape,y.shape,theta.shape)
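The insert/iloc pattern used above can be seen in isolation on a tiny assumed frame (three made-up rows, not the exercise data):

```python
import pandas as pd

# Minimal demo of DataFrame.insert and iloc column slicing
df = pd.DataFrame({"Population": [6.1, 5.5, 8.5], "Profit": [17.6, 9.1, 13.7]})
df.insert(0, "Ones", 1)          # insert(loc, column, value)
print(df.columns.tolist())       # ['Ones', 'Population', 'Profit']

X = df.iloc[:, 0:-1]             # every column except the last
y = df.iloc[:, -1:]              # last column, kept 2-D
print(X.shape, y.shape)          # (3, 2) (3, 1)
```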
'''
Batch gradient descent
'''
def gradientDescent(X, y, theta, alpha, iters):  # iters: number of gradient-descent iterations
    temp = np.matrix(np.zeros(theta.shape))
    parameters = int(theta.ravel().shape[1])  # number of parameters (bias + features)
    cost = np.zeros(iters)
    for i in range(iters):
        error = (X * theta.T) - y
        for j in range(parameters):  # j = 0, 1, ... over the parameters
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - (alpha / len(X)) * np.sum(term)
        theta = temp
        cost[i] = computeCost(X, y, theta)
    return theta, cost
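The inner loop over j above can also be written fully vectorized with plain ndarrays. As a sanity check on assumed toy data (points lying exactly on y = 1 + 2x, not the exercise data), the same batch update should converge close to theta = (1, 2):

```python
import numpy as np

# Hypothetical toy data lying exactly on y = 1 + 2x
x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])   # (20, 2) design matrix
y = (1 + 2 * x).reshape(-1, 1)              # (20, 1) targets

theta = np.zeros((1, 2))                    # same row-vector convention as above
alpha, iters = 0.5, 5000
for _ in range(iters):
    error = X @ theta.T - y                             # (20, 1) residuals
    theta = theta - (alpha / len(X)) * (error.T @ X)    # one batch gradient step

print(theta)  # close to [[1. 2.]]
```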
'''
main
'''
alpha = 0.01
iters = 1000
g,cost = gradientDescent(X,y,theta,alpha,iters)
print(g)
'''
Plot the fitted line
'''
x = np.linspace(data.Population.min(),data.Population.max(),100)
f = g[0,0]+g[0,1] * x
fig,ax = plt.subplots(2,1,figsize=(12,10))
ax[0].plot(x,f,'r',label="Prediction")
ax[0].scatter(data.Population,data.Profit,label = "Data")
ax[0].set_xlabel('Population')
ax[0].set_ylabel('Profit')
ax[0].set_title('Predicted Profit vs. Population Size')
ax[0].legend()
'''
Cost curve
'''
ax[1].plot(np.arange(iters),cost,'b')
ax[1].set_xlabel("Iterations")
ax[1].set_ylabel("Cost")
ax[1].set_title("Cost vs. Iterations")
plt.show()
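As an optional sanity check (not part of the original script), the closed-form normal equation solves the same least-squares problem exactly, so gradient descent's g should land close to this kind of solution. On assumed noiseless toy data it recovers the generating coefficients:

```python
import numpy as np

# Hypothetical noiseless data: y = -3.9 + 1.19 * x exactly
x = np.linspace(5.0, 23.0, 97)
y = (-3.9 + 1.19 * x).reshape(-1, 1)
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^{-1} X^T y, via a linear solve
theta_exact = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_exact.ravel())  # [-3.9, 1.19] up to floating-point error
```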
Results
Data:
Population Profit
0 6.1101 17.5920
1 5.5277 9.1302
2 8.5186 13.6620
3 7.0032 11.8540
4 5.8598 6.8233
X matrix:
Ones Population
0 1 6.1101
1 1 5.5277
2 1 8.5186
3 1 7.0032
4 1 5.8598
y vector:
Profit
0 17.5920
1 9.1302
2 13.6620
3 11.8540
4 6.8233
(97, 2) (97, 1) (1, 2)
[[-3.24140214 1.1272942 ]]
Summary
- With np.matrix, `*` is matrix multiplication, while with np.array, `*` is element-wise multiplication (and watch for broadcasting in `+`/`-` operations).
- Calling .T on a 1-D array does nothing (a shape-(n,) array stays shape-(n,)), whereas an np.matrix is always 2-D, so .T genuinely transposes it.
- data.insert(0, "Ones", 1) is the DataFrame method, not np.insert; its arguments are position, column name, and value.
- ravel() flattens a multi-dimensional array to 1-D.
- The result depends heavily on alpha and iters: too large an alpha makes the updates diverge, too small an alpha slows convergence, and iters can be tuned by watching the cost curve.
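The numpy points above can be verified on a tiny assumed example:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)

print(a * a)  # array *: element-wise -> [[ 1  4] [ 9 16]]
print(m * m)  # matrix *: matrix product -> [[ 7 10] [15 22]]

v = np.array([1, 2, 3])        # 1-D array: .T is a no-op
print(v.T.shape)               # (3,)
print(np.matrix(v).T.shape)    # (3, 1): matrix .T really transposes

print(a.ravel())               # flattened to 1-D: [1 2 3 4]
```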