Machine Learning 01: Linear Regression

zhuzhiyi1211

Tags: machine learning, linear regression, artificial intelligence

Article link: https://blog.csdn.net/zhuzhiyi1211/article/details/137600596

1. Loading and plotting the data

import pandas as pd
import numpy as np
# matplotlib.pyplot is a collection of command-style functions that make matplotlib work in a MATLAB-like way.
import matplotlib.pyplot as plt
import seaborn as sns
import chardet
sns.set(context="notebook", style="whitegrid", palette="deep")

# 5x5 identity matrix (a quick NumPy warm-up, not used by the regression)
A=np.eye(5)
print(A)


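# pd.read_csv() is a pandas function that reads a CSV file into a DataFrame; it has many parameters, so check the documentation when you need them.
# Detect the file's encoding with chardet first, then pass it to read_csv.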
print('\n')
with open('ex1data1.txt', 'rb') as f:
    result = chardet.detect(f.read())
print(result)
df = pd.read_csv('ex1data1.txt', names=['人口', '利润'], sep=',', encoding=result['encoding'])

# Read the first rows
# df.head(n) returns the first n rows; if n is omitted it defaults to 5
print('\n')
print('# df.head(n) returns the first n rows; n defaults to 5 if omitted')
print(df.head(10))

# Show the index, column dtypes and memory usage
print('\n')
print('# Index, column dtypes and memory usage')
df.info()

## Inspect the raw data with a scatter plot
# fit_reg: if True, lmplot also fits a regression and draws the fitted line on the scatter plot
sns.lmplot(x='人口', y='利润', data=df, height=6, fit_reg=True)
plt.show()

2. Cost function and gradient descent

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import chardet
import seaborn as sns
sns.set(context="notebook", style="whitegrid", palette="deep")

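# Load the dataset
# Note: ex1data1.txt holds the data for our linear regression problem.
# The first column is a city's population; the second is the profit of a food truck in that city.
# A negative profit means a loss.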
with open('ex1data1.txt', 'rb') as f:
    result = chardet.detect(f.read())
df = pd.read_csv('ex1data1.txt', names=['population', 'profit'], sep=',', encoding=result['encoding'])

# Preliminary scatter plot of the raw data
sns.lmplot(x='population', y='profit', data=df, height=6, fit_reg=True)
plt.show()


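# Cost function: J(theta) = 1/(2m) * sum((X·theta - y)^2)
def cost_fun(X, y, theta):
    inner = (np.dot(X, theta) - y) ** 2
    m = len(X)
    return np.sum(inner) / (2 * m)   # divide by 2m; `/2*m` would compute (sum/2)*m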
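# Set x0 = 1: insert a column of ones at position 0 for the intercept term theta0
df.insert(0, 'Ones', 1)
X = df.iloc[:, :-1].values   # features: Ones and population
print('\n X:')
print(X)
y = df.iloc[:, -1].values    # target: profit
# Initialize theta to zeros
theta = np.zeros(X.shape[1])
print('\n initial cost:', cost_fun(X, y, theta))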


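# Batch gradient descent: 1000 iterations of the vectorized update
# theta := theta - (alpha/m) * X^T (X·theta - y)
def gradient(X, y, alpha, theta):
    m = len(X)
    for i in range(1000):
        theta = theta - (alpha / m) * np.dot(X.T, (np.dot(X, theta) - y))
    return theta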

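theta = gradient(X, y, 0.01, theta)
print('\n theta:')
print(theta)
print(' final cost:', cost_fun(X, y, theta))
fig, ax = plt.subplots(figsize=(16, 9))   # ordinary 2-D axes: the data are two-dimensional
x = np.linspace(df.population.min(), df.population.max(), 100)  # 100 evenly spaced points from min to max population
y_pred = theta[0] + theta[1] * x   # predicted profit; a separate name so the target y is not overwritten
ax.plot(x, y_pred, 'r', label='Prediction')
ax.scatter(df.population, df.profit, label='Training Data')  # plot the data points
ax.legend(loc=2)  # legend for the points and the line; loc=2 is the upper-left corner. Without this call no legend is shown
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('single_linear_regression')
plt.show()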
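The script runs a fixed 1000 iterations and never checks whether the cost has actually stopped decreasing. A minimal sketch of one way to check: the hypothetical helper `gradient_descent_with_history` below (not part of the original code) performs the same vectorized update and records J(θ) after every step. It is exercised on hypothetical noise-free toy data generated from y = 1 + 2x, so the parameters should approach [1, 2].

```python
import numpy as np

def gradient_descent_with_history(X, y, theta, alpha=0.01, iters=1000):
    """Batch gradient descent (hypothetical helper) that records J(theta) after each step."""
    m = len(X)
    history = []
    for _ in range(iters):
        error = np.dot(X, theta) - y                      # h(x) - y for all m samples
        theta = theta - (alpha / m) * np.dot(X.T, error)  # same update as gradient()
        cost = np.sum((np.dot(X, theta) - y) ** 2) / (2 * m)
        history.append(cost)
    return theta, history

# Hypothetical toy data generated from y = 1 + 2x, with no noise
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
X_toy = np.column_stack([np.ones_like(x_toy), x_toy])  # bias column x0 = 1
y_toy = 1 + 2 * x_toy

theta_toy, history = gradient_descent_with_history(
    X_toy, y_toy, np.zeros(2), alpha=0.1, iters=2000)
print(theta_toy)                 # close to [1, 2]
print(history[0], history[-1])   # the cost should shrink towards 0
```

On the real data, something like `theta_gd, history = gradient_descent_with_history(X, df['profit'].values, np.zeros(X.shape[1]))` followed by `plt.plot(history)` would show whether the curve has flattened out; a curve that rises instead suggests alpha is too large.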

Detailed explanation:

This code uses the Python data-analysis and visualization libraries pandas, numpy, matplotlib and seaborn to carry out a simple linear regression. Each part is explained below.

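1. Import the libraries:
pandas handles the data, numpy does the numerical computation, matplotlib and seaborn draw the plots, and chardet detects the data file's encoding. `sns.set` sets seaborn's plotting style.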

   import pandas as pd
   import numpy as np
   import matplotlib.pyplot as plt
   from mpl_toolkits.mplot3d import Axes3D
   import chardet
   import seaborn as sns
   sns.set(context="notebook", style="whitegrid", palette="deep")

2. Load the dataset:
This code opens the data file, uses the chardet library to detect its encoding, and then reads the data into a DataFrame with pandas' `read_csv`.

   with open('ex1data1.txt', 'rb') as f:
       result = chardet.detect(f.read())
   df = pd.read_csv('ex1data1.txt', names=['population', 'profit'], sep=',', encoding=result['encoding'])
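Because the file is read with explicit `names=`, pandas treats the first line as data rather than as a header (the documented behaviour is the same as `header=None`). The sketch below illustrates the difference on two hypothetical rows held in an `io.StringIO` buffer instead of ex1data1.txt; chardet is left out so the example needs only pandas.

```python
import io
import pandas as pd

# Two hypothetical rows in the same "population,profit" layout (no header line)
raw = io.StringIO("6.0,17.5\n5.5,9.1\n")

# With names=..., every line is data
df_demo = pd.read_csv(raw, names=['population', 'profit'], sep=',')
print(df_demo.shape)           # (2, 2)
print(list(df_demo.columns))   # ['population', 'profit']

# Without names=, the first data row is consumed as the header
raw.seek(0)
df_header = pd.read_csv(raw)
print(df_header.shape)         # (1, 2)
```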

3. Preliminary scatter plot:
seaborn's `lmplot` draws a scatter plot of population against profit and overlays a fitted regression line.

   sns.lmplot(x='population', y='profit', data=df, height=6, fit_reg=True)

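4. Cost function:
This defines the cost function of the linear regression model: half the mean of the squared differences between the predictions Xθ and the targets y over the m training examples.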

   def cost_fun(X,y,theta):
       inner=(np.dot(X,theta)-y)**2
       m=len(X)
       return np.sum(inner)/(2*m)  # divide by 2m, not (sum/2)*m
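A quick way to test a cost function is to evaluate it on hypothetical toy data where the answer can be worked out by hand. In the sketch below the three points lie exactly on y = x, so θ = [0, 1] must give zero cost, and θ = [0, 0] gives (1² + 2² + 3²)/(2·3) = 14/6. The division is written as `/ (2 * m)`: because `/` and `*` associate left to right, `np.sum(inner)/2*m` would instead compute (Σ/2)·m, which here would be 21 rather than about 2.33.

```python
import numpy as np

def cost_fun(X, y, theta):
    # J(theta) = 1/(2m) * sum of squared residuals
    inner = (np.dot(X, theta) - y) ** 2
    m = len(X)
    return np.sum(inner) / (2 * m)

# Hypothetical toy data: a bias column of ones plus one feature, with y = x
X_toy = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
y_toy = np.array([1.0, 2.0, 3.0])

print(cost_fun(X_toy, y_toy, np.array([0.0, 1.0])))  # 0.0 (perfect fit)
print(cost_fun(X_toy, y_toy, np.zeros(2)))           # 14/6, about 2.333
```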

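5. Set x0 = 1 and build X and y:
A column of ones is inserted into the dataset to represent x0 = 1, the input multiplied by the intercept θ0. The feature matrix X (every column except the last) and the target vector y (the last column) are then taken from the DataFrame.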

   df.insert(0,'Ones',1)
   X=df.iloc[:,:-1].values
   y=df.iloc[:,-1].values
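To see what `df.insert` and the two `iloc` slices produce, here is a small sketch on a hypothetical two-row DataFrame with the same column names as the script. `DataFrame.insert(loc, column, value)` modifies the frame in place and, by default, raises a `ValueError` if the column already exists, so re-running this step in a notebook without reloading the data will fail.

```python
import pandas as pd

# Hypothetical two-row frame with the script's column names
df_demo = pd.DataFrame({'population': [6.0, 5.5], 'profit': [17.5, 9.1]})

# Put a column of ones at position 0 (x0 = 1); this changes df_demo in place
df_demo.insert(0, 'Ones', 1)

X_demo = df_demo.iloc[:, :-1].values   # all columns but the last: Ones, population
y_demo = df_demo.iloc[:, -1].values    # last column: profit

print(list(df_demo.columns))       # ['Ones', 'population', 'profit']
print(X_demo.shape, y_demo.shape)  # (2, 2) (2,)
```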

6. Initialize theta:
theta is created as a vector of zeros, one entry per column of X, and serves as the starting point for gradient descent.

   theta=np.zeros(X.shape[1])

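7. Gradient descent:
This defines the gradient descent function, which updates theta for a fixed 1000 iterations; each step moves theta against the gradient of the cost, scaled by the learning rate alpha.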

   def gradient(X,y,alpha,theta):
       m = len(X)
       for i in range(1000):
           theta=theta-(alpha/m)*np.dot(X.T,(np.dot(X,theta)-y))
       return theta
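The update line inside `gradient` is the vectorized batch gradient-descent rule. With the hypothesis and cost below, differentiating J(θ) with respect to each θ_j and stacking the m examples as the rows of X gives the gradient and the update:

```latex
h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1, \qquad
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\quad\Longrightarrow\quad
\nabla_\theta J = \frac{1}{m} X^T (X\theta - y)

\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)
```

The last line maps term by term onto `theta - (alpha/m)*np.dot(X.T,(np.dot(X,theta)-y))`. The factor 1/2 in J is there so that the 2 produced by differentiating the square cancels.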

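8. Plot the prediction line and the scatter plot:
This code uses matplotlib to draw the model's fitted line over a scatter plot of the original data, giving a visual check of how well the model fits.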

   fig, ax = plt.subplots(figsize=(16, 9))
   x=np.linspace(df.population.min(),df.population.max(),100)
   y_pred=theta[0]+theta[1]*x
   ax.plot(x,y_pred,'r',label='Prediction')
   ax.scatter(df.population,df.profit,label='Training Data')
   ax.legend(loc=2)
   ax.set_xlabel('Population')
   ax.set_ylabel('Profit')
   ax.set_title('single_linear_regression')
   plt.show()
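Besides the visual check, the gradient-descent result can be compared numerically with the closed-form least-squares solution (the normal equation). The sketch below, built around a hypothetical helper `normal_equation`, uses `np.linalg.lstsq` rather than explicitly inverting X^T X, which is the numerically safer choice; it is tested on hypothetical toy data lying exactly on y = 3 - 0.5x.

```python
import numpy as np

def normal_equation(X, y):
    """Least-squares theta minimizing ||X @ theta - y||^2 (hypothetical helper)."""
    # lstsq returns (solution, residuals, rank, singular values); keep the solution
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

# Hypothetical toy data on the line y = 3 - 0.5x
x_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X_toy = np.column_stack([np.ones_like(x_toy), x_toy])  # bias column x0 = 1
y_toy = 3 - 0.5 * x_toy

theta_ls = normal_equation(X_toy, y_toy)
print(theta_ls)  # approximately [3, -0.5]
```

Applied to the script's data, `normal_equation(X, df['profit'].values)` gives the exact least-squares θ. If the gradient-descent θ differs noticeably from it, 1000 iterations at alpha = 0.01 have probably not converged yet.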
