Machine Learning with the pandas Library (2): Mini Project in Practice (1)

门前大橋下丶

Tags: pandas, machine learning in practice

Article link: https://blog.csdn.net/weixin_44883371/article/details/98780031

Processed dataset

   Global_active_power  Global_reactive_power  Global_intensity    b
1                5.360                  0.436              23.0  1.0
2                5.374                  0.498              23.0  1.0
3                5.388                  0.502              23.0  1.0
4                3.666                  0.528              15.8  1.0
5                3.520                  0.522              15.0  1.0
6                3.702                  0.520              15.8  1.0
7                3.700                  0.520              15.8  1.0
8                3.668                  0.510              15.8  1.0
9                3.662                  0.510              15.8  1.0
10               4.448                  0.498              19.6  1.0
11               5.412                  0.470              23.2  1.0
12               5.224                  0.478              22.4  1.0
13               5.268                  0.398              22.6  1.0
14               4.054                  0.422              17.6  1.0
15               3.384                  0.282              14.2  1.0
16               3.270                  0.152              13.8  1.0
17               3.430                  0.156              14.4  1.0
18               3.266                  0.000              13.8  1.0
19               3.728                  0.000              16.4  1.0
20               5.894                  0.000              25.4  1.0

Code and comments

import numpy as np
import pandas as pd

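path = r'E:\PHP\Pycharmprojects\python_ML\ML_Two\datas\household_power_consumption_1000.txt'  # raw string, so the backslashes are not read as escape sequences
df = pd.read_csv(path, sep=';')  # load the semicolon-separated file with pandas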

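## Add a bias column to the dataset and drop the columns that are not needed
b = np.ones(len(df))    # array of ones, one per row, used as the bias (intercept) term

df['b'] = b             # attach the bias term to the original DataFrame as column b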

# Keep only Global_active_power, Global_reactive_power, Global_intensity and the bias column b
a = df[['Global_active_power','Global_reactive_power','Global_intensity','b']]

## Handle invalid data
a = a.replace('?', np.nan).dropna()  # '?' marks a missing value; drop every row that has any missing feature
print(a.head(20))

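# Convert the string column to float
print(a.info())  # info() prints its report and returns None, hence the "None" in the output
a['Global_active_power'] = a['Global_active_power'].astype('float64')
print(a.info())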


## Split into X and Y
# Use Global_active_power, Global_reactive_power and b as X
X = a[['Global_active_power','Global_reactive_power','b']]
Y = a[['Global_intensity']]  # use Global_intensity as Y


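## Convert the DataFrames to NumPy arrays
# Plain ndarrays replace np.mat here: the matrix class is no longer recommended by NumPy
X = X.to_numpy()    # X as a 2-D array of shape (n_samples, 3)
Y = Y.to_numpy()    # Y as a 2-D array of shape (n_samples, 1)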

## Solve for the model parameters with the normal equation: theta = (X^T X)^-1 X^T Y
theta = np.dot(np.linalg.inv(np.dot(X.T,X)),np.dot(X.T,Y))
print(theta)
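
The normal equation above computes theta = (X^T X)^-1 X^T Y with an explicit matrix inverse. As a minimal sketch, the snippet below compares that form with two alternatives that avoid building the inverse, `np.linalg.solve` and `np.linalg.lstsq`. The data is synthetic and every name in it (`X_demo`, `Y_demo`, `true_theta`) is hypothetical, chosen only for illustration; it is not the household power dataset.

```python
import numpy as np

# Hypothetical synthetic data: y = 2*x1 - 1*x2 + 0.5, with no noise
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
X_demo = np.column_stack([features, np.ones(50)])  # last column is the bias term
true_theta = np.array([[2.0], [-1.0], [0.5]])
Y_demo = X_demo @ true_theta

# Normal equation with an explicit inverse, same form as the code above
theta_inv = np.linalg.inv(X_demo.T @ X_demo) @ (X_demo.T @ Y_demo)

# Same linear system, solved without forming the inverse
theta_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ Y_demo)

# Least-squares solver working on X directly
theta_lstsq, *_ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)

print(theta_inv.ravel())
print(theta_solve.ravel())
print(theta_lstsq.ravel())
```

On well-conditioned data all three agree; the solver-based forms are generally the safer choice when X^T X is close to singular.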

Output

  Global_active_power  Global_reactive_power  Global_intensity    b
1               5.360                  0.436              23.0  1.0
2               5.374                  0.498              23.0  1.0
3               5.388                  0.502              23.0  1.0
4               3.666                  0.528              15.8  1.0
5               3.520                  0.522              15.0  1.0

<class 'pandas.core.frame.DataFrame'>
Int64Index: 999 entries, 1 to 999
Data columns (total 4 columns):
Global_active_power      999 non-null object
Global_reactive_power    999 non-null float64
Global_intensity         999 non-null float64
b                        999 non-null float64
dtypes: float64(3), object(1)
memory usage: 39.0+ KB
None

<class 'pandas.core.frame.DataFrame'>
Int64Index: 999 entries, 1 to 999
Data columns (total 4 columns):
Global_active_power      999 non-null float64
Global_reactive_power    999 non-null float64
Global_intensity         999 non-null float64
b                        999 non-null float64
dtypes: float64(4)
memory usage: 39.0 KB
None
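
The two `info()` reports above show Global_active_power as `object` before the cast and `float64` after it, because the `?` placeholder forces pandas to read that column as strings. As a rough sketch of two other ways to get numeric columns, the snippet below uses a small in-memory sample; the sample rows and the names `text`, `demo`, `raw` and `coerced` are made up for illustration and are not taken from the original file.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same ';'-separated layout
text = (
    "Global_active_power;Global_reactive_power;Global_intensity\n"
    "?;0.418;18.4\n"
    "5.360;0.436;23.0\n"
    "5.374;0.498;23.0\n"
)

# Option 1: declare '?' as a missing-value marker while reading,
# so the column is parsed as float from the start
demo = pd.read_csv(io.StringIO(text), sep=";", na_values="?").dropna()
print(demo.dtypes)

# Option 2: read the file as-is, then coerce unparseable strings to NaN
raw = pd.read_csv(io.StringIO(text), sep=";")
coerced = pd.to_numeric(raw["Global_active_power"], errors="coerce")
print(coerced.isna().sum())
```

With option 1 the separate `replace` and `astype` steps are not needed, which is why it may be the more convenient route when the placeholder is known in advance.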

[[4.10551185]
 [0.68820978]
 [0.35884791]]
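
One way to sanity-check the printed theta is to apply it to a few rows of the processed dataset and compare the predictions with the recorded Global_intensity. The sketch below copies theta and the first five rows from the output above; the variable names (`rows`, `observed`, `predicted`, `residuals`) are only illustrative.

```python
import numpy as np

# theta as printed above; the order matches the columns of X:
# [Global_active_power, Global_reactive_power, b]
theta = np.array([[4.10551185], [0.68820978], [0.35884791]])

# First five rows of the processed dataset (features plus bias column)
rows = np.array([
    [5.360, 0.436, 1.0],
    [5.374, 0.498, 1.0],
    [5.388, 0.502, 1.0],
    [3.666, 0.528, 1.0],
    [3.520, 0.522, 1.0],
])
observed = np.array([[23.0], [23.0], [23.0], [15.8], [15.0]])

predicted = rows @ theta          # model output for each row
residuals = observed - predicted  # observed minus predicted

# Columns: predicted, observed, residual
print(np.hstack([predicted, observed, residuals]).round(3))
```

For these five rows the predictions stay within half a unit of the recorded values, which suggests the linear fit is reasonable on this sample.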
