Machine Learning with the pandas Library ②: Mini Project Practice ①

The processed dataset (first 20 rows)

   Global_active_power  Global_reactive_power  Global_intensity    b
1                5.360                  0.436              23.0  1.0
2                5.374                  0.498              23.0  1.0
3                5.388                  0.502              23.0  1.0
4                3.666                  0.528              15.8  1.0
5                3.520                  0.522              15.0  1.0
6                3.702                  0.520              15.8  1.0
7                3.700                  0.520              15.8  1.0
8                3.668                  0.510              15.8  1.0
9                3.662                  0.510              15.8  1.0
10               4.448                  0.498              19.6  1.0
11               5.412                  0.470              23.2  1.0
12               5.224                  0.478              22.4  1.0
13               5.268                  0.398              22.6  1.0
14               4.054                  0.422              17.6  1.0
15               3.384                  0.282              14.2  1.0
16               3.270                  0.152              13.8  1.0
17               3.430                  0.156              14.4  1.0
18               3.266                  0.000              13.8  1.0
19               3.728                  0.000              16.4  1.0
20               5.894                  0.000              25.4  1.0

Code and comments

import numpy as np
import pandas as pd

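path = r'E:\PHP\Pycharmprojects\python_ML\ML_Two\datas\household_power_consumption_1000.txt'  # raw string so the backslashes are not read as escape sequences
df = pd.read_csv(path, sep=';')  # load the semicolon-separated file with pandas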

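## Add a bias column to the dataset and drop the columns we do not need
b = np.ones(len(df))   # array of ones (one per row) used as the bias term b

df['b'] = b            # append the bias term to the original DataFrame as column 'b'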

# Keep only Global_active_power, Global_reactive_power, Global_intensity and the bias column b
a = df[['Global_active_power','Global_reactive_power','Global_intensity','b']]

## Handle abnormal data
a = a.replace('?', np.nan).dropna()  # turn '?' markers into NaN, then drop every row with any missing feature
print(a.head(20))

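# Convert the string column to float
print(a.info())
a = a.astype({'Global_active_power': 'float64'})  # returns a new DataFrame; assigning into a (a dropna() result) can raise SettingWithCopyWarning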
print(a.info())


## Split into X and Y
# Take Global_active_power, Global_reactive_power and the bias b as the features X
X = a[['Global_active_power','Global_reactive_power','b']]
Y = a[['Global_intensity']]  # take Global_intensity as the target Y


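## Convert the DataFrames to NumPy arrays
X = X.to_numpy()   # 2-D float array, shape (n_samples, 3); np.mat was removed in NumPy 2.0
Y = Y.to_numpy()   # 2-D array, shape (n_samples, 1)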

## Solve for the model parameters with the normal equation: theta = (X^T X)^(-1) X^T Y
theta = np.dot(np.linalg.inv(np.dot(X.T,X)),np.dot(X.T,Y))
print(theta)
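The normal equation above computes θ = (XᵀX)⁻¹XᵀY with an explicit matrix inverse. That is fine for a small, well-conditioned problem like this one, but numerically it is usually preferable to solve the linear system rather than form the inverse. A minimal sketch on synthetic data (the coefficients 4.0, 0.7, 0.4 and the random inputs are made up for illustration and are not from the household power dataset) compares the explicit-inverse form with np.linalg.solve and np.linalg.lstsq. The column of ones plays the same role as the bias column b, so its coefficient comes out as the intercept.

```python
import numpy as np

# Hypothetical synthetic data, NOT the household power dataset
rng = np.random.default_rng(0)
n = 200
active = rng.uniform(0.2, 6.0, n)     # stand-in for Global_active_power
reactive = rng.uniform(0.0, 0.6, n)   # stand-in for Global_reactive_power
b = np.ones(n)                        # bias column, as in the script above

X = np.column_stack([active, reactive, b])           # shape (n, 3)
true_theta = np.array([[4.0], [0.7], [0.4]])         # made-up coefficients
Y = X @ true_theta + rng.normal(0.0, 0.05, (n, 1))   # noisy targets, shape (n, 1)

# Normal equation with an explicit inverse (the form used in the script)
theta_inv = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# Same normal equation, solved without forming the inverse
theta_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# Least squares directly on X; also copes with a rank-deficient X
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(X, Y, rcond=None)

print(theta_inv.ravel())
print(theta_solve.ravel())
print(theta_lstsq.ravel())
```

All three give the same θ here, close to the made-up coefficients; lstsq would still return a usable answer if, for example, the bias column were accidentally included twice and XᵀX became singular.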

Output

  Global_active_power  Global_reactive_power  Global_intensity    b
1               5.360                  0.436              23.0  1.0
2               5.374                  0.498              23.0  1.0
3               5.388                  0.502              23.0  1.0
4               3.666                  0.528              15.8  1.0
5               3.520                  0.522              15.0  1.0
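The printed frame starts at index 1 rather than 0, and info() below reports 999 entries labelled 1 to 999: presumably the first of the 1000 rows contained a '?' and was removed by dropna(), which keeps the original index labels of the surviving rows. A small sketch with made-up rows (not taken from the real file) shows this behaviour, and how reset_index(drop=True) renumbers the rows if consecutive labels are wanted.

```python
import numpy as np
import pandas as pd

# Made-up rows; '?' marks a missing reading, as handled in the script above
raw = pd.DataFrame({
    "Global_active_power": ["?", "5.360", "5.374"],
    "Global_reactive_power": [0.418, 0.436, 0.498],
    "Global_intensity": [18.4, 23.0, 23.0],
})

# Same step as the script: turn '?' into NaN, then drop any row containing NaN
cleaned = raw.replace("?", np.nan).dropna()
print(cleaned.index.tolist())      # [1, 2]: original labels are kept

# Optional: renumber the remaining rows from 0
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1]
```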

<class 'pandas.core.frame.DataFrame'>
Int64Index: 999 entries, 1 to 999
Data columns (total 4 columns):
Global_active_power      999 non-null object
Global_reactive_power    999 non-null float64
Global_intensity         999 non-null float64
b                        999 non-null float64
dtypes: float64(3), object(1)
memory usage: 39.0+ KB
None

<class 'pandas.core.frame.DataFrame'>
Int64Index: 999 entries, 1 to 999
Data columns (total 4 columns):
Global_active_power      999 non-null float64
Global_reactive_power    999 non-null float64
Global_intensity         999 non-null float64
b                        999 non-null float64
dtypes: float64(4)
memory usage: 39.0 KB
None
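The first info() listing shows Global_active_power as object, i.e. its values were read as strings; read_csv does this whenever a column contains a token it cannot parse as a number, such as '?'. Instead of replace('?', np.nan) followed by astype('float64'), the missing-value marker can be declared while reading (na_values='?'), or the column can be coerced afterwards with pd.to_numeric(..., errors='coerce'). The sketch below uses a tiny made-up sample in an in-memory buffer rather than the real file path.

```python
import io
import pandas as pd

# Made-up sample in the same semicolon-separated layout (not the real file)
sample = (
    "Global_active_power;Global_reactive_power;Global_intensity\n"
    "?;0.418;18.400\n"
    "5.360;0.436;23.000\n"
    "5.374;0.498;23.000\n"
)

# Without na_values, the '?' keeps the whole column as strings (non-numeric dtype)
as_text = pd.read_csv(io.StringIO(sample), sep=";")
print(as_text.dtypes["Global_active_power"])

# Option 1: declare '?' as missing while reading, so the column parses as float64
parsed = pd.read_csv(io.StringIO(sample), sep=";", na_values="?")
print(parsed.dtypes["Global_active_power"])   # float64

# Option 2: coerce after reading; anything non-numeric becomes NaN
coerced = pd.to_numeric(as_text["Global_active_power"], errors="coerce")
print(coerced.dtype)                           # float64

# Either way, dropna() then removes the incomplete row
clean = parsed.dropna()
print(len(clean))                              # 2
```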

[[4.10551185]
 [0.68820978]
 [0.35884791]]
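The three printed values line up with the columns of X in the order they were selected (Global_active_power, Global_reactive_power, then the bias b), so the fitted model reads Global_intensity ≈ 4.1055 · Global_active_power + 0.6882 · Global_reactive_power + 0.3588. A short sketch that plugs the printed coefficients (copied from the output above) into that formula for the first row of the processed dataset:

```python
import numpy as np

# Coefficients copied from the printed theta above (order: active, reactive, bias)
theta = np.array([[4.10551185], [0.68820978], [0.35884791]])

# First row of the processed dataset: active = 5.360, reactive = 0.436, b = 1.0
x_row = np.array([[5.360, 0.436, 1.0]])

predicted = (x_row @ theta).item()
print(round(predicted, 2))   # 22.66 (the recorded Global_intensity is 23.0)
```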