Handling Missing Data in Pandas

Pandas primarily uses np.nan to represent missing data. By default, missing values are excluded from computations.
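As a small illustration of why np.nan needs dedicated handling, the sketch below shows that it is an ordinary float, that it never compares equal to itself, and that reductions skip it by default. The Series `s` and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary IEEE 754 float value
nan_is_float = isinstance(np.nan, float)

# NaN never compares equal to anything, including itself,
# so `==` cannot be used to find missing values
nan_equals_itself = (np.nan == np.nan)

# Hypothetical example data with one missing entry
s = pd.Series([1.0, np.nan, 3.0])

# isna() is the reliable way to detect missing entries
missing_mask = s.isna()

# Reductions such as sum() skip NaN by default (skipna=True)
total = s.sum()

print(nan_is_float, nan_equals_itself)
print(missing_mask.tolist())
print(total)
```

Because equality tests fail on NaN, the rest of this post relies on isnull(), fillna() and dropna() rather than comparisons.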

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame(np.random.randn(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df.head()
                    A          B          C          D
2013-01-01  -0.031531   1.231280  -1.069298   1.068172
2013-01-02  -0.216581   0.535341  -1.408095   0.677334
2013-01-03   0.262541  -0.034165   0.712012   0.053880
2013-01-04   0.142971  -0.009381  -0.369560   2.142902
2013-01-05  -0.483484   1.896420  -1.087918  -0.608670

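reindex() lets you change, add, or delete labels on the row index or the columns. It returns a copy of the data, and any label that did not exist in the original is filled with NaN: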

df1 = df.reindex(index=dates[0:4], columns=['A', 'B', 'C', 'D', 'E'])
df1
                    A          B          C          D    E
2013-01-01  -0.031531   1.231280  -1.069298   1.068172  NaN
2013-01-02  -0.216581   0.535341  -1.408095   0.677334  NaN
2013-01-03   0.262541  -0.034165   0.712012   0.053880  NaN
2013-01-04   0.142971  -0.009381  -0.369560   2.142902  NaN
df2 = df.reindex(index=dates[0:4], columns=['A','B','C','D']+['E'])
df2
                    A          B          C          D    E
2013-01-01  -0.031531   1.231280  -1.069298   1.068172  NaN
2013-01-02  -0.216581   0.535341  -1.408095   0.677334  NaN
2013-01-03   0.262541  -0.034165   0.712012   0.053880  NaN
2013-01-04   0.142971  -0.009381  -0.369560   2.142902  NaN
df3 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
df3
                    A          B          C          D    E
2013-01-01  -0.031531   1.231280  -1.069298   1.068172  NaN
2013-01-02  -0.216581   0.535341  -1.408095   0.677334  NaN
2013-01-03   0.262541  -0.034165   0.712012   0.053880  NaN
2013-01-04   0.142971  -0.009381  -0.369560   2.142902  NaN
df3.loc[dates[0]:dates[1],'E'] = 1
df3
                    A          B          C          D    E
2013-01-01  -0.031531   1.231280  -1.069298   1.068172  1.0
2013-01-02  -0.216581   0.535341  -1.408095   0.677334  1.0
2013-01-03   0.262541  -0.034165   0.712012   0.053880  NaN
2013-01-04   0.142971  -0.009381  -0.369560   2.142902  NaN
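The reindex() calls above leave new labels as NaN. As a hedged sketch, the fill_value parameter of reindex() can supply a different default, and writing to the result does not affect the original frame. The small frame and the names `filled` and `original_unchanged` are invented for this example.

```python
import numpy as np
import pandas as pd

# Small deterministic frame so the result is easy to check by eye
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=dates, columns=['A', 'B'])

# 'E' does not exist in df; fill_value replaces the default NaN
filled = df.reindex(columns=['A', 'B', 'E'], fill_value=0.0)

# reindex returned new data, so this write leaves df untouched
filled.loc[dates[0], 'A'] = 100.0
original_unchanged = df.loc[dates[0], 'A'] == 0.0

print(filled)
print(original_unchanged)
```

Using fill_value only makes sense when a sentinel such as 0.0 is a meaningful default for the data; otherwise NaN keeps the gap visible.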

Filling missing values (fillna() returns a new object and leaves the original unchanged):

df1.fillna(value=5)
                    A          B          C          D    E
2013-01-01  -0.031531   1.231280  -1.069298   1.068172  5.0
2013-01-02  -0.216581   0.535341  -1.408095   0.677334  5.0
2013-01-03   0.262541  -0.034165   0.712012   0.053880  5.0
2013-01-04   0.142971  -0.009381  -0.369560   2.142902  5.0
df2['E'] = df1['E'].fillna(value=5)
df2
                    A          B          C          D    E
2013-01-01  -0.031531   1.231280  -1.069298   1.068172  5.0
2013-01-02  -0.216581   0.535341  -1.408095   0.677334  5.0
2013-01-03   0.262541  -0.034165   0.712012   0.053880  5.0
2013-01-04   0.142971  -0.009381  -0.369560   2.142902  5.0
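Beyond a single constant, fillna() also accepts a dict of per-column values, and ffill() carries the last valid value forward. The following is a minimal sketch on made-up data; the frame and variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in both columns
df = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                   'E': [np.nan, 2.0, np.nan]})

# A dict maps column names to their own fill values
per_column = df.fillna(value={'A': 0.0, 'E': -1.0})

# ffill() propagates the last valid observation forward;
# a leading NaN stays missing because nothing precedes it
forward = df.ffill()

# Neither call modified df, so it still holds all three NaN values
still_missing = int(df.isna().sum().sum())

print(per_column)
print(forward)
print(still_missing)
```

Which strategy is appropriate depends on the data: a constant suits categorical defaults, while forward filling is usually reserved for ordered data such as time series.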

Dropping rows that contain missing values:

df3.dropna(how='any')
                    A          B          C          D    E
2013-01-01  -0.031531   1.231280  -1.069298   1.068172  1.0
2013-01-02  -0.216581   0.535341  -1.408095   0.677334  1.0
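how='any' is the strictest setting. As a rough comparison, the sketch below contrasts it with how='all' and with the subset parameter, which limits the check to chosen columns. The data and variable names are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, rows 1 and 3 are partly
# missing, and row 2 is entirely missing
df = pd.DataFrame({'A': [1.0, np.nan, np.nan, 7.0],
                   'B': [4.0, 5.0, np.nan, np.nan]})

# Drop a row if ANY of its values is missing
any_missing = df.dropna(how='any')

# Drop a row only if ALL of its values are missing
all_missing = df.dropna(how='all')

# Only look at column 'B' when deciding which rows to drop
subset_b = df.dropna(subset=['B'])

print(any_missing.index.tolist())
print(all_missing.index.tolist())
print(subset_b.index.tolist())
```

Like fillna(), dropna() returns a new frame, so df keeps all four rows unless the result is assigned back.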

Getting a boolean mask that marks where values are missing:

df4 = df1.isnull()
df4
                  A      B      C      D     E
2013-01-01    False  False  False  False  True
2013-01-02    False  False  False  False  True
2013-01-03    False  False  False  False  True
2013-01-04    False  False  False  False  True
df5 = pd.isnull(df1)
df5
                  A      B      C      D     E
2013-01-01    False  False  False  False  True
2013-01-02    False  False  False  False  True
2013-01-03    False  False  False  False  True
2013-01-04    False  False  False  False  True
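A boolean mask like this is mostly useful as input to further steps. The sketch below, on made-up data, counts missing values per column and selects the rows that contain at least one. It uses isna(), of which isnull() is an alias; the frame and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in 'A' and two in 'E'
df = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                   'E': [np.nan, np.nan, 6.0]})

# True counts as 1, so summing the mask counts missing values per column
missing_per_column = df.isna().sum()

# Keep only the rows where at least one column is missing
rows_with_missing = df[df.isna().any(axis=1)]

# notna() is the inverse mask
a_present = df['A'].notna()

print(missing_per_column.to_dict())
print(rows_with_missing.index.tolist())
print(a_present.tolist())
```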