Data Preprocessing: Handling Missing Values

云飞扬°

Article link: https://blog.csdn.net/weixin_44706512/article/details/99975578

Missing-value marker: NaN (not a number)

Two ways to handle missing values: 1-delete them with dropna, 2-fill them with fillna
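Before looking at either approach, it may help to see why NaN needs dedicated detection functions. The snippet below is a minimal sketch; the variable names are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

nan = np.nan

# NaN is an ordinary Python float, and it never compares equal to anything,
# not even to itself.
is_float = isinstance(nan, float)
equals_itself = (nan == nan)

# For that reason, detect it with pd.isna (or Series.isnull), not with ==.
detected = pd.isna(nan)

# pandas also treats None as a missing value.
ser = pd.Series(['a', None, np.nan])
missing_count = int(ser.isna().sum())

print(is_float, equals_itself, bool(detected), missing_count)
```

Because `nan == nan` is False, a filter such as `ser[ser == np.nan]` would select nothing, which is why the examples below rely on isnull and notnull.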

1-Deletion

from pandas import Series
import numpy as np

strSer = Series(['a', 'b', np.nan, 'd', 'e'])
print(strSer)
Output:
0      a
1      b
2    NaN
3      d
4      e
dtype: object

# isnull: True where the value is missing
print(strSer.isnull())
Output:
0    False
1    False
2     True
3    False
4    False
dtype: bool

# notnull: True where the value is present
print(strSer.notnull())
Output:
0     True
1     True
2    False
3     True
4     True
dtype: bool
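The boolean mask returned by isnull can also be summarized, which is a common first check before deciding whether to drop or fill. A minimal sketch, reusing the same Series (the names `mask`, `n_missing`, `missing_labels`, and `has_missing` are illustrative):

```python
from pandas import Series
import numpy as np

strSer = Series(['a', 'b', np.nan, 'd', 'e'])

mask = strSer.isnull()

# True counts as 1, so summing the mask counts the missing entries.
n_missing = int(mask.sum())

# Index labels at which a value is missing.
missing_labels = list(strSer.index[mask])

# Quick yes/no check for any missing value at all.
has_missing = bool(mask.any())

print(n_missing, missing_labels, has_missing)
```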

# Keep only the entries that are not missing
r = strSer[strSer.notnull()]
print(r)
Output:
0    a
1    b
3    d
4    e
dtype: object

# dropna: drop the missing values
r = strSer.dropna()
print(r)
Output:
0    a
1    b
3    d
4    e
dtype: object
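Note in the output above that index label 2 is gone and the remaining labels are not renumbered. The sketch below shows two related points: dropna returns a new object and leaves the original untouched, and reset_index(drop=True) can renumber the result if a contiguous index is wanted. The names `cleaned` and `renumbered` are illustrative.

```python
from pandas import Series
import numpy as np

strSer = Series(['a', 'b', np.nan, 'd', 'e'])

# dropna returns a new Series; strSer itself still has 5 entries.
cleaned = strSer.dropna()

# The surviving entries keep their original labels: 0, 1, 3, 4.
kept_labels = list(cleaned.index)

# Renumber from 0 and discard the old labels.
renumbered = cleaned.reset_index(drop=True)
new_labels = list(renumbered.index)

print(len(strSer), kept_labels, new_labels)
```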

****************************************************************************************
****************************************************************************************
from pandas import DataFrame
import numpy as np

df = DataFrame([[1.4, np.nan],
                [7.1, -4.5],
                [np.nan, np.nan],
                [0.75, -1.3]],
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two']
               )
print(df)
Output:
    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3

# Drop every row that contains at least one missing value
r = df.dropna()
print(r)
Output:
    one  two
b  7.10 -4.5
d  0.75 -1.3

# Drop only the rows in which all values are missing
r = df.dropna(how='all')
print(r)
Output:
    one  two
a  1.40  NaN
b  7.10 -4.5
d  0.75 -1.3
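Besides how, dropna accepts a few other parameters that the post does not cover. The following is a minimal sketch on the same DataFrame showing subset, thresh, and axis; the result names are illustrative.

```python
from pandas import DataFrame
import numpy as np

df = DataFrame([[1.4, np.nan],
                [7.1, -4.5],
                [np.nan, np.nan],
                [0.75, -1.3]],
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two'])

# subset: only look at column 'one' when deciding which rows to drop.
by_subset = df.dropna(subset=['one'])

# thresh: keep rows that have at least 2 non-missing values.
by_thresh = df.dropna(thresh=2)

# axis=1: drop columns instead of rows. Both columns contain a NaN here,
# so no columns remain.
by_column = df.dropna(axis=1)

print(list(by_subset.index), list(by_thresh.index), by_column.shape)
```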

2-Filling

from pandas import DataFrame
import numpy as np

df = DataFrame([[1.4, np.nan],
                [7.1, -4.5],
                [np.nan, np.nan],
                [0.75, -1.3]],
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two']
               )
print(df)
Output:
    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3

# Fill with fillna
r = df.fillna(0)   # replace every missing value with 0
print(r)
Output:
    one  two
a  1.40  0.0
b  7.10 -4.5
c  0.00  0.0
d  0.75 -1.3

# Fill column 'one' with 0 and column 'two' with 1
r = df.fillna({'one': 0, 'two': 1})
print(r)
Output:
    one  two
a  1.40  1.0
b  7.10 -4.5
c  0.00  1.0
d  0.75 -1.3
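Constants are not the only option. For ordered data, a missing value is sometimes filled from its neighbor in the same column. The sketch below uses the ffill and bfill methods, which the original post does not mention; whether neighbor filling is appropriate depends on the data.

```python
from pandas import DataFrame
import numpy as np

df = DataFrame([[1.4, np.nan],
                [7.1, -4.5],
                [np.nan, np.nan],
                [0.75, -1.3]],
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two'])

# Forward fill: copy the last valid value downward within each column.
# The first row of 'two' has nothing above it, so it stays NaN.
forward = df.ffill()

# Backward fill: copy the next valid value upward within each column.
backward = df.bfill()

print(forward)
print(backward)
```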

# Fill missing values with the mean
print(df.mean())    # mean of each column (NaN values are skipped)
Output:
one    3.083333
two   -2.900000
dtype: float64

r = df.fillna(df.mean())   # fill each column's missing values with that column's mean
print(r)
Output:
        one  two
a  1.400000 -2.9
b  7.100000 -4.5
c  3.083333 -2.9
d  0.750000 -1.3
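The mean is pulled toward extreme values (here 7.1 lifts the mean of column one to about 3.08), so the median is a common alternative. The sketch below swaps in df.median() for df.mean() and then checks that nothing is left missing; this is a variation on the post's example, not the author's method.

```python
from pandas import DataFrame
import numpy as np

df = DataFrame([[1.4, np.nan],
                [7.1, -4.5],
                [np.nan, np.nan],
                [0.75, -1.3]],
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two'])

# Per-column median, skipping NaN: 1.4 for 'one', -2.9 for 'two'.
medians = df.median()

# Fill each column's gaps with that column's median.
filled = df.fillna(medians)

# Count the missing values before and after filling.
missing_before = int(df.isnull().sum().sum())
missing_after = int(filled.isnull().sum().sum())

print(filled)
print(missing_before, missing_after)
```

Like dropna, fillna returns a new DataFrame, so df still contains its three NaN values afterwards.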
