Using some pandas functions

wcl1800

Article link: https://blog.csdn.net/wcl1800/article/details/108973391


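I. Replacement

1. where

Replaces values in a DataFrame based on a condition.

Parameter	Description
cond	Condition to test; can be a boolean DataFrame or Series
other	Replacement for entries that do not satisfy the condition; defaults to NaN
inplace	True or False (default False); whether to modify the DataFrame in place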

import pandas as pd

df = pd.DataFrame({'col1': [1, 1], 'col2':[3, 1], 'col3':[2, 2]}, index=['row1', 'row2'])

# Rows that satisfy the condition are kept; rows that do not are replaced with NaN
df.where(df['col2'] > 1)
"""
      col1  col2  col3
row1   1.0   3.0   2.0
row2   NaN   NaN   NaN
"""

# Rows that satisfy the condition are kept; rows that do not are replaced with 100
df.where(df['col2'] > 1, 100)
"""
      col1  col2  col3
row1     1     3     2
row2   100   100   100
"""

# Apply the condition to every cell of the table
df.where(df > 1)
"""
      col1  col2  col3
row1   NaN   3.0     2
row2   NaN   NaN     2
"""

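2. mask

The exact opposite of where. where keeps the entries that satisfy the condition and replaces the rest; mask replaces the entries that satisfy the condition and keeps the rest.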
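
To make the contrast concrete, here is a minimal sketch that applies the same condition through both methods. The variable names `kept` and `masked` and the replacement value 0 are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 1], "col2": [3, 1], "col3": [2, 2]},
    index=["row1", "row2"],
)

# where: entries with df > 1 are kept, everything else becomes 0
kept = df.where(df > 1, 0)

# mask: entries with df > 1 become 0, everything else is kept
masked = df.mask(df > 1, 0)

print(kept)
print(masked)
```

Put differently, `df.mask(cond, x)` should give the same result as `df.where(~cond, x)`.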

II. Filtering

1. between

2. query
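
Neither entry comes with a description or an example. As a rough sketch of typical usage, assuming a small numeric DataFrame invented for illustration: `Series.between` builds a boolean mask for a range (both endpoints included by default), and `DataFrame.query` filters rows with an expression string.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 5, 7], "b": [8, 6, 4, 2]})

# between returns a boolean Series; use it to index the DataFrame
in_range = df[df["a"].between(3, 5)]

# query evaluates the expression against the column names
selected = df.query("a > 4 and b < 4")

print(in_range)
print(selected)
```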

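III. Sorting

1. rank

Ranks the values of a column and returns a Series holding each value's rank (the values themselves are not reordered).

Parameter	Description
method	How ties are ranked: 'average', 'min', 'max', 'first', 'dense'
na_option	How NaN values are handled: 'keep', 'top', 'bottom'
ascending	True for ascending order, False for descending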

import pandas as pd

df1 = pd.DataFrame([[1, 2, 3], [3, 4, 1], [1, 2, 1]], columns=["a", "b", "c"])

# Tied values receive the average of the ranks they span; the next distinct value takes rank 3
df1['a'].rank(method='average')
"""
0    1.5
1    3.0
2    1.5
Name: a, dtype: float64
"""

# 'min' gives tied values the lowest rank in their group, 'max' the highest
df1['a'].rank(method='min')
"""
0    1.0
1    3.0
2    1.0
Name: a, dtype: float64
"""

# Tied values are ranked in the order in which they appear
df1['a'].rank(method='first')
"""
0    1.0
1    3.0
2    2.0
Name: a, dtype: float64
"""

# Tied values share the same rank, and the next distinct value takes rank 2 (no gaps)
df1['a'].rank(method='dense')
"""
0    1.0
1    2.0
2    1.0
Name: a, dtype: float64
"""

import pandas as pd

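# The first row has only two values, so column c is padded with NaN: [NaN, 1.0, 1.0]
df1 = pd.DataFrame([[1, 2], [3, 4, 1], [1, 2, 1]], columns=["a", "b", "c"])

# NaN values stay NaN and keep their position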
df1['c'].rank(method='first', na_option='keep')
"""
0    NaN
1    1.0
2    2.0
Name: c, dtype: float64
"""

# NaN values are ranked first
df1['c'].rank(method='first', na_option='top')
"""
0    1.0
1    2.0
2    3.0
Name: c, dtype: float64
"""

# NaN values are ranked last
df1['c'].rank(method='first', na_option='bottom')
"""
0    3.0
1    1.0
2    2.0
Name: c, dtype: float64
"""

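IV. Iteration

1. iteritems

Yields two objects per step: 1. the column label, 2. the column as a Series. iteritems was removed in pandas 2.0; use items, which behaves the same way.

2. iterrows

Yields two objects per step: 1. the row index, 2. the row as a Series.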

3. itertuples

Yields one named tuple per row; element 0 is the row index, followed by the row's values in column order.
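
As a minimal sketch of the three iteration styles, assuming a two-column DataFrame invented for illustration (`items` is used here in place of `iteritems`, since the latter is not available in pandas 2.0 and later):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# items(): one (column label, column Series) pair per column
column_labels = [label for label, column in df.items()]

# iterrows(): one (row index, row Series) pair per row
row_sums = {idx: int(row.sum()) for idx, row in df.iterrows()}

# itertuples(): one named tuple per row, with the index in the first field
rows = [(t.Index, t.a, t.b) for t in df.itertuples()]

print(column_labels)
print(row_sums)
print(rows)
```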

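V. Alignment

1. align

Aligns two DataFrames on their row and column labels so that both share the same labels along the chosen axis.

Parameter	Description
join	Join strategy: 'outer', 'inner', 'left', 'right'
axis	0 to align on the row index, 1 to align on the columns; the default None aligns both
fill_value	Value used for entries that are missing after alignment; defaults to NaN
method	Fill method, default None: 'backfill', 'bfill', 'pad', 'ffill' (deprecated since pandas 2.1, together with limit)
limit	If method is given, the maximum number of consecutive entries to fill; default None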

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=[5, 6])
df2 = pd.DataFrame([[1, 2], [3, 4]], columns=['b', 'c'], index=[6, 5])

# Align on columns only: both frames get columns a, b, c; the row order is left untouched
df1, df2 = df1.align(df2, join='outer', axis=1, fill_value='fill')
print(df1)
print('*******separator*******')
print(df2)
"""
   a  b     c
5  1  2  fill
6  3  4  fill
*******separator*******
      a  b  c
6  fill  1  2
5  fill  3  4
"""

VI. Querying

1. all

Checks whether every value in each row or column is True and returns booleans.

df = pd.DataFrame({'col1': [True, True], 'col2':[True, False]})

# One boolean per column; axis=0 is the default
df.all(axis=0)
"""
col1     True
col2    False
dtype: bool
"""

# One boolean per row
df.all(axis=1)
"""
0     True
1    False
dtype: bool
"""

# A single boolean: whether every value in the whole table is True
df.all(axis=None)
"""
False
"""

2. any

Checks whether at least one value in each row or column is True and returns booleans.
Usage mirrors all.
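
Since no example is given for any, here is a short sketch that mirrors the all example, using a boolean DataFrame made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": [True, False], "col2": [False, False]})

# One boolean per column: does the column contain at least one True?
per_column = df.any(axis=0)

# One boolean per row
per_row = df.any(axis=1)

# A single boolean for the whole table
whole_table = df.any(axis=None)

print(per_column)
print(per_row)
print(whole_table)
```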

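3. nunique

Counts the number of unique values in each column or row of a DataFrame.

Parameter	Description
axis	Direction of the count: axis=0 (default) counts per column, axis=1 counts per row
dropna	Whether to exclude NaN from the count: True (default) or False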

import pandas as pd

df1 = pd.DataFrame([[1, 1, 1], [2, 1], [3, 2, 1], [1, 5]], columns=["a", "b", "c"])

# Count unique values per column; NaN counts as a value because dropna=False
df1.nunique(axis=0, dropna=False)
"""
a    3
b    3
c    2
dtype: int64
"""

# Count unique values per row
df1.nunique(axis=1, dropna=False)
"""
0    1
1    3
2    3
3    3
dtype: int64
"""

VII. Adding

1. assign

Adds one or more new columns and returns a new DataFrame; the original is left unchanged.

df = pd.DataFrame({'col1': [True, True], 'col2':[True, False]})
df.assign(col3=[1, 2], col4=[3, 4])
"""
   col1   col2  col3  col4
0  True   True     1     3
1  True  False     2     4
"""
