Pandas - Advanced Applications

Latest recommended article published on 2022-03-29 18:57:54

餐霞散人



Category: python pandas, The Road to AI

Article link: https://blog.csdn.net/qq_27171347/article/details/81475377


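Summary (generated automatically by CSDN): This article covers advanced uses of the pandas library in data processing, including arithmetic and data alignment, indexing and slicing with iloc and loc, arithmetic between DataFrame and Series, mapping data with the apply and applymap functions, sorting Series and DataFrame objects, handling duplicate indexes, computing descriptive statistics, and working with unique values, value counts and missing values.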

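1 Arithmetic and Data Alignment in pandas
2 Slicing and Indexing with iloc and loc
3 Arithmetic Between DataFrame and Series
4 Function Application and Mapping
- 4.1 apply: apply a rule to the rows or columns of a DataFrame
- 4.2 applymap: apply a rule to every element of a DataFrame
5 Sorting Series and DataFrame
6 Handling Duplicate Indexes in a Series
7 Summarizing and Computing Descriptive Statistics
8 Unique Values, Value Counts and Membership
9 Handling Missing Values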

1 Arithmetic and Data Alignment in pandas

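pandas can perform arithmetic between objects with different indexes. When objects are added, the index of the result is the union of the operands' indexes. If an axis label in one object is not found in the other, the corresponding result is automatically set to NaN; alternatively, you can supply a fill value of your own (such as 0) to stand in for the missing entries.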
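A minimal sketch of this alignment rule on two small Series; the names `s1` and `s2` and their values are illustrative only and do not come from the original article.

```python
import pandas as pd

# Two Series whose indexes only partially overlap
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# The result index is the union of both indexes; labels present in only
# one operand ("a" and "d") produce NaN
total = s1 + s2
print(total)

# With fill_value=0, a label missing from one side is treated as 0
filled = s1.add(s2, fill_value=0)
print(filled)
```

Only "b" and "c" exist in both Series, so they are the only labels that get a real sum from the plain `+` operator.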

from pandas import Series,DataFrame
import pandas as pd
import numpy as np
from numpy import nan

df1 = DataFrame(np.arange(12).reshape((3,4)),columns=list("abcd"))
df2 = DataFrame(np.arange(20).reshape((4,5)),columns=list("abcde"))
df1


	a	b	c	d
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11

df2


	a	b	c	d	e
0	0	1	2	3	4
1	5	6	7	8	9
2	10	11	12	13	14
3	15	16	17	18	19

df1.add(df2,fill_value=0)  # labels missing from df1 (row 3 and column e) are treated as 0, so those cells take df2's values


	a	b	c	d	e
0	0.0	2.0	4.0	6.0	4.0
1	9.0	11.0	13.0	15.0	9.0
2	18.0	20.0	22.0	24.0	14.0
3	15.0	16.0	17.0	18.0	19.0

df1.add(df2).fillna(0)    # add df1 and df2 the normal way, then replace the resulting NaN values with 0


	a	b	c	d	e
0	0.0	2.0	4.0	6.0	0.0
1	9.0	11.0	13.0	15.0	0.0
2	18.0	20.0	22.0	24.0	0.0
3	0.0	0.0	0.0	0.0	0.0

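'''
Note: df1.add(df2), df1.add(df2, fill_value=0) and df1.add(df2).fillna(0)
are fundamentally different:
- df1.add(df2) leaves NaN wherever a row or column label is missing from either operand;
- df1.add(df2, fill_value=0) treats the missing side as 0 before adding, so row 3 and column e take df2's values;
- df1.add(df2).fillna(0) adds first and then replaces the resulting NaN with 0, so row 3 and column e become 0.
'''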
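To see the difference concretely, the sketch below rebuilds df1 and df2 as in this section and inspects the cell at row 3, column "e", which exists only in df2. The variable names `plain`, `filled_before` and `filled_after` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=list("abcd"))
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)), columns=list("abcde"))

plain = df1.add(df2)                          # NaN where a label is missing on either side
filled_before = df1.add(df2, fill_value=0)    # missing side counted as 0 before adding
filled_after = df1.add(df2).fillna(0)         # NaN results replaced by 0 after adding

# Row 3 / column "e" exists only in df2
print(plain.loc[3, "e"], filled_before.loc[3, "e"], filled_after.loc[3, "e"])
```

All three results share the same index and columns; they differ only in the cells that the two frames do not have in common.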

2 Slicing and Indexing with iloc and loc

loc: label-based indexing
iloc: purely integer-position-based indexing

frame = DataFrame(np.arange(12).reshape((4,3)),
                  columns=list("bde"),
                  index=["Utah","Ohio","Texas","Oregon"])
frame


	b	d	e
Utah	0	1	2
Ohio	3	4	5
Texas	6	7	8
Oregon	9	10	11

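frame.iloc[1]  # select one row by integer position; use iloc[] instead of the old ix[] method, which was deprecated and removed in pandas 1.0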

    b    3
    d    4
    e    5
    Name: Ohio, dtype: int32

frame.index

    Index(['Utah', 'Ohio', 'Texas', 'Oregon'], dtype='object')

# select a row by its index label
frame.loc["Oregon"]

    b     9
    d    10
    e    11
    Name: Oregon, dtype: int32
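Both indexers also accept slices for rows and columns at once. A short sketch on the same `frame` shows one practical difference: a `loc` slice includes its end label, while an `iloc` slice excludes its end position, as in ordinary Python slicing. The names `by_label` and `by_position` are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

# loc slices by label, and the end label ("Texas", "d") IS included
by_label = frame.loc["Ohio":"Texas", "b":"d"]

# iloc slices by position, and the end position (3, 2) is EXCLUDED
by_position = frame.iloc[1:3, 0:2]

print(by_label)
print(by_position)
```

Both expressions select the rows Ohio and Texas and the columns b and d.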

# arithmetic between a DataFrame and a Series
frame


	b	d	e
Utah	0	1	2
Ohio	3	4	5
Texas	6	7	8
Oregon	9	10	11

series = frame.iloc[0]   # equivalent to frame.loc["Utah"]
series

    b    0
    d    1
    e    2
    Name: Utah, dtype: int32

3 Arithmetic Between DataFrame and Series

By default, arithmetic between a DataFrame and a Series matches the Series' index against the DataFrame's columns and then broadcasts the operation down the rows.
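A minimal sketch of this broadcasting, reusing the `frame` and `series` from the previous section. The extra Series `other` is a hypothetical example added to show what happens when the Series index and the DataFrame columns do not match: the same union-and-NaN alignment rule from section 1 applies to the columns.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])
series = frame.iloc[0]  # index ["b", "d", "e"] matches frame's columns

# The Series is aligned with the columns and subtracted from every row
diff = frame - series
print(diff)

# Hypothetical Series with one matching label ("b") and one unknown label ("f"):
# the result's columns are the union, and unmatched columns are all NaN
other = pd.Series([100, 200], index=["b", "f"])
res = frame + other
print(res)
```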

frame - series
