Pandas indexing, Pandas group-by computation, aggregation functions, data I/O, time series, resampling, data visualization

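This article looks at indexing in the Pandas library in depth, covering single-level and multi-level indexes on Series and DataFrame and how to handle duplicate index labels. It explains Pandas group-by computation in detail: grouping by lists, dicts, functions and other keys, then applying built-in and custom aggregation functions. It also covers data import and export, time-series handling, and resampling. Finally, it shows several kinds of charts, such as line charts, bar charts, histograms and scatter plots, to help understand how data is distributed.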

Pandas indexing

Series indexing
In [379]: a = pd.Series(np.random.rand(5),index=list('abcde'))

In [380]: a
Out[380]:
a    0.716350
b    0.922494
c    0.739640
d    0.757864
e    0.364554
dtype: float64

In [381]: a.index.name='alpha'


In [384]: a
Out[384]:
alpha
a    0.716350
b    0.922494
c    0.739640
d    0.757864
e    0.364554
dtype: float64
DataFrame indexing

(1) Row index

In [385]: df = pd.DataFrame(np.random.randn(4,3),columns=['one','two','three'])

In [386]: df
Out[386]:
        one       two     three
0  1.284715 -0.320224  1.636582
1 -0.620074 -0.257786 -0.913668
2 -1.291568  0.933435 -0.755850
3  0.517897  1.550158  0.342784

In [387]: df.index
Out[387]: RangeIndex(start=0, stop=4, step=1)

(2) Column index

In [388]: df.columns
Out[388]: Index(['one', 'two', 'three'], dtype='object')

(3) Naming the row and column indexes

In [389]: df.index.name='row'

In [390]: df.columns.name='col'

In [391]: df
Out[391]:
col       one       two     three
row
0    1.284715 -0.320224  1.636582
1   -0.620074 -0.257786 -0.913668
2   -1.291568  0.933435 -0.755850
3    0.517897  1.550158  0.342784
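A minimal sketch, assuming pandas 0.24 or later: `rename_axis` sets the same index and column names as the attribute assignments above, but returns a new DataFrame instead of modifying `df` in place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 3), columns=["one", "two", "three"])

# Same effect as df.index.name = 'row' and df.columns.name = 'col',
# but the result is a new DataFrame and df itself keeps unnamed axes.
named = df.rename_axis(index="row", columns="col")

print(named.index.name, named.columns.name)  # row col
print(df.index.name, df.columns.name)        # None None
```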

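(4) Listing the available index types with IPython's wildcard search

This listing comes from an older pandas release: Float64Index, Int64Index and UInt64Index were deprecated in pandas 1.4 and removed in 2.0, where numeric labels are held in a plain Index with a numeric dtype.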

In [392]: pd.*Index?
pd.CategoricalIndex
pd.DatetimeIndex
pd.Float64Index
pd.Index
pd.Int64Index
pd.IntervalIndex
pd.MultiIndex
pd.PeriodIndex
pd.RangeIndex
pd.TimedeltaIndex
pd.UInt64Index
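To see which of these classes a given construction actually produces, a quick check like the one below can help. It is a sketch that only uses classes present in both pandas 1.x and 2.x; the sample labels and dates are made up.

```python
import pandas as pd

# Hypothetical examples; each value is an index object.
examples = {
    "default": pd.Series([1, 2, 3]).index,                     # positions 0..n-1
    "dates": pd.date_range("2024-01-01", periods=3),           # timestamps
    "multi": pd.MultiIndex.from_tuples([("a", 1), ("b", 2)]),  # tuples of labels
    "categorical": pd.CategoricalIndex(["x", "y", "x"]),       # category labels
    "labels": pd.Index(["a", "b"]),                            # plain labels
}

for name, idx in examples.items():
    print(f"{name:12s} {type(idx).__name__}")
```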

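(5) Duplicate index labels: when a label occurs more than once, selecting it returns a Series; selecting a label that occurs only once returns a single value. The definition of s is not shown here; the values below are consistent with, for example, s = pd.Series(range(6), index=list('abcdba')).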

In [395]: s['a']
Out[395]:
a    0
a    5
dtype: int32

In [396]: s['c']
Out[396]: 2
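A small reproduction of the scalar-versus-Series behaviour. The Series `s` below is an assumption (the session does not show how `s` was built), chosen so that 'a' and 'b' repeat. Selecting with a list through `.loc` always returns a Series, which sidesteps the ambiguity.

```python
import pandas as pd

# Hypothetical Series: labels 'a' and 'b' each occur twice.
s = pd.Series(range(6), index=list("abcdba"))

dup = s["a"]           # repeated label -> Series of all matching rows
one = s["c"]           # unique label -> single scalar
as_series = s.loc[["c"]]  # list selector -> always a Series

print(type(dup).__name__, dup.tolist())  # Series [0, 5]
print(one)                                # 2
print(as_series.tolist())                 # [2]
```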

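(6) Checking for duplicate labels: index.is_unique returns False when the index contains duplicates, and index.unique() returns the distinct labels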

In [397]: s.index.is_unique
Out[397]: False

In [398]: a.index.unique()
Out[398]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object', name='alpha')
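Beyond `is_unique`, `Index.duplicated` marks each repeated occurrence, which makes it easy to list exactly which labels repeat. A sketch on the same hypothetical `s` as above:

```python
import pandas as pd

s = pd.Series(range(6), index=list("abcdba"))  # hypothetical example data

mask = s.index.duplicated()                     # True for the 2nd, 3rd, ... occurrence
repeated = s.index[mask].unique().tolist()      # labels that occur more than once

print(s.index.is_unique)        # False
print(mask.tolist())            # [False, False, False, False, True, True]
print(s.index.unique().tolist())  # ['a', 'b', 'c', 'd']
print(repeated)                 # ['b', 'a']
```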

(7) Cleaning up duplicate labels as the task requires: group on the index with groupby and aggregate each group

In [400]: s.groupby(s.index).sum()
Out[400]:
a    5
b    5
c    2
d    3
dtype: int32

In [401]: s.groupby(s.index).mean()
Out[401]:
a    2.5
b    2.5
c    2.0
d    3.0
dtype: float64

In [402]: s.groupby(s.index).first()
Out[402]:
a    0
b    1
c    2
d    3
dtype: int32
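Two alternatives to `s.groupby(s.index)`, sketched on the same hypothetical `s`: `groupby(level=0)` groups on the index directly, and a boolean mask from `index.duplicated` keeps the first row per label without aggregating.

```python
import pandas as pd

s = pd.Series(range(6), index=list("abcdba"))  # hypothetical Series with repeated labels

# level=0 groups on the index itself, like s.groupby(s.index) above.
summed = s.groupby(level=0).sum()

# Keep only the first row seen for each label, without aggregating.
first = s[~s.index.duplicated(keep="first")]

print(summed.to_dict())  # {'a': 5, 'b': 5, 'c': 2, 'd': 3}
print(first.to_dict())   # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```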
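Multi-level indexes (MultiIndex)

A MultiIndex represents higher-dimensional data in a two-dimensional table, which can make the data easier to read.

Series MultiIndex
  • Creating a Series MultiIndex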
In [404]: a = [['a','a','b','b','b','c','c'],[1,2,3,1,2,3,3]]

In [405]: t=list(zip(*a))

In [406]: t
Out[406]: [('a', 1), ('a', 2), ('b', 3), ('b', 1), ('b', 2), ('c', 3), ('c', 3)]

In [407]: index = pd.MultiIndex.from_tuples(t,names=['level1','level2'])

In [408]: index
Out[408]:
MultiIndex(levels=[['a', 'b', 'c'], [1, 2, 3]],
           codes=[[0, 0, 1, 1, 1, 2, 2], [0, 1, 2, 0, 1, 2, 2]],
           names=['level1', 'level2'])

In [409]: s=pd.Series(np.random.rand(7),index=index)

In [410]: s
Out[410]:
level1  level2
a       1         0.001792
        2         0.147195
b       3         0.340437
        1         0.933020
        2         0.310989
c       3         0.056918
        3         0.430210
dtype: float64
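`MultiIndex.from_arrays` accepts one list per level, so the `zip(*a)` step above is optional. A sketch showing that both constructors yield the same index:

```python
import numpy as np
import pandas as pd

arrays = [["a", "a", "b", "b", "b", "c", "c"], [1, 2, 3, 1, 2, 3, 3]]

# from_arrays takes one list per level; from_tuples needs one tuple per row.
idx_arrays = pd.MultiIndex.from_arrays(arrays, names=["level1", "level2"])
idx_tuples = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["level1", "level2"])

s = pd.Series(np.random.rand(7), index=idx_arrays)
print(idx_arrays.equals(idx_tuples), s.index.nlevels)  # True 2
```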
  • Selecting from a Series MultiIndex
In [411]: s['b']
Out[411]:
level2
3    0.340437
1    0.933020
2    0.310989
dtype: float64

In [412]: s['b':'c']
Out[412]:
level1  level2
b       3         0.340437
        1         0.933020
        2         0.310989
c       3         0.056918
        3         0.430210
dtype: float64

In [413]: s[['a','c']]
Out[413]:
level1  level2
a       1         0.001792
        2         0.147195
c       3         0.056918
        3         0.430210
dtype: float64

In [414]: s[:,2]
Out[414]:
level1
a    0.147195
b    0.310989
dtype: float64
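`xs` is another way to select on an inner level, comparable to `s[:, 2]` above. The sketch uses deterministic values (`np.arange`) in place of the random ones so the result can be checked.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b", "b", "c", "c"], [1, 2, 3, 1, 2, 3, 3]],
    names=["level1", "level2"],
)
s = pd.Series(np.arange(7.0), index=index)  # deterministic values instead of random ones

outer_b = s.loc["b"]                # outer label -> inner level remains as the index
inner_2 = s.xs(2, level="level2")   # inner label -> outer level remains as the index

print(outer_b.to_dict())  # {3: 2.0, 1: 3.0, 2: 4.0}
print(inner_2.to_dict())  # {'a': 1.0, 'b': 4.0}
```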
DataFrame MultiIndex
  • Creating a DataFrame MultiIndex
In [415]: df = pd.DataFrame(np.random.randint(1,10,(4,3)),index = [['a','a','b','b'],[1,2,1,2]],columns = [['one','one','two'],['blue','red','blue']])

In [416]: df.index.names = ['row-1','row-2']

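In [417]: df.columns.name = ['col-1','col-2']
# Note: level names of a MultiIndex are set with .names (plural), e.g. df.columns.names = ['col-1','col-2']. Assigning a list to .name does not name the column levels (depending on the pandas version it is ignored or rejected with a TypeError), which is why no column level names appear in the output below.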

In [418]: df
Out[418]:
             one      two
            blue red blue
row-1 row-2
a     1        5   3    9
      2        1   4    1
b     1        8   8    1
      2        2   8    9
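Level names can also be supplied when the index objects are built, via `names=`, so no separate assignment is needed. A sketch of the same layout using `MultiIndex.from_product` for the rows; `np.arange` stands in for `randint` so the values are reproducible.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["row-1", "row-2"])
cols = pd.MultiIndex.from_tuples(
    [("one", "blue"), ("one", "red"), ("two", "blue")], names=["col-1", "col-2"]
)
# np.arange replaces randint so the values are reproducible.
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=rows, columns=cols)

print(df)
```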
  • Selecting from a DataFrame MultiIndex
In [419]: df.loc['a']
Out[419]:
       one      two
      blue red blue
row-2
1        5   3    9
2        1   4    1

In [420]: type(df.loc['a'])
Out[420]: pandas.core.frame.DataFrame

In [421]: df.loc['a',1]
Out[421]:
one  blue    5
     red     3
two  blue    9
Name: (a, 1), dtype: int32

In [422]: df.loc['a',1].index
Out[422]:
MultiIndex(levels=[['one', 'two'], ['blue', 'red']],
           codes=[[0, 0, 1], [0, 1, 0]])
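The pandas documentation notes that shorthand such as `df.loc['a', 1]` can be ambiguous in general, because the same tuple could mean "row key, column key". Writing full keys as tuples keeps the intent explicit. A sketch on a deterministic version of the frame above (values from `np.arange`, an assumption for reproducibility):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["row-1", "row-2"]),
    columns=pd.MultiIndex.from_tuples([("one", "blue"), ("one", "red"), ("two", "blue")]),
)

row = df.loc[("a", 1), :]                 # full row key, all columns -> Series
cell = df.loc[("a", 1), ("one", "red")]   # full row key + full column key -> scalar
inner = df.xs(1, level="row-2")           # every row whose inner label is 1

print(row.tolist())          # [0, 1, 2]
print(cell)                  # 1
print(inner.index.tolist())  # ['a', 'b']
```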
  • Swapping index levels
In [423]: df2 = df.swaplevel('row-1','row-2')

In [424]: df2
Out[424]:
             one      two
            blue red blue
row-2 row-1
1     a        5   3    9
2     a        1   4    1
1     b        8   8    1
2     b        2   8    9
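`swaplevel` only reorders the levels; the rows stay in their old order, as the output above shows. Following it with `sort_index` groups the rows by the new outer level. A sketch with hypothetical flat column labels to keep the focus on the rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["row-1", "row-2"]),
    columns=["x", "y", "z"],  # hypothetical flat column labels
)

swapped = df.swaplevel("row-1", "row-2")  # levels reordered, rows in the old order
regrouped = swapped.sort_index()          # rows now grouped by the new outer level

print(swapped.index.tolist())    # [(1, 'a'), (2, 'a'), (1, 'b'), (2, 'b')]
print(regrouped.index.tolist())  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```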
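Aggregating over MultiIndex levels
# Note: the level= argument of DataFrame.sum was deprecated in pandas 1.3 and removed in 2.0; on current versions use df.groupby(level=1).sum() and df.groupby(level=0).sum(), which give the same results.
In [430]: df.sum(level=1)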
Out[430]:
       one      two
      blue red blue
row-2
1       13  11   10
2        3  12   10

In [431]: df.sum(level=0)
Out[431]:
       one      two
      blue red blue
row-1
a        6   7   10
b       10  16   10
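On pandas 2.x the same per-level totals come from `groupby(level=...)`. A sketch that types in the numbers from the session above by hand, so the expected sums can be checked:

```python
import pandas as pd

# Same numbers as the session above, entered by hand.
df = pd.DataFrame(
    [[5, 3, 9], [1, 4, 1], [8, 8, 1], [2, 8, 9]],
    index=pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["row-1", "row-2"]),
    columns=pd.MultiIndex.from_tuples([("one", "blue"), ("one", "red"), ("two", "blue")]),
)

by_inner = df.groupby(level="row-2").sum()  # in place of df.sum(level=1)
by_outer = df.groupby(level=0).sum()        # in place of df.sum(level=0)

print(by_inner)
print(by_outer)
```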
Converting between columns and the index

(1) Turning columns into the index

In [432]: df = pd.DataFrame({'a':range(7),'b':range(7,0,-1),'c':['one','one','one','two','two','two','two'],'d':[0,1,2,0,1,2,3]})

In [433]: df
Out[433]:
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3

In [434]: df.set_index('c')
Out[434]:
     a  b  d
c
one  0  7  0
one  1  6  1
one  2  5  2
two  3  4  0
two  4  3  1
two  5  2  2
two  6  1  3

In [435]: df.set_index(['c','d'])
Out[435]:
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
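`set_index` has two options worth knowing: `drop=False` keeps the column alongside the new index, and `append=True` adds the column as an extra level on top of the existing index. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})

kept = df.set_index("c", drop=False)       # 'c' becomes the index and stays a column
appended = df.set_index("c", append=True)  # old RangeIndex kept as the outer level

# Full-key lookup on the two-level index built from columns c and d.
row_two_3 = df.set_index(["c", "d"]).loc[("two", 3)].tolist()

print(kept.columns.tolist())       # ['a', 'b', 'c', 'd']
print(list(appended.index.names))  # [None, 'c']
print(row_two_3)                   # [6, 1]
```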

(2) Turning the index back into columns

In [436]: df2 = df.set_index(['c','d'])

In [437]: df2
Out[437]:
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1

In [438]: df2.reset_index()
Out[438]:
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1

# axis is passed by keyword here; pandas 2.0 no longer accepts it positionally in sort_index
In [440]: df.reset_index().sort_index(axis='columns')
Out[440]:
   a  b    c  d  index
0  0  7  one  0      0
1  1  6  one  1      1
2  2  5  one  2      2
3  3  4  two  0      3
4  4  3  two  1      4
5  5  2  two  2      5
6  6  1  two  3      6

In [441]: df2.reset_index().sort_index(axis='columns') == df
Out[441]:
      a     b     c     d
0  True  True  True  True
1  True  True  True  True
2  True  True  True  True
3  True  True  True  True
4  True  True  True  True
5  True  True  True  True
6  True  True  True  True
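The comparison above relies on the column names happening to be in alphabetical order. Reindexing the columns with `df.columns` restores the original order in general, and `DataFrame.equals` collapses the element-wise check into one boolean. A sketch, which also shows resetting a single level and `drop=True`:

```python
import pandas as pd

df = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})
df2 = df.set_index(["c", "d"])

# Round trip: move both levels back to columns, then restore the original column order.
restored = df2.reset_index()[df.columns]

# Move only one level back; the other stays in the index.
partial = df2.reset_index(level="d")

# drop=True discards the index instead of inserting it as a column.
dropped = df.reset_index(drop=True)

print(restored.equals(df))                         # True
print(partial.index.name, partial.columns.tolist())  # c ['d', 'a', 'b']
print(dropped.columns.tolist())                    # ['a', 'b', 'c', 'd']
```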

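Pandas group-by computation

Group-by computation in three steps: split → apply → combine
Split: what should the data be grouped by?
Apply: what computation is run on each group?
Combine: merge the per-group results into a single result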
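The three steps can be written out by hand to see what `groupby` does for us. This is an illustrative sketch, not how pandas implements it internally; it uses the same key1/data1 values as the session that follows.

```python
import pandas as pd

# The same key1/data1 values as the session that follows.
df = pd.DataFrame({"key1": ["a", "a", "b", "b", "a"], "data1": [4, 7, 1, 4, 6]})

# Split: one sub-frame per distinct key
pieces = {key: df[df["key1"] == key] for key in df["key1"].unique()}
# Apply: compute the mean of data1 in each piece
results = {key: piece["data1"].mean() for key, piece in pieces.items()}
# Combine: collect the per-group results into one Series, sorted like groupby's output
manual = pd.Series(results).sort_index()

builtin = df.groupby("key1")["data1"].mean()
print(manual)
print(builtin)
```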

Grouping with lists of keys
Grouping rows

(1) Basic group-by computation

In [442]: df = pd.DataFrame({'key1':['a','a','b','b','a'],'key2':['one','two','one','two','one'],'data1':np.random.randint(1,10,5),'data2':np.random.randint(1,10,5)})

In [443]: df
Out[443]:
  key1 key2  data1  data2
0    a  one      4      6
1    a  two      7      2
2    b  one      1      3
3    b  two      4      1
4    a  one      6      6

In [444]: df['data1'].groupby(df['key1']).mean()
Out[444]:
key1
a    5.666667
b    2.500000
Name: data1, dtype: float64

In [449]: df.groupby('key1').sum()
Out[449]:
      data1  data2
key1
a        17     14
b         5      4
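# In older pandas the non-numeric column 'key2' was silently dropped from the result. Since pandas 2.0 it is kept (its strings are concatenated), so pass numeric_only=True or select the numeric columns first.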

In [450]: df.groupby('key1').sum()['data1']
Out[450]:
key1
a    17
b     5
Name: data1, dtype: int32

In [453]: mean = df.groupby(['key1','key2']).sum()['data1']

In [454]: mean
Out[454]:
key1  key2
a     one     10
      two      7
b     one      1
      two      4
Name: data1, dtype: int32

# unstack pivots the inner index level (key2) into columns, turning the Series into a DataFrame
In [455]: mean.unstack()
Out[455]:
key2  one  two
key1
a      10    7
b       1    4

(2) Custom group keys

In [445]: key = [1,2,1,1,2]

In [446]: df['data1'].groupby(key).mean()
Out[446]:
1    3.0
2    6.5
Name: data1, dtype: float64
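Besides a list, `groupby` also accepts a dict or Series (mapping index labels to group names) or a function (called once per index label). A sketch with hypothetical row labels and team assignments:

```python
import pandas as pd

# Hypothetical row labels so that dict and function keys have something to act on.
people = pd.DataFrame(
    {"score": [4, 7, 1, 4, 6]},
    index=["Joe", "Steve", "Wes", "Jim", "Travis"],
)

# A dict maps each index label to a group name.
team = {"Joe": "red", "Steve": "blue", "Wes": "red", "Jim": "blue", "Travis": "blue"}
by_dict = people.groupby(team)["score"].sum()

# A function is called once per index label; here the key is the label's length.
by_func = people.groupby(len)["score"].sum()

# A Series behaves like the dict, matched on the index labels.
by_series = people["score"].groupby(pd.Series(team)).sum()

print(by_dict, by_func, by_series, sep="\n\n")
```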

(3) Grouping by several keys (hierarchical result index)

In [447]: df['data1'].groupby([df['key1'],df['key2']]).sum()
Out[447]:
key1  key2
a     one     10
      two      7
b     one      1
      two      4
Name: data1, dtype: int32

In [448]: df['data1'].groupby([df['key1'],df['key2']]).size()
Out[448]:
key1  key2
a     one     2
      two     1
b     one     1
      two     1
Name: data1, dtype: int64
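`size()` counts rows per group, while `count()` counts non-missing values, so the two differ once NaN appears. A sketch where one missing value is added purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [4.0, np.nan, 1.0, 4.0, 6.0],  # one NaN added for illustration
})

g = df.groupby("key1")["data1"]
sizes = g.size().to_dict()    # rows per group, NaN included
counts = g.count().to_dict()  # non-missing values only

print(sizes)   # {'a': 3, 'b': 2}
print(counts)  # {'a': 2, 'b': 2}
```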

(4) Iterating over a groupby object (iterator protocol)

In [457]: for name, group in df.groupby('key1'):
     ...:     print (name)
     ...:     print (group)
     ...:
a
  key1 key2  data1  data2
0    a  one      4      6
1    a  two      7      2
4    a  one      6      6
b
  key1 key2  data1  data2
2    b  one      1      3
3    b  two      4      1
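Because a groupby object yields `(name, group)` pairs, it can be turned into a dict of sub-DataFrames, and `get_group` fetches a single group directly. With several keys, each name is a tuple. A sketch using the data values shown in the session:

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [4, 7, 1, 4, 6],
    "data2": [6, 2, 3, 1, 6],
})

# Materialise the groups as a dict of sub-DataFrames.
pieces = dict(list(df.groupby("key1")))

# get_group fetches one group directly.
a_values = df.groupby("key1").get_group("a")["data1"].tolist()

# With several keys, each group name is a tuple.
sizes = {(k1, k2): len(group) for (k1, k2), group in df.groupby(["key1", "key2"])}

print(pieces["b"])
print(a_values)  # [4, 7, 6]
print(sizes)
```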