随笔记录——pandas(数据联合)

5、根据索引合并

DataFrame中用于合并的键是它的索引。在这种情况下,你可以传递left_index=True或right_index=True(或者都传)来表示索引需要用来作为合并的键:

df1 = pd.DataFrame({'col1':range(5), 'col2':['a', 'b', 'c', 'd', 'e']})

df1
Out[91]: 
   col1 col2
0     0    a
1     1    b
2     2    c
3     3    d
4     4    e

df2 = pd.DataFrame({'col2_2':['mingming', 'yueyue']})

df2
Out[93]: 
     col2_2
0  mingming
1    yueyue

pd.merge(df1, df2, left_on='col1', right_index=True)
Out[94]: 
   col1 col2    col2_2
0     0    a  mingming
1     1    b    yueyue
5.1、多层索引进行合并
df1 = pd.DataFrame({'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'key2':['2000', '2001' , '2002', '2001', '2002'], 'data': np.arange(5)})

df1
Out[98]: 
     key1  key2  data
0    Ohio  2000     0
1    Ohio  2001     1
2    Ohio  2002     2
3  Nevada  2001     3
4  Nevada  2002     4

df2 = pd.DataFrame(np.arange(12).reshape((6, 2)), index=[['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Ohio'], ['2000', '2001' , '2002', '2001', '2002', '2003']], columns=['col1', 'col2'])

df2
Out[104]: 
             col1  col2
Ohio   2000     0     1
       2001     2     3
       2002     4     5
Nevada 2001     6     7
       2002     8     9
Ohio   2003    10    11

pd.merge(df1, df2, left_on=['key1', 'key2'], right_index=True)
Out[106]: 
     key1  key2  data  col1  col2
0    Ohio  2000     0     0     1
1    Ohio  2001     1     2     3
2    Ohio  2002     2     4     5
3  Nevada  2001     3     6     7
4  Nevada  2002     4     8     9

pd.merge(df1, df2, left_on=['key1', 'key2'], right_index=True, how='outer')
Out[107]: 
     key1  key2  data  col1  col2
0    Ohio  2000   0.0     0     1
1    Ohio  2001   1.0     2     3
2    Ohio  2002   2.0     4     5
3  Nevada  2001   3.0     6     7
4  Nevada  2002   4.0     8     9
4    Ohio  2003   NaN    10    11
5.2、两边都是用索引进行合并
df1 = pd.DataFrame(np.arange(6).reshape((3,2)), index=['a', 'b', 'c'], columns=['ohio', 'nevada'])

df1
Out[110]: 
   ohio  nevada
a     0       1
b     2       3
c     4       5

df2 = pd.DataFrame(np.arange(8).reshape((4,2)), index=['d', 'b', 'c','e'], columns=['missouri', 'alabama'])

df2
Out[112]: 
   missouri  alabama
d         0        1
b         2        3
c         4        5
e         6        7

pd.merge(df1, df2, left_index=True, right_index=True)
Out[113]: 
   ohio  nevada  missouri  alabama
b     2       3         2        3
c     4       5         4        5
6、延轴向连接

另一种数据组合操作可互换地称为拼接、绑定或堆叠。NumPy的concatenate函数可以在NumPy数组上实现该功能:

arr = np.arange(16).reshape((4, 4))

np.concatenate([arr, arr], axis=1)
Out[118]: 
array([[ 0,  1,  2,  3,  0,  1,  2,  3],
       [ 4,  5,  6,  7,  4,  5,  6,  7],
       [ 8,  9, 10, 11,  8,  9, 10, 11],
       [12, 13, 14, 15, 12, 13, 14, 15]])
6.1、在axis=0轴上连接 Series
s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])
pd.concat([s1, s2, s3])
Out[127]: 
a    0
b    1
c    2
d    3
e    4
f    5
g    6
dtype: int64
6.2、在axis=1轴上连接 Series
pd.concat([s1,s2,s3], axis=1)
Out[128]: 
     0    1    2
a  0.0  NaN  NaN
b  1.0  NaN  NaN
c  NaN  2.0  NaN
d  NaN  3.0  NaN
e  NaN  4.0  NaN
f  NaN  NaN  5.0
g  NaN  NaN  6.0

你可以看到排序后的索引合集(‘outer’ join外连接)。你也可以传入join=‘inner’:

s4 = pd.concat([s1, s3])

s4
Out[130]: 
a    0
b    1
f    5
g    6
dtype: int64

pd.concat([s1, s4], axis=1)
Out[131]: 
     0  1
a  0.0  0
b  1.0  1
f  NaN  5
g  NaN  6
# join='inner’的选项,'f’和’g’标签消失了。
pd.concat([s1, s4], axis=1, join='inner')
Out[132]: 
   0  1
a  0  0
b  1  1
6.3、 使用keys关键字,在连接结果的Series上设置多层索引
df = pd.concat([s1, s4], keys=['updown', 'down'])

type(df)
Out[147]: pandas.core.series.Series

df.index
Out[146]: 
MultiIndex([('updown', 'a'),
            ('updown', 'b'),
            (  'down', 'a'),
            (  'down', 'b'),
            (  'down', 'f'),
            (  'down', 'g')],
           )
df
Out[148]: 
updown  a    0
        b    1
down    a    0
        b    1
        f    5
        g    6
        
df.unstack()
Out[149]: 
          a    b    f    g
updown  0.0  1.0  NaN  NaN
down    0.0  1.0  5.0  6.0
6.4、 连接Dataframe
df1 = pd.DataFrame(np.arange(8).reshape((4, 2)), columns=['up', 'down'], index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame(np.arange(6).reshape((3, 2)), columns=['up', 'down'], index=['e', 'f', 'g'])

df1
Out[159]: 
   up  down
a   0     1
b   2     3
c   4     5
d   6     7

df2
Out[160]: 
   up  down
e   0     1
f   2     3
g   4     5

pd.concat([df1, df2])
Out[158]: 
   up  down
a   0     1
b   2     3
c   4     5
d   6     7
e   0     1
f   2     3
g   4     5

pd.concat([df1, df2], axis=1)
Out[161]: 
    up  down   up  down
a  0.0   1.0  NaN   NaN
b  2.0   3.0  NaN   NaN
c  4.0   5.0  NaN   NaN
d  6.0   7.0  NaN   NaN
e  NaN   NaN  0.0   1.0
f  NaN   NaN  2.0   3.0
g  NaN   NaN  4.0   5.0

pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])
Out[162]: 
  level1      level2     
      up down     up down
a    0.0  1.0    NaN  NaN
b    2.0  3.0    NaN  NaN
c    4.0  5.0    NaN  NaN
d    6.0  7.0    NaN  NaN
e    NaN  NaN    0.0  1.0
f    NaN  NaN    2.0  3.0
g    NaN  NaN    4.0  5.0

使用ignore_index=True ,可以忽略索引,直接连接(注意,下面的索引被重建了)

df1 = pd.DataFrame(np.random.randn(3, 4), columns=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame(np.random.randn(2, 3), columns=['b', 'd', 'a'])

df1
Out[166]: 
          a         b         c         d
0 -0.544383  0.110923 -1.150994  0.375698
1 -0.600639 -0.291694 -0.601707  1.852278
2 -0.013497 -1.057711  0.822545 -1.220844

df2
Out[167]: 
          b         d         a
0  0.208864 -1.959670 -1.328186
1  0.196861  0.738467  0.171368

pd.concat([df1, df2])
Out[168]: 
          a         b         c         d
0 -0.544383  0.110923 -1.150994  0.375698
1 -0.600639 -0.291694 -0.601707  1.852278
2 -0.013497 -1.057711  0.822545 -1.220844
0 -1.328186  0.208864       NaN -1.959670
1  0.171368  0.196861       NaN  0.738467

pd.concat([df1, df2], ignore_index=True)
Out[169]: 
          a         b         c         d
0 -0.544383  0.110923 -1.150994  0.375698
1 -0.600639 -0.291694 -0.601707  1.852278
2 -0.013497 -1.057711  0.822545 -1.220844
3 -1.328186  0.208864       NaN -1.959670
4  0.171368  0.196861       NaN  0.738467
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值