pandas: merge, join, and concatenate

Concatenating objects

In [3]: pieces = [df[:3], df[3:7], df[7:]]
In [4]: concatenated = concat(pieces)
In [5]: concatenated
Out[5]:
          0         1         2         3
0  0.469112 -0.282863 -1.509059 -1.135632
1  1.212112 -0.173215  0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929  1.071804
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
7 -0.370647 -1.157892 -1.344312  0.844885
8  1.075770 -0.109050  1.643563 -1.469388
9  0.357021 -0.674600 -1.776904 -0.968914
In [6]: concatenated = concat(pieces, keys=['first', 'second', 'third'])
In [7]: concatenated
Out[7]:
                 0         1         2         3
first  0  0.469112 -0.282863 -1.509059 -1.135632
       1  1.212112 -0.173215  0.119209 -1.044236
       2 -0.861849 -2.104569 -0.494929  1.071804
second 3  0.721555 -0.706771 -1.039575  0.271860
       4 -0.424972  0.567020  0.276232 -1.087401
       5 -0.673690  0.113648 -1.478427  0.524988
       6  0.404705  0.577046 -1.715002 -1.039268
third  7 -0.370647 -1.157892 -1.344312  0.844885
       8  1.075770 -0.109050  1.643563 -1.469388
       9  0.357021 -0.674600 -1.776904 -0.968914
In [8]: concatenated.ix['second']
Out[8]:
          0         1         2         3
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
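
The session above was recorded on an old pandas release. The .ix indexer was removed in pandas 1.0, so on a current install the same selection is written with .loc. Here is a minimal sketch of the keys example, assuming a small integer-valued stand-in for df (the values are made up for illustration, not the random data shown above):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 10 rows, 4 integer columns.
df = pd.DataFrame(np.arange(40).reshape(10, 4))

pieces = [df[:3], df[3:7], df[7:]]

# keys adds an outer index level that labels each piece.
concatenated = pd.concat(pieces, keys=["first", "second", "third"])

# Selecting an outer-level key with .loc returns that piece,
# with its original row labels (3 to 6 here).
second = concatenated.loc["second"]
print(second)
```

The result has a two-level row index, and the outer level is what makes it possible to pull a piece back out by name.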
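concat makes a full copy of the data, so when combining many datasets, build a list of frames first and call concat once:
frames = [process_your_file(f) for f in files]
result = pd.concat(frames)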
In [11]: df
Out[11]:
              a         b         c         d
mPXqv -1.294524  0.413738  0.276662 -0.472035
AH4pW -0.013960 -0.362543 -0.006154 -0.923061
c30Fm  0.895717  0.805244 -1.206412  2.565646
3EWtQ  1.431256  1.340309 -1.170299 -0.226169
1gQh9  0.410835  0.813850  0.132003 -0.827317
KQwv8 -0.076467 -1.187678  1.130127 -1.436737
8UDGh -1.413681  1.607920  1.024180  0.569605
KA8Vn  0.875906 -2.211372  0.974466 -2.006747
KDDLI -0.410001 -0.078638  0.545952 -1.219217
yZsRv -1.226825  0.769804 -1.281247 -0.727707
In [12]: concat([df.ix[:7, ['a', 'b']], df.ix[2:-2, ['c']],
   ....:         df.ix[-7:, ['d']]], axis=1)
   ....:
Out[12]:
              a         b         c         d
1gQh9  0.410835  0.813850  0.132003 -0.827317
3EWtQ  1.431256  1.340309 -1.170299 -0.226169
8UDGh -1.413681  1.607920  1.024180  0.569605
AH4pW -0.013960 -0.362543       NaN       NaN
KA8Vn       NaN       NaN  0.974466 -2.006747
KDDLI       NaN       NaN       NaN -1.219217
KQwv8 -0.076467 -1.187678  1.130127 -1.436737
c30Fm  0.895717  0.805244 -1.206412       NaN
mPXqv -1.294524  0.413738       NaN       NaN
yZsRv       NaN       NaN       NaN -0.727707
In [13]: concat([df.ix[:7, ['a', 'b']], df.ix[2:-2, ['c']],
   ....:         df.ix[-7:, ['d']]], axis=1, join='inner')
   ....:
Out[13]:
              a         b         c         d
3EWtQ  1.431256  1.340309 -1.170299 -0.226169
1gQh9  0.410835  0.813850  0.132003 -0.827317
KQwv8 -0.076467 -1.187678  1.130127 -1.436737
8UDGh -1.413681  1.607920  1.024180  0.569605
In [14]: concat([df.ix[:7, ['a', 'b']], df.ix[2:-2, ['c']],
   ....:         df.ix[-7:, ['d']]], axis=1, join_axes=[df.index])
   ....:
Out[14]:
              a         b         c         d
mPXqv -1.294524  0.413738       NaN       NaN
AH4pW -0.013960 -0.362543       NaN       NaN
c30Fm  0.895717  0.805244 -1.206412       NaN
3EWtQ  1.431256  1.340309 -1.170299 -0.226169
1gQh9  0.410835  0.813850  0.132003 -0.827317
KQwv8 -0.076467 -1.187678  1.130127 -1.436737
8UDGh -1.413681  1.607920  1.024180  0.569605
KA8Vn       NaN       NaN  0.974466 -2.006747
KDDLI       NaN       NaN       NaN -1.219217
yZsRv       NaN       NaN       NaN -0.727707
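
The three calls above differ only in how the row labels of the pieces are combined: union (the default), intersection (join='inner'), or a fixed index (join_axes). The join_axes argument was removed in pandas 1.0, and a common substitute is to concatenate and then reindex. The following is a rough sketch for a current pandas install, assuming a hypothetical frame with single-letter row labels and seeded random values in place of the df shown earlier:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 rows labelled a..j, columns a..d.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)),
                  index=list("abcdefghij"), columns=["a", "b", "c", "d"])

# Positional row slices plus label-based column selection,
# mirroring the three .ix slices in the session above.
parts = [df.iloc[:7][["a", "b"]], df.iloc[2:-2][["c"]], df.iloc[-7:][["d"]]]

outer = pd.concat(parts, axis=1)                # union of row labels (default)
inner = pd.concat(parts, axis=1, join="inner")  # intersection of row labels

# Substitute for join_axes=[df.index]: align the result to df's own index.
aligned = pd.concat(parts, axis=1).reindex(df.index)

print(inner)
```

Only rows present in all three pieces survive the inner join, while the reindexed result keeps every row of df in its original order and fills the gaps with NaN.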

Concatenating using append
A useful shortcut to concat is the append instance method on Series and DataFrame. These methods actually predate concat. They concatenate along axis=0, namely the index.

In [22]: df1
Out[22]:
                   A         B         C         D
2000-01-01  0.176444  0.403310 -0.154951  0.301624
2000-01-02 -2.179861 -1.369849 -0.954208  1.462696
2000-01-03 -1.743161 -0.826591 -0.345352  1.314232
In [23]: df2
Out[23]:
                   A         B         C
2000-01-04  0.690579  0.995761  2.396780
2000-01-05  3.357427 -0.317441 -1.236269
2000-01-06 -0.487602 -0.082240 -2.182937
In [24]: df1.append(df2)
Out[24]:
                   A         B         C         D
2000-01-01  0.176444  0.403310 -0.154951  0.301624
2000-01-02 -2.179861 -1.369849 -0.954208  1.462696
2000-01-03 -1.743161 -0.826591 -0.345352  1.314232
2000-01-04  0.690579  0.995761  2.396780       NaN
2000-01-05  3.357427 -0.317441 -1.236269       NaN
2000-01-06 -0.487602 -0.082240 -2.182937       NaN
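
DataFrame.append and Series.append were deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent install the append call above has to be written with concat. This is a small sketch of the same row-wise stacking, assuming two hypothetical frames filled with ones and zeros rather than the values shown above:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=6)

# Hypothetical frames: df1 has columns A..D, df2 lacks column D.
df1 = pd.DataFrame(np.ones((3, 4)), index=dates[:3], columns=list("ABCD"))
df2 = pd.DataFrame(np.zeros((3, 3)), index=dates[3:], columns=list("ABC"))

# Equivalent of df1.append(df2): stack the rows along axis=0.
# Column D does not exist in df2, so its rows get NaN there.
result = pd.concat([df1, df2])
print(result)
```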

Ignoring indexes on the concatenation axis

In [33]: concat([df1, df2], ignore_index=True)
In [34]: df1.append(df2, ignore_index=True)
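
The effect of ignore_index is easiest to see when the inputs share row labels. A minimal sketch, assuming two tiny hypothetical frames with an overlapping label:

```python
import pandas as pd

# Hypothetical frames that both contain the row label "x".
df1 = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
df2 = pd.DataFrame({"A": [3, 4]}, index=["x", "z"])

# Default: original labels are kept, so "x" appears twice.
kept = pd.concat([df1, df2])

# ignore_index=True discards the labels and numbers the rows 0..n-1.
reset = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())   # ['x', 'y', 'x', 'z']
print(reset.index.tolist())  # [0, 1, 2, 3]
```

This is useful when the index on the concatenation axis carries no meaning of its own.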

More concatenating with group keys

In [43]: pieces = [df.ix[:, [0, 1]], df.ix[:, [2]], df.ix[:, [3]]]
In [44]: result = concat(pieces, axis=1, keys=['one', 'two', 'three'])
In [45]: result
Out[45]:
        one                 two     three
          0         1         2         3
0 -0.014805 -0.284319  0.650776 -1.461665
1 -1.137707 -0.891060 -0.693921  1.613616
2  0.464000  0.227371 -0.496922  0.306389
3 -2.290613 -1.134623 -1.561819 -0.260838
4  0.281957  1.523962 -0.902937  0.068159
5 -0.057873 -0.368204 -1.144073  0.861209
6  0.800193  0.782098 -1.069094 -1.099248
7  0.255269  0.009750  0.661084  0.379319
8 -0.008434  1.952541 -1.056652  0.533946
9 -1.226970  0.040403 -0.507516 -0.230096
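
With axis=1, the keys end up as the outer level of the column index instead of the row index. Below is a short sketch of that behaviour on a current pandas install, using .iloc in place of the removed .ix and assuming a hypothetical integer-valued df:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 10 rows, 4 integer columns labelled 0..3.
df = pd.DataFrame(np.arange(40).reshape(10, 4))

pieces = [df.iloc[:, [0, 1]], df.iloc[:, [2]], df.iloc[:, [3]]]
result = pd.concat(pieces, axis=1, keys=["one", "two", "three"])

# The columns are now a two-level MultiIndex: (key, original column label).
print(result.columns.tolist())
# [('one', 0), ('one', 1), ('two', 2), ('three', 3)]

# Selecting an outer key returns the columns of that piece.
one = result["one"]
```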

Database-style DataFrame joining/merging

The how argument to merge specifies which keys are included in the result:

left   Use keys from left frame only
right  Use keys from right frame only
outer  Use union of keys from both frames
inner  Use intersection of keys from both frames
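
The four options can be compared side by side on a toy example. This is only an illustrative sketch: the frames left and right, the column names key, lval and rval, and the values are all made up for the demonstration:

```python
import pandas as pd

# Hypothetical frames sharing a "key" column; K0 is only in left, K3 only in right.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"], "rval": [4, 5, 6]})

# Collect the keys that survive each kind of join.
keys_by_how = {
    how: pd.merge(left, right, on="key", how=how)["key"].tolist()
    for how in ["left", "right", "outer", "inner"]
}

for how, keys in keys_by_how.items():
    print(how, keys)
```

Rows whose key is missing on one side are kept or dropped according to how, and the columns coming from the missing side are filled with NaN when the row is kept.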
