pandas merge, join and concatenate

Latest recommended article published 2024-06-20 16:32:28

wangquannuaa


Concatenating objects

In [3]: pieces = [df[:3], df[3:7], df[7:]]
In [4]: concatenated = pd.concat(pieces)
In [5]: concatenated
Out[5]:
          0         1         2         3
0  0.469112 -0.282863 -1.509059 -1.135632
1  1.212112 -0.173215  0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929  1.071804
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
7 -0.370647 -1.157892 -1.344312  0.844885
8  1.075770 -0.109050  1.643563 -1.469388
9  0.357021 -0.674600 -1.776904 -0.968914
In [6]: concatenated = pd.concat(pieces, keys=['first', 'second', 'third'])
In [7]: concatenated
Out[7]:
                 0         1         2         3
first  0  0.469112 -0.282863 -1.509059 -1.135632
       1  1.212112 -0.173215  0.119209 -1.044236
       2 -0.861849 -2.104569 -0.494929  1.071804
second 3  0.721555 -0.706771 -1.039575  0.271860
       4 -0.424972  0.567020  0.276232 -1.087401
       5 -0.673690  0.113648 -1.478427  0.524988
       6  0.404705  0.577046 -1.715002 -1.039268
third  7 -0.370647 -1.157892 -1.344312  0.844885
       8  1.075770 -0.109050  1.643563 -1.469388
       9  0.357021 -0.674600 -1.776904 -0.968914
In [8]: concatenated.loc['second']
Out[8]:
          0         1         2         3
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
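
The session above can be reproduced with a small script. This is only a sketch: the frame is filled with arbitrary random numbers (the seed and the names plain, keyed and second are assumptions for illustration), so the values will differ from the output shown.

```python
import numpy as np
import pandas as pd

# Illustrative data: a 10x4 frame of random numbers (values are arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)))

# Split the frame into three row slices.
pieces = [df[:3], df[3:7], df[7:]]

# Plain concatenation stacks the pieces back in their original order.
plain = pd.concat(pieces)

# keys= adds an outer index level that labels each piece.
keyed = pd.concat(pieces, keys=["first", "second", "third"])

# Selecting an outer label returns that piece with its original row labels.
second = keyed.loc["second"]
print(second)
```

Label-based selection goes through .loc here because the older .ix indexer is no longer available in current pandas releases.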

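# concat copies the data on every call, so build the full list of frames
# first and concatenate once instead of concatenating inside a loop.
frames = [process_your_file(f) for f in files]
result = pd.concat(frames)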
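
To make the pattern above concrete, here is a minimal runnable sketch. The in-memory CSV buffers stand in for real files, and the body of process_your_file is a placeholder assumption; substitute whatever loading and cleaning your data needs.

```python
import io

import pandas as pd

# Hypothetical stand-ins for files on disk: in-memory CSV text.
files = [
    io.StringIO("x,y\n1,2\n3,4\n"),
    io.StringIO("x,y\n5,6\n"),
]


def process_your_file(f):
    # Placeholder processing step: parse the CSV into a DataFrame.
    return pd.read_csv(f)


# Build every frame first, then concatenate exactly once.
frames = [process_your_file(f) for f in files]
result = pd.concat(frames, ignore_index=True)
print(result)
```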

In [11]: df
Out[11]:
              a         b         c         d
mPXqv -1.294524  0.413738  0.276662 -0.472035
AH4pW -0.013960 -0.362543 -0.006154 -0.923061
c30Fm  0.895717  0.805244 -1.206412  2.565646
3EWtQ  1.431256  1.340309 -1.170299 -0.226169
1gQh9  0.410835  0.813850  0.132003 -0.827317
KQwv8 -0.076467 -1.187678  1.130127 -1.436737
8UDGh -1.413681  1.607920  1.024180  0.569605
KA8Vn  0.875906 -2.211372  0.974466 -2.006747
KDDLI -0.410001 -0.078638  0.545952 -1.219217
yZsRv -1.226825  0.769804 -1.281247 -0.727707
In [12]: pd.concat([df[['a', 'b']].iloc[:7], df[['c']].iloc[2:-2],
   ....:            df[['d']].iloc[-7:]], axis=1)
   ....:
Out[12]:
              a         b         c         d
1gQh9  0.410835  0.813850  0.132003 -0.827317
3EWtQ  1.431256  1.340309 -1.170299 -0.226169
8UDGh -1.413681  1.607920  1.024180  0.569605
AH4pW -0.013960 -0.362543       NaN       NaN
KA8Vn       NaN       NaN  0.974466 -2.006747
KDDLI       NaN       NaN       NaN -1.219217
KQwv8 -0.076467 -1.187678  1.130127 -1.436737
c30Fm  0.895717  0.805244 -1.206412       NaN
mPXqv -1.294524  0.413738       NaN       NaN
yZsRv       NaN       NaN       NaN -0.727707
In [13]: pd.concat([df[['a', 'b']].iloc[:7], df[['c']].iloc[2:-2],
   ....:            df[['d']].iloc[-7:]], axis=1, join='inner')
   ....:
Out[13]:
              a         b         c         d
3EWtQ  1.431256  1.340309 -1.170299 -0.226169
1gQh9  0.410835  0.813850  0.132003 -0.827317
KQwv8 -0.076467 -1.187678  1.130127 -1.436737
8UDGh -1.413681  1.607920  1.024180  0.569605
In [14]: pd.concat([df[['a', 'b']].iloc[:7], df[['c']].iloc[2:-2],
   ....:            df[['d']].iloc[-7:]], axis=1).reindex(df.index)
   ....:
Out[14]:
              a         b         c         d
mPXqv -1.294524  0.413738       NaN       NaN
AH4pW -0.013960 -0.362543       NaN       NaN
c30Fm  0.895717  0.805244 -1.206412       NaN
3EWtQ  1.431256  1.340309 -1.170299 -0.226169
1gQh9  0.410835  0.813850  0.132003 -0.827317
KQwv8 -0.076467 -1.187678  1.130127 -1.436737
8UDGh -1.413681  1.607920  1.024180  0.569605
KA8Vn       NaN       NaN  0.974466 -2.006747
KDDLI       NaN       NaN       NaN -1.219217
yZsRv       NaN       NaN       NaN -0.727707
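
The three column-wise variants above (union of row labels, intersection of row labels, and alignment to the original index) can be compared side by side on a smaller frame. This is a sketch with made-up integer data; the labels u to z and the names outer, inner and aligned are assumptions for illustration. The older join_axes argument is not available in current pandas, so the sketch reindexes the result instead.

```python
import pandas as pd

# Small illustrative frame with string row labels.
df = pd.DataFrame(
    {"a": range(6), "b": range(6, 12), "c": range(12, 18)},
    index=list("uvwxyz"),
)

# Three single-column pieces covering overlapping row ranges.
parts = [
    df[["a"]].iloc[:4],   # rows u, v, w, x
    df[["b"]].iloc[2:],   # rows w, x, y, z
    df[["c"]].iloc[1:5],  # rows v, w, x, y
]

# Default (outer) join: union of row labels, NaN where a piece has no row.
outer = pd.concat(parts, axis=1)

# Inner join: only the row labels present in every piece.
inner = pd.concat(parts, axis=1, join="inner")

# Align the outer result to the original frame's row order.
aligned = pd.concat(parts, axis=1).reindex(df.index)
print(aligned)
```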

Concatenating using append
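A useful shortcut to concat is the append instance method on Series and DataFrame. These methods actually predated concat. They concatenate along axis=0, namely the index. Note that DataFrame.append and Series.append were deprecated in pandas 1.4 and removed in pandas 2.0; on those versions, use pd.concat([df1, df2]) instead.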

In [22]: df1
Out[22]:
                   A         B         C         D
2000-01-01  0.176444  0.403310 -0.154951  0.301624
2000-01-02 -2.179861 -1.369849 -0.954208  1.462696
2000-01-03 -1.743161 -0.826591 -0.345352  1.314232
In [23]: df2
Out[23]:
                   A         B         C
2000-01-04  0.690579  0.995761  2.396780
2000-01-05  3.357427 -0.317441 -1.236269
2000-01-06 -0.487602 -0.082240 -2.182937
In [24]: df1.append(df2)
Out[24]:
                   A         B         C         D
2000-01-01  0.176444  0.403310 -0.154951  0.301624
2000-01-02 -2.179861 -1.369849 -0.954208  1.462696
2000-01-03 -1.743161 -0.826591 -0.345352  1.314232
2000-01-04  0.690579  0.995761  2.396780       NaN
2000-01-05  3.357427 -0.317441 -1.236269       NaN
2000-01-06 -0.487602 -0.082240 -2.182937       NaN
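
On pandas 2.0 and later the append method no longer exists, so the same row-wise stacking has to be written with pd.concat. A minimal sketch, assuming random placeholder data with the same shape and labels as df1 and df2 above (the values will not match the output shown):

```python
import numpy as np
import pandas as pd

# Placeholder data shaped like df1 (4 columns) and df2 (3 columns).
dates = pd.date_range("2000-01-01", periods=6)
rng = np.random.default_rng(1)
df1 = pd.DataFrame(
    rng.standard_normal((3, 4)), index=dates[:3], columns=list("ABCD")
)
df2 = pd.DataFrame(
    rng.standard_normal((3, 3)), index=dates[3:], columns=list("ABC")
)

# Stand-in for df1.append(df2): stack the rows and take the union of columns.
# Column D is missing from df2, so its rows get NaN there.
stacked = pd.concat([df1, df2])
print(stacked)
```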

Ignoring indexes on the concatenation axis

In [33]: pd.concat([df1, df2], ignore_index=True)
In [34]: df1.append(df2, ignore_index=True)
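
The effect of ignore_index is easiest to see when the inputs share row labels. A small sketch with made-up frames (the names kept and reset are assumptions for illustration):

```python
import pandas as pd

# Two frames whose indexes overlap on the label "x".
df1 = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
df2 = pd.DataFrame({"A": [3, 4]}, index=["x", "z"])

# Default: the original labels are kept, so "x" appears twice.
kept = pd.concat([df1, df2])

# ignore_index=True discards the labels and numbers the rows 0..n-1.
reset = pd.concat([df1, df2], ignore_index=True)

print(list(kept.index))   # ['x', 'y', 'x', 'z']
print(list(reset.index))  # [0, 1, 2, 3]
```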

More concatenating with group keys

In [43]: pieces = [df.iloc[:, [0, 1]], df.iloc[:, [2]], df.iloc[:, [3]]]
In [44]: result = pd.concat(pieces, axis=1, keys=['one', 'two', 'three'])
In [45]: result
Out[45]:
        one                 two     three
          0         1         2         3
0 -0.014805 -0.284319  0.650776 -1.461665
1 -1.137707 -0.891060 -0.693921  1.613616
2  0.464000  0.227371 -0.496922  0.306389
3 -2.290613 -1.134623 -1.561819 -0.260838
4  0.281957  1.523962 -0.902937  0.068159
5 -0.057873 -0.368204 -1.144073  0.861209
6  0.800193  0.782098 -1.069094 -1.099248
7  0.255269  0.009750  0.661084  0.379319
8 -0.008434  1.952541 -1.056652  0.533946
9 -1.226970  0.040403 -0.507516 -0.230096
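
When keys are passed together with axis=1, they end up as the outer level of the column index, which makes each group selectable by name. A sketch assuming an arbitrary random 10x4 frame (the name first_group is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Arbitrary 10x4 frame with integer column labels 0..3.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.standard_normal((10, 4)))

# Split by column position, then glue back together under group keys.
pieces = [df.iloc[:, [0, 1]], df.iloc[:, [2]], df.iloc[:, [3]]]
result = pd.concat(pieces, axis=1, keys=["one", "two", "three"])

# The keys form the outer level of the column MultiIndex.
print(list(result.columns))

# Selecting a key returns the columns that came from that piece.
first_group = result["one"]
print(first_group.head())
```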

Database-style DataFrame joining/merging

The how argument of merge controls which keys appear in the result:

left    Use keys from left frame only
right   Use keys from right frame only
outer   Use union of keys from both frames
inner   Use intersection of keys from both frames (the default)
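
The four join types listed above can be compared on a pair of tiny frames. This is a sketch with made-up data; the column names key, lval and rval and the results dictionary are assumptions for illustration.

```python
import pandas as pd

# Two small frames that share the keys "b" and "c".
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [4, 5, 6]})

# Run the same merge once per join type.
results = {
    how: pd.merge(left, right, on="key", how=how)
    for how in ["left", "right", "outer", "inner"]
}

for how, merged in results.items():
    # Rows without a match on the other side get NaN in that side's columns.
    print(how, list(merged["key"]))
```

The left join keeps keys a, b, c; the right join keeps b, c, d; the outer join keeps all four keys; the inner join keeps only b and c.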