Intro
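The requirement is simple: I have a list in which every element is a DataFrame, and all the DataFrames have the same number of columns. I want to combine these sub-frames into one large DataFrame. The list is the result returned by a multi-threaded computation. In R you can simply use the do.call function for this, so how do you do it in Python? First, the version information:
System: Win10
Python: 3.7.0 (python --version)
Pandas: 0.23.4
Building the data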
import pandas as pd
# sample dataframes
d1 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
d2 = pd.DataFrame({'one' : [5., 6., 7., 8.], 'two' : [9., 10., 11., 12.]})
d3 = pd.DataFrame({'one' : [15., 16., 17., 18.], 'three' : [19., 10., 11., 12.]})
# list of dataframes
mydfs = [d1, d2, d3]
mydfs[0]
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
The concat function
This function is actually used all the time; I just did not know it could be used this way.
pd.concat(mydfs)
D:\code\anaconda\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
"""Entry point for launching an IPython kernel.
| | one | three | two |
|---|---|---|---|
| 0 | 1.0 | NaN | 4.0 |
| 1 | 2.0 | NaN | 3.0 |
| 2 | 3.0 | NaN | 2.0 |
| 3 | 4.0 | NaN | 1.0 |
| 0 | 5.0 | NaN | 9.0 |
| 1 | 6.0 | NaN | 10.0 |
| 2 | 7.0 | NaN | 11.0 |
| 3 | 8.0 | NaN | 12.0 |
| 0 | 15.0 | 19.0 | NaN |
| 1 | 16.0 | 10.0 | NaN |
| 2 | 17.0 | 11.0 | NaN |
| 3 | 18.0 | 12.0 | NaN |
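As you can see, the column names need to match. Otherwise concat aligns the frames by column name and fills any column a frame lacks with NaN. Note also that the original row labels (0 to 3) are kept and repeated for each frame.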
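The FutureWarning above goes away once `sort` is passed explicitly, and the repeated 0 to 3 row labels can be replaced by a fresh index. Below is a minimal sketch, assuming the same three sample frames as above. The names `stacked`, `common`, and `tagged` are only for illustration; `sort`, `ignore_index`, `join`, and `keys` are existing `pd.concat` parameters.

```python
import pandas as pd

# same sample frames as in the data section above
d1 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
d2 = pd.DataFrame({'one': [5., 6., 7., 8.], 'two': [9., 10., 11., 12.]})
d3 = pd.DataFrame({'one': [15., 16., 17., 18.], 'three': [19., 10., 11., 12.]})
mydfs = [d1, d2, d3]

# sort=False: leave the columns unsorted (and silence the warning);
# ignore_index=True: discard the old row labels and number rows 0..n-1
stacked = pd.concat(mydfs, sort=False, ignore_index=True)

# join="inner": keep only the columns present in every frame (here just 'one')
common = pd.concat(mydfs, join="inner", ignore_index=True)

# keys=...: add an outer index level recording which frame each row came from
tagged = pd.concat(mydfs, keys=list(range(len(mydfs))), sort=False)

print(stacked.shape)
print(list(common.columns))
print(tagged.loc[2])  # the rows that came from d3
```

Which variant fits depends on whether the mismatched columns are expected: the default outer join keeps everything and pads with NaN, while `join="inner"` silently drops columns that are not shared.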
The reduce function
from functools import reduce
# outer-merge the frames pairwise, left to right, joining on their shared columns
reduce(lambda df1, df2: df1.merge(df2, how="outer"), mydfs)
| | one | two | three |
|---|---|---|---|
| 0 | 1.0 | 4.0 | NaN |
| 1 | 2.0 | 3.0 | NaN |
| 2 | 3.0 | 2.0 | NaN |
| 3 | 4.0 | 1.0 | NaN |
| 4 | 5.0 | 9.0 | NaN |
| 5 | 6.0 | 10.0 | NaN |
| 6 | 7.0 | 11.0 | NaN |
| 7 | 8.0 | 12.0 | NaN |
| 8 | 15.0 | NaN | 19.0 |
| 9 | 16.0 | NaN | 10.0 |
| 10 | 17.0 | NaN | 11.0 |
| 11 | 18.0 | NaN | 12.0 |
This reduce function is much like reduce in Scala. It seems that different languages implement certain features in a similar way.
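One caveat worth checking: an outer merge is a join on the shared columns, not a row-wise stack. It matches the concat result for the sample data only because no rows match across the frames. Here is a small sketch of the difference, assuming two frames that contain identical rows (`dup`, `frames`, `merged`, and `stacked` are illustrative names):

```python
from functools import reduce

import pandas as pd

d1 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
dup = d1.copy()  # a second frame with exactly the same rows
frames = [d1, dup]

# outer merge joins on the shared columns ('one', 'two'),
# so identical rows are matched and appear only once
merged = reduce(lambda left, right: left.merge(right, how="outer"), frames)

# concat simply stacks the rows, duplicates included
stacked = pd.concat(frames, ignore_index=True)

print(len(merged))   # 4
print(len(stacked))  # 8
```

If the goal is simply to stack the per-thread results, `pd.concat` is the more direct choice. The reduce/merge form is better read as a chained join.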
2020-05-07, Jiulonghu, Jiangning District, Nanjing
Reposted from:
https://blog.csdn.net/wendaomudong_l2d4/article/details/106191773