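While using pd.DataFrame.combine_first(), I got unexpected duplicate rows in the result. The cause turned out to be duplicate index labels in the second DataFrame, so I am noting it down here.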
DataFrame.combine_first(other)
- Update null elements with value in the same location in other.
- Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two.
- Parameters
other : DataFrame
Provided DataFrame to use to fill null values.
- Returns
DataFrame
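The combine_first() method merges two DataFrames into one. The row and column indexes of the result are the union of those of the two DataFrames. Wherever the calling DataFrame holds a null, the value from the passed DataFrame is used instead. If both DataFrames hold a null at the same position, the result is null at that position as well.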
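The fill rules described above can be sketched on a small example. The frames `left` and `right` and their values are hypothetical, chosen only to show one kept value, one filled value, and one position that is null in both.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, made up purely to illustrate the fill rules.
left = pd.DataFrame({"x": [1.0, np.nan, np.nan]}, index=["r1", "r2", "r3"])
right = pd.DataFrame(
    {"x": [10.0, 20.0, np.nan], "y": [7.0, 8.0, 9.0]},
    index=["r1", "r2", "r3"],
)

filled = left.combine_first(right)

# r1/x: left already has a value, so it is kept.
# r2/x: left is null, so the value comes from right.
# r3/x: null in both frames, so it stays null.
# y:    exists only in right, so it is added (union of columns).
print(filled)
```

The caller always wins where it has a value; the other DataFrame only fills the gaps.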
import numpy as np
import pandas as pd

# df2 carries the index label '3#' twice
df1 = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=list('abc'), index=['1#', '2#', '3#'])
df2 = pd.DataFrame(np.arange(10, 22).reshape(4, 3), columns=list('bcd'), index=['2#', '3#', '3#', '4$'])
print(df1)
print(df2)
print(df1.combine_first(df2))
Because df2 contains the index label '3#' twice, the result also contains two '3#' rows. The output is:
    a  b  c
1#  1  2  3
2#  4  5  6
3#  7  8  9
     b   c   d
2#  10  11  12
3#  13  14  15
3#  16  17  18
4$  19  20  21
      a     b     c     d
1#  1.0   2.0   3.0   NaN
2#  4.0   5.0   6.0  12.0
3#  7.0   8.0   9.0  15.0
3#  7.0   8.0   9.0  18.0
4$  NaN  19.0  20.0  21.0
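One possible way to avoid the duplicated '3#' row is to check the second DataFrame's index for uniqueness and drop the repeated labels before combining. This is only a sketch: keeping the first row for each repeated label is an assumption, and the right choice depends on which row is actually wanted. The names `has_duplicates`, `df2_unique`, and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=list("abc"), index=["1#", "2#", "3#"])
df2 = pd.DataFrame(np.arange(10, 22).reshape(4, 3), columns=list("bcd"), index=["2#", "3#", "3#", "4$"])

# Detect the problem up front: True when any index label repeats.
has_duplicates = not df2.index.is_unique

# Assumption: keep only the first row for each repeated label.
# Index.duplicated(keep="first") marks the later occurrences as True.
df2_unique = df2[~df2.index.duplicated(keep="first")]

result = df1.combine_first(df2_unique)
print(result)
```

With a unique index on both sides, each label appears only once in the combined result.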
References