Use concat followed by drop_duplicates:
df = pd.concat([df1, df2]).drop_duplicates('user_id').reset_index(drop=True)
print(df)
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
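The original df1 and df2 are not shown, so here is a minimal sketch with made-up sample frames (the data and the names keep_first and keep_last are assumptions for illustration). It shows that drop_duplicates keeps the first occurrence by default, so rows from df1 win, and that keep="last" flips the preference to df2:

```python
import pandas as pd

# Hypothetical sample data; the real df1/df2 from the question are not shown.
df1 = pd.DataFrame({
    "user_id": [123, 456, 789],
    "username": ["abc", "def", "ghi"],
})
df2 = pd.DataFrame({
    "user_id": [111, 123, 234],
    "username": ["xyz", "abc_new", "mnp"],
})

combined = pd.concat([df1, df2])

# Default keep="first": the row that appears first (from df1) is kept.
keep_first = combined.drop_duplicates("user_id").reset_index(drop=True)

# keep="last": the row that appears last (from df2) is kept instead.
keep_last = combined.drop_duplicates("user_id", keep="last").reset_index(drop=True)

print(keep_first)
print(keep_last)
```

Which variant is right depends on which frame should take priority when the same user_id appears in both.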
A solution using groupby and aggregating with first also works, but it is slower:
df = pd.concat([df1, df2]).groupby('user_id', as_index=False, sort=False).first()
print(df)
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
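Besides speed, the two approaches can differ when the data has missing values: GroupBy.first takes the first non-null value per column, so it may combine values from different rows, while drop_duplicates keeps a whole row. A small sketch with made-up data (the frames and the names by_first and by_dedup are assumptions, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: user 456 has a missing username in df1.
df1 = pd.DataFrame({
    "user_id": [123, 456],
    "username": ["abc", np.nan],
    "lastname": ["abc", "def"],
})
df2 = pd.DataFrame({
    "user_id": [456, 111],
    "username": ["def2", "xyz"],
    "lastname": ["zzz", "xyz"],
})

combined = pd.concat([df1, df2])

# first() skips nulls column by column, so user 456 gets username from df2
# but lastname from df1.
by_first = combined.groupby("user_id", as_index=False, sort=False).first()

# drop_duplicates keeps the entire first row, including its NaN.
by_dedup = combined.drop_duplicates("user_id").reset_index(drop=True)

print(by_first)
print(by_dedup)
```

If the frames have no missing values, as in the output above, both give the same result.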
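EDIT: Another option is boolean indexing with numpy.in1d (np is numpy), keeping only the rows of df2 whose user_id is not already in df1: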
df = pd.concat([df1, df2[~np.in1d(df2['user_id'], df1['user_id'])]], ignore_index=True)
print(df)
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
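The same filter can be written with pandas' own Series.isin, which avoids the NumPy call (recent NumPy releases steer users from in1d toward numpy.isin). This is a sketch on made-up data, not the asker's frames. Note that this approach only removes df2 rows already present in df1; duplicates inside df2 itself are left alone:

```python
import pandas as pd

# Hypothetical data: df2 repeats user 111 and overlaps df1 on user 123.
df1 = pd.DataFrame({
    "user_id": [123, 456],
    "username": ["abc", "def"],
})
df2 = pd.DataFrame({
    "user_id": [111, 123, 111],
    "username": ["xyz", "abc_new", "xyz_again"],
})

# True for df2 rows whose user_id already exists in df1.
already_in_df1 = df2["user_id"].isin(df1["user_id"])

# Keep all of df1, plus only the df2 rows that are new.
result = pd.concat([df1, df2[~already_in_df1]], ignore_index=True)

print(result)
```

If df2 can contain repeated user_id values, chain drop_duplicates('user_id') on the result or use the first solution instead.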