【pandas】解决怎么对 pandas.DataFrame UPSERT(update & insert) 另一个 DataFrame 问题

最新推荐文章于 2023-07-06 15:02:07 发布

FeatureOverload

最新推荐文章于 2023-07-06 15:02:07 发布

阅读量1.3k

点赞数 2

分类专栏： # Python Pandas(数据处理) Solution合集

本文链接：https://blog.csdn.net/qq_29757283/article/details/111624130

版权

Python 同时被 3 个专栏收录

70 篇文章 1 订阅

订阅专栏

Solution合集

27 篇文章 0 订阅

订阅专栏

Pandas(数据处理)

2 篇文章 0 订阅

订阅专栏

Overview

问题

你想要对用一个 pandas.DataFrame 数据，对原来的一个 pandas.DataFrame 进行 UPSERT 更新，

I am looking for an elegant way to append all the rows from one DataFrame to another DataFrame (both DataFrames having the same index and column structure), but in cases where the same index value appears in both DataFrames, use the row from the second data frame.

例如你有这两个 DataFrame：

df1:
                    A      B
    date
    '2015-10-01'  'A1'   'B1'
    '2015-10-02'  'A2'   'B2'
    '2015-10-03'  'A3'   'B3'

df2:
    date            A      B
    '2015-10-02'  'a1'   'b1'
    '2015-10-03'  'a2'   'b2'
    '2015-10-04'  'a3'   'b3'

你想要得到如下的 DataFrame 结果：

                    A      B
    date
    '2015-10-01'  'A1'   'B1'
    '2015-10-02'  'a1'   'b1'
    '2015-10-03'  'a2'   'b2'
    '2015-10-04'  'a3'   'b3'

This is analogous to what I think is called “upsert” in some SQL systems — a combination of update and insert, in the sense that each row from df2 is either (a) used to update an existing row in df1 if the row key already exists in df1, or (b) inserted into df1 at the end if the row key does not already exist.

Solution

以下是我在 StackOverflow 上的 “answer” 👇

~~This question is very help to me.~~

两种 Solution

answer of @Alexander that is good enough.

I just want to give a little explain the different of two solution:

much lines

def upsert(target, new)
  df = pd.concat([target, new[~new.index.isin(target.index)]], sort=False)
  df.update(new)
  return df

and

one-liner

def upsert(target, new):
  return pd.concat([target[~target.index.isin(new.index)], new], sort=False)

解释两种 Solution 的不同之处

There is the example can explain the difference:

>>> df = pd.DataFrame([('python', None, 4),
                       ('java',   '1992', 3),
                       ('javascript', None, 2)],
                      columns=['lang', 'year', 'rank'])
>>> df
	lang	year	rank
0	python	None	4
1	java	1992	3
2	javascript	None	2

>>> upsert_df = pd.DataFrame([('python', '1987'),
                              ('GOLANG', '2009')],
                             columns=['lang', 'year'])
>>> upsert_df
	lang	year
0	python	1987
1	GOLANG	2009

>>> target, new = df.set_index(['lang']), upsert_df.set_index(['lang'])
>>> upsert(target, new, full_update=True)
	year	rank
lang		
java	1992	3.0
javascript	None	2.0
python	1987	NaN
GOLANG	2009	NaN

>>> upsert(target, new, full_update=False)
	year	rank
lang		
python	1987	4.0
java	1992	3.0
c	1976	1.0
javascript	None	2.0
GOLANG	2009	NaN

the upsert function like this 👇

完整的 solution

def upsert(target, new, *, full_update=True, sort=False) -> pd.DataFrame:
  """do a upsert in pandas

  :param target: to be upsert;
  :param new:  upsert data to target;
  :param full_update: bool: 在 new 没有完整的 target.columns 的情况下，对于 update 的那部分 rows，
                            是对所有的 target.columns 都 update（没有的 columns 值将为空）？
                            还是对 new 中存在的 columns 对 target 执行该部分的 columns update？
                            默认对所有的 target.columns 都 update（new 中没有的 columns 值将为空）;
  :param sort: bool: if new.columns != target.columns,
                     the new version of pandas execute .concat will sort columns by default.

  Reference: https://stackoverflow.com/a/33002097
  """

  if full_update:
    return pd.concat([target[~target.index.isin(new.index)], new], sort=sort)
  else:
    df = pd.concat([target, new[~new.index.isin(target.index)]], sort=sort)
    df.update(new)
    return df

Reference

🔗 pandas DataFrame concat / update (“upsert”)?

FeatureOverload

关注

2
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
【pandas】解决怎么对 pandas.DataFrame UPSERT(update & insert) 另一个 DataFrame 问题

Overview问题Solution两种 Solution解释两种 Solution 的不同之处完整的 solutionReference问题你想要对用一个 pandas.DataFrame 数据，对原来的一个 pandas.DataFrame 进行 UPSERT 更新，I am looking for an elegant way to append all the rows from one DataFrame to another DataFrame (both DataFrames hav
复制链接

扫一扫

专栏目录