Deepcopy a pandas DataFrame containing Python objects (such as lists)

I need help understanding variable assignment, pointers, and so on.

The following is reproducible.

import pandas as pd

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3]
    ]})

df['listDataSort'] = df['listData']

gives:

             listData        listDataSort
0  [c, f, d, a, e, b]  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]     [5, 2, 1, 4, 3]
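At this point the two columns do not hold two separate sets of lists: each cell of listDataSort refers to the very same list object as the matching cell of listData. Comparing with is (identity) rather than == (equality) makes this visible. A minimal sketch that rebuilds the same frame (same_objects is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'listData': [['c', 'f', 'd', 'a', 'e', 'b'], [5, 2, 1, 4, 3]]})
df['listDataSort'] = df['listData']

# Compare object identity, not equality, cell by cell
same_objects = [a is b for a, b in zip(df['listData'], df['listDataSort'])]
print(same_objects)  # [True, True]
```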

If I only want to sort the lists in the listDataSort column, I might try:

df['listDataSort'].apply(lambda l: l.sort())
df

However, that sorts the lists in both columns, in-place.

             listData        listDataSort
0  [a, b, c, d, e, f]  [a, b, c, d, e, f]
1     [1, 2, 3, 4, 5]     [1, 2, 3, 4, 5]
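Two things happen in that call. list.sort() reorders each list in place and returns None, so the Series that apply() returns contains only missing values; and because both columns point at the same list objects, listData ends up sorted as well. A minimal reproduction (result is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'listData': [['c', 'f', 'd', 'a', 'e', 'b'], [5, 2, 1, 4, 3]]})
df['listDataSort'] = df['listData']  # both columns now share the same lists

# list.sort() works in place and returns None, so apply() only collects missing values
result = df['listDataSort'].apply(lambda l: l.sort())
print(result.isna().all())  # True

# The shared lists were sorted in place, so listData changed too
print(df['listData'].tolist())
```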

I can fix this by instead doing:

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3]
    ]})

df['listDataSort'] = df['listData'].apply(sorted)

giving:

             listData        listDataSort
0  [c, f, d, a, e, b]  [a, b, c, d, e, f]
1     [5, 2, 1, 4, 3]     [1, 2, 3, 4, 5]
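This works because sorted() always builds and returns a new list, whereas list.sort() reorders the existing list in place. Comparing identities confirms that the two columns no longer share objects:

```python
import pandas as pd

df = pd.DataFrame({'listData': [['c', 'f', 'd', 'a', 'e', 'b'], [5, 2, 1, 4, 3]]})
df['listDataSort'] = df['listData'].apply(sorted)

# sorted() returns brand-new lists, so no cell is shared between the columns
print([a is b for a, b in zip(df['listData'], df['listDataSort'])])  # [False, False]
print(df['listData'].tolist())  # original order is preserved
```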

Assigning df to a different variable, say df2, does not help either: changes made through df2 still end up in the original lists. More generally, how do I create a new DataFrame based on an existing one, so that I can modify the new DataFrame without the same changes appearing in the original?

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3]
    ]})

df2 = df

print('\ndf\n', df)
print('\ndf2\n', df2)

df2['listDataSort'] = df2['listData']

print('\ndf\n', df)
print('\ndf2\n', df2)

df2['listDataSort'].apply(lambda l: l.sort())

print('\ndf\n', df)
print('\ndf2\n', df2)

prints:

df
             listData
0  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]

df2
             listData
0  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]

df
             listData        listDataSort
0  [c, f, d, a, e, b]  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]     [5, 2, 1, 4, 3]

df2
             listData        listDataSort
0  [c, f, d, a, e, b]  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]     [5, 2, 1, 4, 3]

df
             listData        listDataSort
0  [a, b, c, d, e, f]  [a, b, c, d, e, f]
1     [1, 2, 3, 4, 5]     [1, 2, 3, 4, 5]

df2
             listData        listDataSort
0  [a, b, c, d, e, f]  [a, b, c, d, e, f]
1     [1, 2, 3, 4, 5]     [1, 2, 3, 4, 5]
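The reason df2 = df does not help is that assignment in Python never copies anything: it only binds a second name to the same DataFrame object, so every change made through df2 is a change to df. A short check with the same data:

```python
import pandas as pd

df = pd.DataFrame({'listData': [['c', 'f', 'd', 'a', 'e', 'b'], [5, 2, 1, 4, 3]]})
df2 = df  # a second name for the same DataFrame; nothing is copied

print(df2 is df)  # True
df2['listDataSort'] = df2['listData']
print('listDataSort' in df.columns)  # True: the column was added to the one shared object
```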

Also:

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3]
    ]})

print('\ndf\n', df)

df3 = df
df3['listDataSort'] = df3['listData'].apply(sorted)

print('\ndf\n', df)
print('\ndf3\n', df3)

prints:

df
             listData
0  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]

df
             listData        listDataSort
0  [c, f, d, a, e, b]  [a, b, c, d, e, f]
1     [5, 2, 1, 4, 3]     [1, 2, 3, 4, 5]

df3
             listData        listDataSort
0  [c, f, d, a, e, b]  [a, b, c, d, e, f]
1     [5, 2, 1, 4, 3]     [1, 2, 3, 4, 5]
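Replacing df3 = df with df.copy() gets partway there: the copy is a separate DataFrame, so adding a column to it no longer changes df. However, as the pandas documentation for DataFrame.copy notes, even with deep=True (the default) the Python objects stored in the cells are not copied recursively, so the lists themselves are still shared. A sketch demonstrating both effects:

```python
import pandas as pd

df = pd.DataFrame({'listData': [['c', 'f', 'd', 'a', 'e', 'b'], [5, 2, 1, 4, 3]]})
df3 = df.copy()  # deep=True is the default

# The container is independent: a new column on df3 does not appear on df
df3['listDataSort'] = df3['listData'].apply(sorted)
print('listDataSort' in df.columns)  # False

# ...but the list objects inside the cells are still shared
print(df3.loc[0, 'listData'] is df.loc[0, 'listData'])  # True
df3.loc[0, 'listData'].sort()
print(df.loc[0, 'listData'])  # ['a', 'b', 'c', 'd', 'e', 'f']
```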

Solution

When you run

df['listDataSort'] = df['listData']

all you are doing is copying the references to the lists into the new column. Only a shallow copy is made, so both columns refer to the same list objects, and any in-place change made through one column (such as list.sort()) shows up in the other.
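The same shallow-versus-deep distinction can be seen with plain nested lists and the standard copy module, with no pandas involved. A minimal illustration (outer, shallow, and deep are arbitrary names):

```python
import copy

outer = [['c', 'f', 'd'], [5, 2, 1]]
shallow = copy.copy(outer)   # new outer list, but the same inner lists
deep = copy.deepcopy(outer)  # new outer list and new inner lists

shallow[0].sort()
print(outer[0])  # ['c', 'd', 'f']: changed through the shallow copy

deep[1].sort()
print(outer[1])  # [5, 2, 1]: untouched by sorting the deep copy
```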

You can use a list comprehension with sorted, which returns a new sorted list and leaves the original list untouched. This is probably the easiest option for you:

df['listDataSort'] = [sorted(x) for x in df['listData']]

df

             listData        listDataSort
0  [c, f, d, a, e, b]  [a, b, c, d, e, f]
1     [5, 2, 1, 4, 3]     [1, 2, 3, 4, 5]
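To check that the list comprehension really produced independent lists, one can mutate the new column in place and confirm that listData is unaffected. This is only a sanity check; list.reverse() is an arbitrary choice of in-place operation:

```python
import pandas as pd

df = pd.DataFrame({'listData': [['c', 'f', 'd', 'a', 'e', 'b'], [5, 2, 1, 4, 3]]})
df['listDataSort'] = [sorted(x) for x in df['listData']]

# The new column holds new lists, so mutating them in place is now safe
for lst in df['listDataSort']:
    lst.reverse()

print(df['listData'].tolist())      # still in the original order
print(df['listDataSort'].tolist())  # sorted, then reversed
```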

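Making an independent copy of the entire DataFrame is a little more complicated. df2 = df only binds a second name to the same object, and even df.copy(deep=True) copies only the references to the lists, not the lists themselves (pandas does not copy Python objects recursively). So I would recommend copy.deepcopy: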

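import copy
# Deep-copy each cell on its own so df2 holds brand-new lists.
# DataFrame.map needs pandas >= 2.1; on older versions use df.applymap(copy.deepcopy).
df2 = df.map(copy.deepcopy)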
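Putting this together, here is a sketch of a small helper (deep_copy_cells is a hypothetical name, not a pandas function) that deep-copies every cell and then checks that an in-place sort on the copy leaves the original alone. The hasattr check picks DataFrame.map on pandas 2.1 and later, where applymap was deprecated in its favor, and falls back to applymap on older versions:

```python
import copy

import pandas as pd


def deep_copy_cells(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a new DataFrame with every cell deep-copied."""
    # DataFrame.map is the pandas >= 2.1 name; older releases only have applymap
    elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
    return elementwise(copy.deepcopy)


df = pd.DataFrame({'listData': [['c', 'f', 'd', 'a', 'e', 'b'], [5, 2, 1, 4, 3]]})
deep = deep_copy_cells(df)  # new frame and new list objects

print(deep.loc[0, 'listData'] is df.loc[0, 'listData'])  # False

deep['listData'].apply(lambda l: l.sort())  # the in-place sort from the question
print(df['listData'].tolist())    # original lists are unchanged
print(deep['listData'].tolist())  # only the copy is sorted
```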
