Deepcopy a pandas DataFrame containing Python objects (such as lists)

Latest recommended article published on 2023-10-23 11:38:05

weixin_39900286


Need help understanding variable assignment, pointers, ...

The following is reproducible.

import pandas as pd

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3]
    ]})

df['listDataSort'] = df['listData']

gives:

             listData        listDataSort
0  [c, f, d, a, e, b]  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]     [5, 2, 1, 4, 3]

If I only want to sort the lists in the listDataSort column, I might try:

df['listDataSort'].apply(lambda l: l.sort())

However, that sorts the lists in both columns, in-place.

             listData        listDataSort
0  [a, b, c, d, e, f]  [a, b, c, d, e, f]
1     [1, 2, 3, 4, 5]     [1, 2, 3, 4, 5]

I can fix this by instead doing:

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3]
    ]})

df['listDataSort'] = df['listData'].apply(sorted)

giving:

             listData        listDataSort
0  [c, f, d, a, e, b]  [a, b, c, d, e, f]
1     [5, 2, 1, 4, 3]     [1, 2, 3, 4, 5]

Assigning df to a different variable, say df2, does not help: changes made through df2 still show up in df, and the in-place sort still modifies the original source lists. Furthermore, how do I create a new DataFrame based on an existing one, so that I can change the new DataFrame without making the same changes to the existing one?

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3]
    ]})

df2 = df
print('\ndf\n', df)
print('\ndf2\n', df2)

df2['listDataSort'] = df2['listData']
print('\ndf\n', df)
print('\ndf2\n', df2)

df2['listDataSort'].apply(lambda l: l.sort())
print('\ndf\n', df)
print('\ndf2\n', df2)

prints:

df
             listData
0  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]

df2
             listData
0  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]

df
             listData        listDataSort
0  [c, f, d, a, e, b]  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]     [5, 2, 1, 4, 3]

df2
             listData        listDataSort
0  [c, f, d, a, e, b]  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]     [5, 2, 1, 4, 3]

df
             listData        listDataSort
0  [a, b, c, d, e, f]  [a, b, c, d, e, f]
1     [1, 2, 3, 4, 5]     [1, 2, 3, 4, 5]

df2
             listData        listDataSort
0  [a, b, c, d, e, f]  [a, b, c, d, e, f]
1     [1, 2, 3, 4, 5]     [1, 2, 3, 4, 5]

Also:

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3]
    ]})
print('\ndf\n', df)

df3 = df
df3['listDataSort'] = df3['listData'].apply(sorted)
print('\ndf\n', df)
print('\ndf3\n', df3)

prints:

df
             listData
0  [c, f, d, a, e, b]
1     [5, 2, 1, 4, 3]

df
             listData        listDataSort
0  [c, f, d, a, e, b]  [a, b, c, d, e, f]
1     [5, 2, 1, 4, 3]     [1, 2, 3, 4, 5]

df3
             listData        listDataSort
0  [c, f, d, a, e, b]  [a, b, c, d, e, f]
1     [5, 2, 1, 4, 3]     [1, 2, 3, 4, 5]

Solution

When you run

df['listDataSort'] = df['listData']

All this does is copy the references to the lists into the new column. No new list objects are created, so both columns refer to the same lists, and any in-place change made through one column (such as l.sort()) shows up in the other.
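A quick way to see the sharing is to compare object identities with `is`. The following is a minimal sketch using the same example data; the names `first_original` and `first_alias` are only illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3],
    ]
})
df['listDataSort'] = df['listData']

# Both cells hold the very same list object, not two equal lists.
first_original = df['listData'].iloc[0]
first_alias = df['listDataSort'].iloc[0]
print(first_original is first_alias)   # True

# Mutating the list through one column is therefore visible in the other.
first_alias.sort()
print(df['listData'].iloc[0])          # ['a', 'b', 'c', 'd', 'e', 'f']
```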

You can use a list comprehension with sorted, which returns a new sorted list and leaves the original untouched. This should be the easiest option for you.

df['listDataSort'] = [sorted(x) for x in df['listData']]

             listData        listDataSort
0  [c, f, d, a, e, b]  [a, b, c, d, e, f]
1     [5, 2, 1, 4, 3]     [1, 2, 3, 4, 5]
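The difference between the two approaches comes down to `list.sort()` versus `sorted()`. Here is a small pandas-free sketch of that difference; the names `original`, `alias`, `copy_sorted`, and `result` are illustrative.

```python
original = ['c', 'f', 'd', 'a', 'e', 'b']
alias = original                  # a second name for the same list

copy_sorted = sorted(original)    # builds and returns a new list
print(copy_sorted is original)    # False
print(original)                   # still ['c', 'f', 'd', 'a', 'e', 'b']

result = alias.sort()             # sorts in place and returns None
print(result)                     # None
print(original)                   # ['a', 'b', 'c', 'd', 'e', 'f']
```

This is also why the `apply(lambda l: l.sort())` call in the question returns a Series of None values: the sorting happens as a side effect on the shared lists.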

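Now, when it comes to making a copy of the entire DataFrame, things are a little more complicated. DataFrame.copy(), and copy.deepcopy applied to a whole DataFrame or Series, duplicates the underlying arrays but not the Python objects stored in them, so the lists would still be shared. I would recommend applying deepcopy to each element instead:

import copy

df2 = df.apply(lambda col: col.map(copy.deepcopy))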
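To check how independent a copy really is, one can compare object identities after copying. The sketch below is an illustration rather than part of the original answer: it contrasts plain assignment, `DataFrame.copy()`, and an element-wise `copy.deepcopy` applied through `Series.map`. The names `alias`, `frame_copy`, and `independent` are hypothetical.

```python
import copy

import pandas as pd

df = pd.DataFrame({
    'listData': [
        ['c', 'f', 'd', 'a', 'e', 'b'],
        [5, 2, 1, 4, 3],
    ]
})

alias = df               # same DataFrame object under another name
frame_copy = df.copy()   # new DataFrame, but its cells reference the same lists
# Element-wise deep copy: every stored list is duplicated.
independent = df.apply(lambda col: col.map(copy.deepcopy))

print(alias is df)
print(frame_copy['listData'].iloc[0] is df['listData'].iloc[0])
print(independent['listData'].iloc[0] is df['listData'].iloc[0])

# Sorting the lists of the independent copy leaves the original untouched.
for lst in independent['listData']:
    lst.sort()
print(df['listData'].tolist())
print(independent['listData'].tolist())
```

In this sketch only the element-wise copy ends up with its own lists; plain assignment and `DataFrame.copy()` still point at the originals.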
