What is SettingWithCopyWarning?
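To handle this warning properly, you first need to understand what it means and why it appears.
When you filter a DataFrame, slicing or indexing it may return either a view or a copy, depending on pandas internals and implementation details. A view, as the name suggests, is a window onto the original data, so modifying a view may also modify the original data. A copy, by contrast, duplicates the original data, so modifying a copy has no effect on the original.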
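The view/copy distinction is easiest to see on a plain NumPy array, which is what pandas stores its data in. The snippet below is a minimal sketch, not pandas itself; the variable names are chosen only for illustration. It relies on two documented NumPy behaviours: basic slicing returns a view, while boolean-mask indexing returns a copy.

```python
import numpy as np

original = np.arange(5)               # array([0, 1, 2, 3, 4])

view = original[1:3]                  # basic slicing returns a view
view[0] = 99                          # writes through to `original`

copied = original[original > 2]       # boolean-mask indexing returns a copy
copied[0] = -1                        # `original` is not affected

print(original.tolist())              # [0, 99, 2, 3, 4]
print(copied.tolist())                # [-1, 3, 4]
print(np.shares_memory(original, view))    # True
print(np.shares_memory(original, copied))  # False
```

With a DataFrame the same question is harder to answer by inspection, which is why pandas warns rather than guessing.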
Case 1:
import pandas as pd
df = pd.DataFrame({'A': 'aaa bbb ccc ddd eee aaa bbb ccc'.split(),
                   'B': 'one one one two two two two two'.split(),
                   'C': [2, 35, 5, 6, 8, 56, 44, 72],
                   'D': [23, 36, 55, 78, 81, 65, 57, 99]})
df
     A    B   C   D
0  aaa  one   2  23
1  bbb  one  35  36
2  ccc  one   5  55
3  ddd  two   6  78
4  eee  two   8  81
5  aaa  two  56  65
6  bbb  two  44  57
7  ccc  two  72  99
df[df['A'] == 'aaa']['B'] = 'three'
<input>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Solution: use .loc so that the row selection and the assignment happen in a single operation:
df.loc[df['A'] == 'aaa','B'] = 'three'
df
     A      B   C   D
0  aaa  three   2  23
1  bbb    one  35  36
2  ccc    one   5  55
3  ddd    two   6  78
4  eee    two   8  81
5  aaa  three  56  65
6  bbb    two  44  57
7  ccc    two  72  99
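To see why the chained form in Case 1 fails while .loc succeeds, it can help to split the chained statement into the two calls Python actually makes. The sketch below uses a smaller, made-up DataFrame and a hypothetical temporary name `tmp`. The warning is silenced only so the demo prints cleanly, and the exact warning raised may differ between pandas versions.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"A": ["aaa", "bbb", "aaa"], "B": ["one", "one", "two"]})
mask = df["A"] == "aaa"

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # silence the warning for this demo only
    tmp = df[mask]                    # step 1: boolean-mask selection builds a new object
    tmp["B"] = "three"                # step 2: the assignment lands on tmp, not on df

unchanged = df["B"].tolist()          # df still holds its original values
print(unchanged)                      # ['one', 'one', 'two']

df.loc[mask, "B"] = "three"           # one indexing call, applied to df itself
updated = df["B"].tolist()
print(updated)                        # ['three', 'one', 'three']
```

Because .loc performs the selection and the assignment in one call on df, there is no intermediate object for the write to get lost in.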
Case 2:
df1 = df[df['B'].str.contains('w')]
df1.loc[df1['A']=='bbb','C'] = 111
D:\pycharm\lib\site-packages\pandas\core\indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
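The cause is that df1 was produced by filtering df for the rows whose 'B' column contains the character 'w'. pandas records that df1 was derived from df and cannot guarantee whether it is a view or a copy, so it warns when you assign into df1. Explicitly requesting a copy with .copy() removes the ambiguity: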
df1 = df[df['B'].str.contains('w')].copy()
df1.loc[df1['A']=='bbb','C'] = 111
df1
     A    B    C   D
3  ddd  two    6  78
4  eee  two    8  81
6  bbb  two  111  57
7  ccc  two   72  99
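One consequence of the .copy() fix is that df1 becomes fully independent: edits to df1 never reach df. The sketch below, on a smaller made-up DataFrame, checks this and shows one way to update df itself instead, by combining both conditions in a single .loc call. Which of the two you want depends on your intent; this is an illustration, not the only approach.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["aaa", "bbb", "ccc", "bbb"],
    "B": ["one", "two", "two", "two"],
    "C": [2, 35, 5, 44],
})

# Explicit copy: df1 is independent, so writing to it never touches df.
df1 = df[df["B"].str.contains("w")].copy()
df1.loc[df1["A"] == "bbb", "C"] = 111
df_c_after_copy_edit = df["C"].tolist()
print(df_c_after_copy_edit)           # [2, 35, 5, 44]

# To change df itself, combine both conditions in one .loc call on df.
df.loc[df["B"].str.contains("w") & (df["A"] == "bbb"), "C"] = 111
df_c_after_loc = df["C"].tolist()
print(df_c_after_loc)                 # [2, 111, 5, 111]
```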