DataFrame.assign(**kwargs)
Assign new columns to a DataFrame, returning a new object (a copy) with the new columns added to the original ones. Existing columns that are re-assigned will be overwritten.
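assign adds new columns or overwrites existing ones. The keyword arguments are the column names. If a value is callable, it is computed on the DataFrame and the result is assigned to the new column.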
>>> df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
>>> df
    A         B
0   1 -0.577643
1   2 -1.061502
2   3 -0.050118
3   4  0.560739
4   5 -0.888615
5   6  2.280487
6   7  0.181502
7   8 -0.601645
8   9 -2.208362
9  10 -0.596109
>>> df.assign(ln_A=lambda x: np.log(x.A))
    A         B      ln_A
0   1 -0.577643  0.000000
1   2 -1.061502  0.693147
2   3 -0.050118  1.098612
3   4  0.560739  1.386294
4   5 -0.888615  1.609438
5   6  2.280487  1.791759
6   7  0.181502  1.945910
7   8 -0.601645  2.079442
8   9 -2.208362  2.197225
9  10 -0.596109  2.302585
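The two points in the description (a copy is returned, and re-assigned columns are overwritten) can be checked with a minimal sketch like the one below. The frame and its values are made up for illustration and are not taken from the example above.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical values, chosen for this sketch)
df = pd.DataFrame({"A": [1, 2, 3], "B": [10.0, 20.0, 30.0]})

# assign returns a new DataFrame; df itself is left unchanged
with_log = df.assign(ln_A=lambda x: np.log(x.A))

# Re-using an existing column name overwrites that column in the result only
doubled = df.assign(B=lambda x: x.B * 2)

print(list(df.columns))        # ['A', 'B']
print(list(with_log.columns))  # ['A', 'B', 'ln_A']
print(doubled["B"].tolist())   # [20.0, 40.0, 60.0]
```

Because the original frame is never modified, assign is convenient in method chains where each step produces a new object.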
If the value is not callable (for example, a Series that has already been computed), it is inserted directly.
>>> newcol = np.log(df['A'])
>>> df.assign(ln_A=newcol)
    A         B      ln_A
0   1 -0.577643  0.000000
1   2 -1.061502  0.693147
2   3 -0.050118  1.098612
3   4  0.560739  1.386294
4   5 -0.888615  1.609438
5   6  2.280487  1.791759
6   7  0.181502  1.945910
7   8 -0.601645  2.079442
8   9 -2.208362  2.197225
9  10 -0.596109  2.302585
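One practical difference between the two forms shows up in method chains. A callable receives the frame that assign is called on, so it still works when that frame is an unnamed intermediate result. A precomputed Series is matched to the frame by index label. The sketch below uses a made-up four-row frame and a hypothetical filter step to show both giving the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"A": [1, 2, 3, 4]})

# Callable form: x is the filtered intermediate frame, which has no name
from_callable = (
    df[df["A"] > 2]
    .assign(ln_A=lambda x: np.log(x["A"]))
)

# Precomputed form: the Series was built from the full frame,
# and assign aligns it to the filtered frame by index label
precomputed = np.log(df["A"])
from_series = df[df["A"] > 2].assign(ln_A=precomputed)

print(from_callable.equals(from_series))
```

The callable form avoids having to name the intermediate frame, which is usually why it is preferred inside longer chains.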
Create new columns from existing ones: here column C is the sum of columns A and B.
>>> df = pd.DataFrame({'A': [1, 2, 3]})
>>> df.assign(B=df.A, C=lambda x: x['A'] + x['B'])
   A  B  C
0  1  1  2
1  2  2  4
2  3  3  6
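In the example above, C must be given as a callable because B does not exist in df when the arguments are written. The callable is evaluated after B has been added. As far as I know, this relies on keyword order being preserved, which pandas supports from version 0.23 on Python 3.6 or later. Treat the version detail as an assumption to check against your installation. The sketch below compares the single call with two chained calls, and shows the non-callable form failing.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Single call: C is a callable, so it sees the B column created just before it
one_call = df.assign(B=df.A, C=lambda x: x["A"] + x["B"])

# Two chained calls give the same result and do not depend on keyword order
two_calls = df.assign(B=df.A).assign(C=lambda x: x["A"] + x["B"])

print(one_call.equals(two_calls))

# A non-callable expression is evaluated against the original df,
# which has no column B yet, so the lookup fails before assign runs
raised = False
try:
    df.assign(B=df.A, C=df["A"] + df["B"])
except KeyError:
    raised = True

print(raised)
```

Chaining separate assign calls is the more conservative choice when the code has to run on older pandas versions.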