pandas data structures: DataFrame method chaining with assign(), indexing / selection

Latest recommended article published on 2024-07-05 22:47:20

ycyrym

Tags: python, indexing

Original link: https://www.pypandas.cn/docs/getting_started/dsintro.html#dataframe


Assigning new columns in method chains: assign() always returns a copy of the data and leaves the original DataFrame unchanged.


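import pandas as pd
# First five rows of the iris data set, built from a dict of columns
dic = {'SepalLength': [5.1, 4.9, 4.7, 4.6, 5.0], 'SepalWidth': [3.5, 3.0, 3.2, 3.1, 3.6],
       'PetalLength': [1.4, 1.4, 1.3, 1.5, 1.4], 'PetalWidth': [0.2, 0.2, 0.2, 0.2, 0.2],
       'Name': ['Iris-setosa'] * 5}
iris = pd.DataFrame(dic)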

   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

Inspired by dplyr's mutate verb, DataFrame provides an assign() method that creates new columns derived from existing ones.

iris.assign(Sepalratio = (iris['SepalWidth'] / iris['SepalLength']))
   SepalLength  SepalWidth     ...             Name  Sepalratio
0          5.1         3.5     ...      Iris-setosa    0.686275
1          4.9         3.0     ...      Iris-setosa    0.612245
2          4.7         3.2     ...      Iris-setosa    0.680851
3          4.6         3.1     ...      Iris-setosa    0.673913
4          5.0         3.6     ...      Iris-setosa    0.720000
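The copy semantics of assign() can be checked directly. The following is a minimal sketch on a two-row frame; the names `df` and `with_ratio` are illustrative and do not appear in the original article.

```python
import pandas as pd

# Small illustrative frame; the values are the first two iris rows
df = pd.DataFrame({'SepalLength': [5.1, 4.9], 'SepalWidth': [3.5, 3.0]})

# assign() returns a new DataFrame that carries the extra column
with_ratio = df.assign(Sepalratio=df['SepalWidth'] / df['SepalLength'])

print(list(df.columns))          # columns of the original frame
print(list(with_ratio.columns))  # columns of the returned copy
```

To keep the new column, bind the result to a name (or rebind the original name), since the original frame is not modified in place.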

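In the example above, a precomputed value was inserted. You can also pass a function of one argument, which is evaluated on the DataFrame that assign() is called on.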

iris.assign(Sepalratio2=lambda x: x['SepalWidth'] / x['SepalLength'])
   SepalLength  SepalWidth      ...              Name   Sepalratio2
0          5.1         3.5      ...       Iris-setosa      0.686275
1          4.9         3.0      ...       Iris-setosa      0.612245
2          4.7         3.2      ...       Iris-setosa      0.680851
3          4.6         3.1      ...       Iris-setosa      0.673913
4          5.0         3.6      ...       Iris-setosa      0.720000
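Passing a callable matters most in a method chain, where the intermediate DataFrame has no name to refer to. The sketch below assumes a filter step with query() placed before assign(); the threshold 4.8 and the name `result` are illustrative choices, not taken from the original article.

```python
import pandas as pd

iris = pd.DataFrame({
    'SepalLength': [5.1, 4.9, 4.7, 4.6, 5.0],
    'SepalWidth': [3.5, 3.0, 3.2, 3.1, 3.6],
})

# The lambda receives the filtered frame produced by query(),
# so no intermediate variable is needed.
result = (
    iris.query('SepalLength > 4.8')
        .assign(Sepalratio=lambda x: x['SepalWidth'] / x['SepalLength'])
)
print(result)
```

Using `iris['SepalWidth'] / iris['SepalLength']` here instead of the lambda would compute the ratio on the unfiltered frame, which is why the callable form is usually preferred inside chains.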

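Starting with Python 3.6, the order of **kwargs is preserved. This allows dependent assignment: an expression later in **kwargs can refer to a column created earlier in the same assign() call.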
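A minimal sketch of dependent assignment follows. The frame `dfa` and the column names A to D are illustrative.

```python
import pandas as pd

dfa = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# C is created first; the lambda for D then reads C from the
# intermediate frame that already contains it.
out = dfa.assign(C=lambda x: x['A'] + x['B'],
                 D=lambda x: x['A'] + x['C'])
print(out)
```

The second expression has to be a callable: `dfa` itself has no column C, so writing `dfa['C']` there would raise a KeyError.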

Indexing / selection

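Select a column: df[col] returns a Series
Select a row by label: df.loc[label] returns a Series
Select a row by integer position: df.iloc[loc] returns a Series; iloc[0] returns the first row of data, not counting the column header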

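iris.loc[2]    # row whose index label is 2
iris.iloc[2]   # row at integer position 2; the same row here, because the index is 0..4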
SepalLength            4.7
SepalWidth             3.2
PetalLength            1.3
PetalWidth             0.2
Name           Iris-setosa
Name: 2, dtype: object
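With the default integer index, loc and iloc happen to pick the same row, which hides the difference between them. The sketch below assumes a hypothetical string index ('a', 'b', 'c') so that label and position are distinct; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical string labels so that label and position differ
df = pd.DataFrame({'SepalLength': [5.1, 4.9, 4.7], 'Name': ['Iris-setosa'] * 3},
                  index=['a', 'b', 'c'])

col = df['SepalLength']   # a column, returned as a Series
by_label = df.loc['c']    # the row whose label is 'c'
by_pos = df.iloc[2]       # the row at position 2 (the third row)

print(type(col).__name__, type(by_label).__name__, type(by_pos).__name__)
print(by_label.equals(by_pos))
```

In this frame `df.loc[2]` would raise a KeyError, because 2 is not one of the labels, while `df.iloc[2]` still works.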