From Pandas to Polars, Part 21: Breaking Your Bad Habits with Polars

sosogod

Last modified: 2024-07-16 15:50:58

First published: 2024-07-16 15:42:48

Article link: https://blog.csdn.net/sosogod/article/details/140467394

In discussions about Polars, one comment we have received is that its syntax encourages people to drop the bad habits they picked up in Pandas.

Take the .apply (or .applymap) function as an example. I see many people use it in Kaggle data science competitions, even though it is not a good idea.

In the example below, we want to map the positive values in every column to 1 and all other values to 0.

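In Polars, the standard pl.when approach is nearly two orders of magnitude faster than applymap in Pandas: about 30 milliseconds versus 2.5 seconds in the run below.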

In both libraries, there is room to optimize this example problem further!

import polars as pl
import numpy as np

# Create a random DataFrame
N = 100_000
dfNumeric = pl.DataFrame(np.random.standard_normal((N,100)))
dfp = dfNumeric.to_pandas()

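# Pandas: set values to 1 when they are positive and 0 otherwise
# (applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
(
    dfp
    .applymap(lambda x: 1 if x > 0 else 0)
)
# Time: 2.5 seconds

# Polars: the same mapping, one when/then/otherwise expression per column
(
    dfNumeric
    .with_columns(
        [
            pl.when(pl.col(col) > 0).then(1).otherwise(0).alias(col)
            for col in dfNumeric.columns
        ]
    )
)
# Time: 30 milliseconds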

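In Polars, the pl.when().then().otherwise() chain is a conditional expression, used inside an expression context to apply conditional logic to the columns of a DataFrame. It returns a new expression, which can in turn be part of a larger DataFrame operation. In practice, you can chain several .when().then() calls to build more complex conditional logic.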

A chained when-then operation resembles an if, elif, ... elif block in Python, with otherwise playing the role of the final else.

Because: