Today I'd like to introduce a library called pandarallel, which runs pandas operations across multiple processes.
Install it with `pip install pandarallel`.
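Here is a minimal sketch of typical pandarallel usage, assuming a recent pandarallel release. You call `pandarallel.initialize()` once, then swap `.apply` for `.parallel_apply`. The toy column values and the `nb_workers=2` setting are illustrative assumptions. The fallback branch only exists so the snippet still runs on a machine where pandarallel isn't installed.

```python
import pandas as pd

try:
    from pandarallel import pandarallel
    # Start the worker pool once; verbose=0 and progress_bar=False keep output quiet.
    # nb_workers=2 is an illustrative choice, not a recommendation.
    pandarallel.initialize(nb_workers=2, progress_bar=False, verbose=0)
    HAS_PANDARALLEL = True
except ImportError:
    HAS_PANDARALLEL = False

# Toy data (hypothetical): days since each customer's last transaction
df = pd.DataFrame({'max_trandt': [5, 40, 12, 90]})
median_days = df['max_trandt'].median()  # 26.0 for this toy data


def r_flag(days):
    # Same rule as the R flag below: '0' if at or above the median, else '1'
    return '0' if days - median_days >= 0 else '1'


if HAS_PANDARALLEL:
    # Same call shape as Series.apply, but the work is split across processes
    df['r'] = df['max_trandt'].parallel_apply(r_flag)
else:
    # Serial fallback when pandarallel is unavailable
    df['r'] = df['max_trandt'].apply(r_flag)

print(df['r'].tolist())  # ['1', '0', '1', '0']
```

For a frame this small the process startup cost outweighs any gain. The parallel version only pays off once each `apply` call has enough rows to amortize it.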
Test platform:
+ Raspberry Pi OS (32-bit)
+ Python 3.7
+ pandas 1.0.3
The code is roughly as follows.
This is a simple RFM calculation written with plain pandas `apply`:
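```python
import datetime
import time

import pandas as pd

today = datetime.datetime.now()
df = pd.read_csv('rfm.csv', parse_dates=['max_trandt'])
bg = time.time()  # start timing the apply-based steps
# Recency: convert the last transaction date into days elapsed
df['max_trandt'] = df['max_trandt'].apply(lambda dt: (today - dt).days)
# print(df.head(100).to_string())
# The median of each metric serves as the split point for the R/F/M flags
tran_count_m = df['tran_count'].median()
max_trandt_m = df['max_trandt'].median()
max_tranam_m = df['max_tranam'].median()
# R flag: '0' if days since last transaction is at or above the median, else '1'
df['r'] = df['max_trandt'].apply(lambda row: '0' if row - max_trandt_m >= 0 else '1')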
df['f'] = df['tran_count'].apply(lambda row: