I'm trying to concat Dask DataFrames produced by read_parquet, apply a query filter, and then sample the result so that the final dataframe is capped at 10000 rows or fewer. Here is the pseudocode:
import dask.dataframe as dd
df = dd.concat(
    [dd.read_parquet(path, index='date').query("(col0 < 4) & (date < '20170201')")
     for path in files],
    interleave_partitions=True)
df = df.sample(float(10000) / max(10000, len(df)))  # cap the sample at 10000 rows
df = df.compute()
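For reference, the intended capping arithmetic (take everything when the frame has at most 10000 rows, otherwise sample down to roughly 10000) can be sketched in plain Python; `capped_sample` is a hypothetical helper, not part of Dask:

```python
import random

CAP = 10000

def capped_sample(rows, cap=CAP, seed=0):
    # Same fraction as the question: min(1.0, cap / len(rows)),
    # written as cap / max(cap, len(rows)).
    frac = float(cap) / max(cap, len(rows))
    k = int(round(frac * len(rows)))
    return random.Random(seed).sample(rows, k)

print(len(capped_sample(list(range(25000)))))  # 10000 (sampled down)
print(len(capped_sample(list(range(3000)))))   # 3000 (kept whole)
```

Note that `len(df)` on a Dask DataFrame is not lazy: it triggers a computation of the row counts before `sample` is ever built into the graph.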
However, it fails with:
ValueError: a must be greater than 0
Traceback
---------
File "/opt/anaconda2/lib/python2.7/site-packages/dask/async.py", line 266, in execute_task
result = _execute_task(task, data)
File "/opt/anaconda2/lib/python2.7/site-packages/dask/async.py", line 247, in _execute_task
return func(*args2)
File "/opt/anaconda2/lib/python2.7/site-packages/dask/dataframe/methods.py", line 143, i