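Iteratively appending rows to a pandas DataFrame is not the best solution. It is better to build your data as a list and then pass it to pd.DataFrame in a single call.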
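When rows arrive one at a time inside a loop, the same idea applies: append each row to a plain Python list and build the DataFrame once at the end. A minimal sketch, assuming rows shaped like the (letter, value) tuples used in this answer; `build_frame` and its `seed` parameter are hypothetical names chosen for illustration:

```python
import random

import pandas as pd

alpha = list('abcdefghijklmnopqrstuvwxyz')


def build_frame(n_rows, seed=0):
    """Hypothetical helper: collect rows in a list, then build the DataFrame once."""
    rng = random.Random(seed)  # seeded so repeated calls give the same data
    rows = []
    for _ in range(n_rows):
        # list.append is cheap (amortized O(1)); no DataFrame is touched here
        rows.append((rng.choice(alpha), rng.randint(0, 100)))
    # a single constructor call at the end
    return pd.DataFrame(rows, columns=['letter', 'value'])


df = build_frame(5)
print(df.shape)  # (5, 2)
```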
import random
import pandas as pd
alpha = list('abcdefghijklmnopqrstuvwxyz')
Here we create a generator, use it to build a list, and then pass that list to the DataFrame constructor:
%%timeit
gen = ((random.choice(alpha), random.randint(0,100)) for x in range(10000))
my_data = list(gen)  # materialize the generator into a list
df = pd.DataFrame(my_data, columns=['letter','value'])
# result: 1 loop, best of 3: 373 ms per loop
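A column-oriented variant of the same approach keeps one list per column and passes a dict of lists to the constructor. This is a sketch rather than the approach benchmarked above; `build_columns` is a hypothetical name, and the data is generated the same way as in the answer:

```python
import random

import pandas as pd

alpha = list('abcdefghijklmnopqrstuvwxyz')


def build_columns(n_rows, seed=0):
    """Hypothetical helper: one plain list per column, one DataFrame call at the end."""
    rng = random.Random(seed)
    letters = []
    values = []
    for _ in range(n_rows):
        letters.append(rng.choice(alpha))
        values.append(rng.randint(0, 100))
    # dict keys become column names, in insertion order
    return pd.DataFrame({'letter': letters, 'value': values})


df = build_columns(10000)
```

Either layout (a list of row tuples or a dict of column lists) avoids growing the DataFrame itself inside the loop.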
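This is much faster than creating the generator, constructing an empty DataFrame, and appending the rows one at a time with .loc, as shown here: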
%%timeit
gen = ((random.choice(alpha), random.randint(0,100)) for x in range(10000))
df = pd.DataFrame(columns=['letter','value'])
for tup in gen:
    df.loc[df.shape[0],:] = tup  # enlarge the DataFrame by one row per iteration
# result: 1 loop, best of 3: 13.6 s per loop
At roughly 13 seconds to build 10,000 rows, this is very slow.
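The `%%timeit` cell magic above only works in IPython/Jupyter. To reproduce the comparison as a plain script, the standard-library `timeit` module can be used instead. A sketch, assuming 1,000 rows to keep the slow path short; `make_rows`, `with_list`, and `with_loc` are hypothetical names, and `with_loc` uses the common `df.loc[len(df)] = row` enlargement idiom rather than the exact line from the answer. Absolute timings will vary by machine and pandas version:

```python
import random
import timeit

import pandas as pd

alpha = list('abcdefghijklmnopqrstuvwxyz')


def make_rows(n_rows, seed=0):
    """Hypothetical helper: pre-generate the (letter, value) tuples."""
    rng = random.Random(seed)
    return [(rng.choice(alpha), rng.randint(0, 100)) for _ in range(n_rows)]


def with_list(rows):
    # one constructor call over an already-built list
    return pd.DataFrame(rows, columns=['letter', 'value'])


def with_loc(rows):
    # start empty and enlarge by one row per iteration
    df = pd.DataFrame(columns=['letter', 'value'])
    for tup in rows:
        df.loc[len(df)] = list(tup)
    return df


if __name__ == '__main__':
    rows = make_rows(1000)
    for fn in (with_list, with_loc):
        # best of 3 single runs, mirroring "1 loop, best of 3"
        best = min(timeit.repeat(lambda: fn(rows), number=1, repeat=3))
        print(f'{fn.__name__}: {best:.4f} s')
```

Both functions produce the same rows; note that the `.loc` version leaves the columns with `object` dtype because it starts from an empty frame, while the list version lets pandas infer an integer dtype for `value`.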