Some comparisons and timing tests; keep in mind, though, that this is only a small example.
In [106]: test_arr = np.array([['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']])
...:
In [107]: def my_func(l):
...:     # accepts list, returns another list
...:     # dumbed down list transformation function
...:     # for testing, just return the first 2 elems of original list
...:     return l[0:2]
...:
The list comprehension approach returns a 2D array of strings, because the function returns a 2-element list every time.
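In [108]: np.array([my_func(x) for x in test_arr])
Out[108]:
array([['the', 'quick'],
       ['lorem', 'ipsum'],
       ['this', 'is']], dtype='<U5')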
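The input array is object dtype because the sublists differ in length (recent NumPy releases require passing dtype=object explicitly when building such a ragged array):
In [109]: test_arr
Out[109]:
array([list(['the', 'quick', 'brown', 'fox']), list(['lorem', 'ipsum']),
       list(['this', 'is', 'a', 'test'])], dtype=object)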
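As a runnable sketch of the two points above, the snippet below rebuilds the same input and checks the shapes and dtypes of the input and of the list comprehension result. Passing dtype=object explicitly is an assumption added here so the construction works on current NumPy, and the name as_2d is only illustrative.

```python
import numpy as np

def my_func(l):
    # same toy transformation as above: keep the first two elements
    return l[0:2]

# Ragged nested lists: dtype=object is given explicitly (an addition to the
# original session) so NumPy stores each sublist as a single element.
test_arr = np.array(
    [['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']],
    dtype=object,
)
print(test_arr.shape, test_arr.dtype)   # (3,) object

# Every call returns a 2-element list, so np.array can stack the results
# into a regular 2D string array.
as_2d = np.array([my_func(x) for x in test_arr])
print(as_2d.shape, as_2d.dtype.kind)    # (3, 2) U
```

If my_func returned lists of varying length, the final np.array call could not produce a regular 2D array, which is why this only works for the toy function used here.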
frompyfunc returns an object dtype array; consistent with my past tests, it is modestly faster (on the order of 2x, but never an order of magnitude):
In [110]: np.frompyfunc(my_func,1,1)(test_arr)
Out[110]:
array([list(['the', 'quick']), list(['lorem', 'ipsum']),
       list(['this', 'is'])], dtype=object)
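The frompyfunc call above can be spelled out as a standalone script. This is a minimal sketch: the names first_two and stacked are illustrative, the explicit dtype=object is an assumption added for current NumPy, and the final conversion to a 2D array is a hypothetical follow-up step that is not part of the original session.

```python
import numpy as np

def my_func(l):
    # same toy transformation as above: keep the first two elements
    return l[0:2]

test_arr = np.array(
    [['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']],
    dtype=object,
)

# frompyfunc(func, nin, nout): one input argument, one output value
first_two = np.frompyfunc(my_func, 1, 1)
result = first_two(test_arr)
print(result.shape, result.dtype)   # (3,) object

# Hypothetical follow-up: because every returned list has the same length,
# the object array can be turned into a regular 2D array afterwards.
stacked = np.array(result.tolist())
print(stacked.shape)                # (3, 2)
```

The result keeps the shape of the input, with each element holding whatever Python object the function returned.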
In [111]: timeit np.frompyfunc(my_func,1,1)(test_arr)
5.68 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [112]: timeit np.array([my_func(x) for x in test_arr])
8.96 µs ± 25.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
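Outside IPython, the same comparison can be approximated with the standard timeit module. This is only a sketch: per_call_us is a hypothetical helper, the loop count is arbitrary, and the explicit dtype=object is an assumption added for current NumPy. The absolute numbers will differ from the ones above depending on machine and NumPy version.

```python
import timeit
import numpy as np

def my_func(l):
    # same toy transformation as above: keep the first two elements
    return l[0:2]

test_arr = np.array(
    [['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']],
    dtype=object,
)

def per_call_us(fn, number=20000):
    # hypothetical helper: average wall time per call, in microseconds
    return timeit.timeit(fn, number=number) / number * 1e6

t_frompyfunc = per_call_us(lambda: np.frompyfunc(my_func, 1, 1)(test_arr))
t_listcomp = per_call_us(lambda: np.array([my_func(x) for x in test_arr]))
print(f"frompyfunc: {t_frompyfunc:.2f} us, list comprehension: {t_listcomp:.2f} us")
```

With only three elements, fixed per-call overhead dominates, so it is worth rerunning this on an array of realistic size before drawing conclusions.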
vectorize uses frompyfunc, but with more overhead. otypes is needed to avoid a sequence error (otherwise it tries to deduce the return type from a trial calculation):
In [113]: np.vectorize(my_func,otypes=[object])(test_arr)
Out[113]:
array([list(['the', 'quick']), list(['lorem', 'ipsum']),
       list(['this', 'is'])], dtype=object)
In [114]: timeit np.vectorize(my_func,otypes=[object])(test_arr)
30.4 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
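To check that vectorize with otypes=[object] produces the same values as frompyfunc, something like the following could be used. It is a minimal sketch: the variable names are illustrative and the explicit dtype=object on the input is an assumption added for current NumPy.

```python
import numpy as np

def my_func(l):
    # same toy transformation as above: keep the first two elements
    return l[0:2]

test_arr = np.array(
    [['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']],
    dtype=object,
)

# otypes=[object] tells vectorize the output type up front, so it does not
# have to infer it from a trial call on the first element.
vfunc = np.vectorize(my_func, otypes=[object])
via_vectorize = vfunc(test_arr)
via_frompyfunc = np.frompyfunc(my_func, 1, 1)(test_arr)

print(via_vectorize.dtype, via_vectorize.shape)            # object (3,)
print(via_vectorize.tolist() == via_frompyfunc.tolist())   # True
```

Since both calls give the same object array, the choice between them for this kind of function comes down to the overhead shown in the timings above.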