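When you are trying to optimise, the thing you need to do is profile and measure! Python provides the timeit module, which makes measuring things easy!
This will assume that you have converted fields to a list of floats beforehand (outside any of these functions), since the string → float conversion is very slow. You can do this via fields = [float(f) for f in string_fields].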
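As a small illustration of both points, here is a sketch that uses timeit to compare converting the strings on every call with converting them once up front. The data and the function names score_converting_each_time and score_preconverted are made up for this example, and it is written in Python 3 syntax.

```python
import random
import timeit

# Hypothetical stand-in data: 3000 numbers stored as strings, plus one weight list.
string_fields = [str(random.random()) for _ in range(3000)]
weights = [random.random() for _ in range(3000)]

def score_converting_each_time():
    # Converts every string to a float on each call (the slow pattern).
    return sum(float(s) * w for s, w in zip(string_fields, weights))

# Convert once, outside the function being measured.
fields = [float(f) for f in string_fields]

def score_preconverted():
    # Works on floats that were converted ahead of time.
    return sum(f * w for f, w in zip(fields, weights))

for func in (score_converting_each_time, score_preconverted):
    # timeit.timeit returns the total time in seconds for `number` calls.
    per_call_ms = timeit.timeit(func, number=100) / 100 * 1000
    print("%s: %.3f ms" % (func.__name__, per_call_ms))
```

The absolute numbers will depend on your machine, so compare the two lines against each other rather than against the timings further down.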
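Also, for numerical processing, pure Python isn't very good, since it ends up doing a lot of type checking (and some other stuff) for each operation. Using a C library like numpy will give massive improvements.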
find_best
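I've combined the answers of others (and some more) into a profiling suite (say, test_find_best.py). Note that it is Python 2 code: it uses xrange, itertools.izip and the print statement.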
import random, operator, numpy as np, itertools, timeit
fields = [random.random() for _ in range(3000)]
fields_string = [str(field) for field in fields]
weights = [[random.random() for _ in range(3000)] for c in range(100)]
npw = np.array(weights)
npf = np.array(fields)
num_fields = len(fields)
num_category = len(weights)
def f_original():
    winner = -1
    best = -float('inf')
    for c in range(num_category):
        score = 0
        for i in range(num_fields):
            score += float(fields_string[i]) * weights[c][i]
        if score > best:
            best = score
            winner = c

def f_original_no_string():
    winner = -1
    best = -float('inf')
    for c in range(num_category):
        score = 0
        for i in range(num_fields):
            score += fields[i] * weights[c][i]
        if score > best:
            best = score
            winner = c

def f_original_xrange():
    winner = -1
    best = -float('inf')
    for c in xrange(num_category):
        score = 0
        for i in xrange(num_fields):
            score += fields[i] * weights[c][i]
        if score > best:
            best = score
            winner = c

# Zenon https://stackoverflow.com/a/10134298/1256624
def f_index_comprehension():
    winner = -1
    best = -float('inf')
    for c in range(num_category):
        score = sum(fields[i] * weights[c][i] for i in xrange(num_fields))
        if score > best:
            best = score
            winner = c

# steveha https://stackoverflow.com/a/10134247/1256624
def f_comprehension():
    winner = -1
    best = -float('inf')
    for c in xrange(num_category):
        score = sum(f * w for f, w in itertools.izip(fields, weights[c]))
        if score > best:
            best = score
            winner = c

def f_schwartz_original(): # https://en.wikipedia.org/wiki/Schwartzian_transform
    tup = max(((i, sum(t[0] * t[1] for t in itertools.izip(fields, wlist))) for i, wlist in enumerate(weights)),
              key=lambda t: t[1]
             )

def f_schwartz_opt(): # https://en.wikipedia.org/wiki/Schwartzian_transform
    tup = max(((i, sum(f * w for f,w in itertools.izip(fields, wlist))) for i, wlist in enumerate(weights)),
              key=operator.itemgetter(1)
             )

def fweight(field_float_list, wlist):
    f = iter(field_float_list)
    return sum(f.next() * w for w in wlist)

def f_schwartz_iterate():
    tup = max(
        ((i, fweight(fields, wlist)) for i, wlist in enumerate(weights)),
        key=lambda t: t[1]
    )

# Nolen Royalty https://stackoverflow.com/a/10134147/1256624
def f_numpy_mult_sum():
    np.argmax(np.sum(npf * npw, axis = 1))

# me
def f_imap():
    winner = -1
    best = -float('inf')
    for c in xrange(num_category):
        score = sum(itertools.imap(operator.mul, fields, weights[c]))
        if score > best:
            best = score
            winner = c

def f_numpy():
    np.argmax(npw.dot(npf))

for f in [f_original,
          f_index_comprehension,
          f_schwartz_iterate,
          f_original_no_string,
          f_schwartz_original,
          f_original_xrange,
          f_schwartz_opt,
          f_comprehension,
          f_imap]:
    print "%s: %.2f ms" % (f.__name__, timeit.timeit(f,number=10)/10 * 1000)
for f in [f_numpy_mult_sum, f_numpy]:
    print "%s: %.2f ms" % (f.__name__, timeit.timeit(f,number=100)/100 * 1000)
Running python test_find_best.py gives me:
f_original: 310.34 ms
f_index_comprehension: 102.58 ms
f_schwartz_iterate: 103.39 ms
f_original_no_string: 96.36 ms
f_schwartz_original: 90.52 ms
f_original_xrange: 89.31 ms
f_schwartz_opt: 69.48 ms
f_comprehension: 68.87 ms
f_imap: 53.33 ms
f_numpy_mult_sum: 3.57 ms
f_numpy: 0.62 ms
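So the numpy version using .dot (sorry, I can't find the documentation for it) is the fastest. If you are doing a lot of numerical operations (which it seems you are), it might be worth converting fields and weights to numpy arrays as soon as you create them.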
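For what it's worth, npw.dot(npf) is a matrix-vector product: with npw of shape (categories, fields), each entry of the result is the weighted sum for one category, so np.argmax of it gives the winning index. Below is a small sketch with made-up numbers, in Python 3 syntax and assuming numpy is installed, that checks it against the plain loop. The names find_best_loop and find_best_numpy are only for illustration.

```python
import numpy as np

# Made-up example: 3 categories, 4 fields.
weights = [[0.1, 0.2, 0.3, 0.4],
           [0.5, 0.1, 0.1, 0.1],
           [0.9, 0.8, 0.1, 0.0]]
fields = [1.0, 2.0, 3.0, 4.0]

def find_best_loop(fields, weights):
    # Pure-Python reference: the category with the highest weighted sum wins.
    winner, best = -1, -float("inf")
    for c, wlist in enumerate(weights):
        score = sum(f * w for f, w in zip(fields, wlist))
        if score > best:
            best, winner = score, c
    return winner

def find_best_numpy(npf, npw):
    # npw.dot(npf) yields one score per category (row of npw).
    return int(np.argmax(npw.dot(npf)))

npw = np.array(weights)
npf = np.array(fields)
print(find_best_loop(fields, weights), find_best_numpy(npf, npw))
```

Both versions pick the first category on ties: the loop only replaces the winner on a strictly greater score, and np.argmax returns the first maximum.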
update_weights
Numpy is likely to offer a similar speed-up for update_weights, doing something like:
def update_weights(weights, fields, to_increase, to_decrease):
    weights[to_increase,:] += fields
    weights[to_decrease,:] -= fields
    return weights
(I haven't tested or profiled that, by the way; you'll need to do that.)
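One way that check could look is sketched below. The loop version is only my assumption about what the original update_weights does (add fields to one category's row and subtract it from another's), and to_increase and to_decrease are taken to be single integer indices. It is written in Python 3 syntax and assumes numpy is installed.

```python
import numpy as np

def update_weights_loop(weights, fields, to_increase, to_decrease):
    # Assumed pure-Python behaviour: adjust two rows element by element.
    for i in range(len(fields)):
        weights[to_increase][i] += fields[i]
        weights[to_decrease][i] -= fields[i]
    return weights

def update_weights_numpy(weights, fields, to_increase, to_decrease):
    # Same update on a 2-D array, modifying it in place.
    weights[to_increase, :] += fields
    weights[to_decrease, :] -= fields
    return weights

fields = [1.0, 2.0, 3.0]
expected = update_weights_loop([[0.0] * 3 for _ in range(4)], fields, 1, 3)
actual = update_weights_numpy(np.zeros((4, 3)), np.array(fields), 1, 3)
print(np.allclose(actual, expected))
```

If to_increase could ever be a list containing repeated indices, be aware that += with such an index applies the update to each distinct row only once; np.add.at is the unbuffered alternative in that case.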