I'm trying to create a numpy array consisting of all possible asset allocations using itertools.product.
The conditions are that allocations for each asset can be in range of zero to 100% and can rise by (100% / number of assets) increments. The allocations total sum should be 100%.
The calculations take very long time when assets number grows (10 seconds for 7 assets, 210 seconds for 8 assets and so on).
Is there a way to speed up the code somehow?
Maybe i should try using it.takewhile or multiprocessing?
import itertools as it
import numpy as np
def CreateMatrix(Increments):
inputs = it.product(np.arange(0, 1 + Increments, Increments), repeat = int(1/Increments));
matrix = np.ndarray((1, int(1/Increments)));
x = 0;
for i in inputs:
if np.sum(i, axis = 0) == 1:
if x > 0:
matrix = np.r_[matrix, np.ndarray((1, int(1/Increments)))]
matrix[x] = i
x = x + 1
return matrix
Assets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Increments = 1.0 / len(Assets)
matrix = CreateMatrix(Increments);
print matrix
解决方案
Use the stdlib sum instead of numpy.sum. This code spends most of its time computing that sum, according to cProfile.
Profiling Code
import cProfile, pstats, StringIO
import itertools as it
import numpy as np
def CreateMatrix(Increments):
inputs = it.product(np.arange(0, 1 + Increments, Increments), repeat = int(1/Increments));
matrix = np.ndarray((1, int(1/Increments)));
x = 0
for i in inputs:
if np.sum(i, axis=0) == 1:
if x > 0:
matrix = np.r_[matrix, np.ndarray((1, int(1/Increments)))]
matrix[x] = i
x += 1
return matrix
pr = cProfile.Profile()
pr.enable()
Assets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Increments = 1.0 / len(Assets)
matrix = CreateMatrix(Increments);
print matrix
pr.disable()
s = StringIO.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print s.getvalue()
Truncated Output
301565912 function calls (301565864 primitive calls) in 294.255 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 26.294 26.294 294.254 294.254 product.py:7(CreateMatrix)
43046721 41.948 0.000 267.762 0.000 Library/Python/2.7/lib/python/site-packages/numpy/core/fromnumeric.py:1966(sum)
43046723 60.071 0.000 217.863 0.000 Library/Python/2.7/lib/python/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
43046723 124.341 0.000 124.341 0.000 {method 'reduce' of 'numpy.ufunc' objects}
43046723 14.630 0.000 14.630 0.000 Library/Python/2.7/lib/python/site-packages/numpy/core/fromnumeric.py:70()
43046721 12.629 0.000 12.629 0.000 {getattr}
43098200 7.958 0.000 7.958 0.000 {isinstance}
43046724 6.191 0.000 6.191 0.000 {method 'items' of 'dict' objects}
6434 0.047 0.000 0.199 0.000 Library/Python/2.7/lib/python/site-packages/numpy/lib/index_tricks.py:316(__getitem__)
Timing Experiments
numpy.sum
import itertools as it
import numpy as np
def CreateMatrix(Increments):
inputs = it.product(np.arange(0, 1 + Increments, Increments), repeat = int(1/Increments));
matrix = np.ndarray((1, int(1/Increments)));
x = 0;
for i in inputs:
if np.sum(i, axis = 0) == 1:
if x > 0:
matrix = np.r_[matrix, np.ndarray((1, int(1/Increments)))]
matrix[x] = i
x = x + 1
return matrix
Assets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Increments = 1.0 / len(Assets)
matrix = CreateMatrix(Increments);
$ python -m timeit --number=3 --verbose "$(cat product.py)"
raw times: 738 696 697
3 loops, best of 3: 232 sec per loop
Stdlib sum
import itertools as it
import numpy as np
def CreateMatrix(Increments):
inputs = it.product(np.arange(0, 1 + Increments, Increments), repeat = int(1/Increments));
matrix = np.ndarray((1, int(1/Increments)));
x = 0;
for i in inputs:
if sum(i) == 1:
if x > 0:
matrix = np.r_[matrix, np.ndarray((1, int(1/Increments)))]
matrix[x] = i
x = x + 1
return matrix
Assets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Increments = 1.0 / len(Assets)
matrix = CreateMatrix(Increments);
$ python -m timeit --number=3 --verbose "$(cat product.py)"
raw times: 90.5 84.3 85.3
3 loops, best of 3: 28.1 sec per loop
There are many more ways to get your solution faster, as other folks have said in their comments. Take a look at How do I "multi-process" the itertools product module? for an idea of how to use multiprocessing to speed this up. No matter what you do: clever algorithm, concurrency or both, replace the sum function; it's a lot of speed up for very little effort.