Generating sequences in Python: efficiently generating lexicographic sequences in Python

Latest recommended article published on 2023-10-30 11:01:49

叶洛曦

I want to generate a lexicographic series of numbers such that, for each number, the sum of the digits is a given constant. It is somewhat similar to the 'subset sum problem'. For example, if I wish to generate 4-digit numbers with sum = 3, then I have a series like:

[3 0 0 0]
[2 1 0 0]
[2 0 1 0]
[2 0 0 1]
[1 2 0 0]
... and so on.

I was able to do it successfully in Python with the following code:

import numpy as np

M = 4  # No. of digits
N = 3  # Target sum

a = np.zeros((1, M), int)  # all rows generated so far
b = np.zeros((1, M), int)  # scratch row for the next entry

a[0][0] = N
jj = 0

while a[jj][M-1] != N:
    # find the rightmost non-zero digit, ignoring the last position
    ii = M-2
    while a[jj][ii] == 0:
        ii = ii-1
    kk = ii
    if kk > 0:
        b[0][0:kk-1] = a[jj][0:kk-1]
    b[0][kk] = a[jj][kk]-1
    b[0][kk+1] = N - sum(b[0][0:kk+1])
    b[0][kk+2:] = 0
    # append the new row; this copies the whole array on every iteration
    a = np.concatenate((a, b), axis=0)
    jj += 1

for ii in range(0, len(a)):
    print(a[ii])

print(len(a))

I don't think this is a very efficient way to do it (I am a Python newbie). It works fine for small values of M and N (<10) but is really slow beyond that. I wish to use it for M ~ 100 and N ~ 6. How can I make my code more efficient, or is there a better way to code it?

Solution

Here is a very efficient algorithm adapted from Jörg Arndt's book "Matters Computational" (Chapter 7.2, Co-lexicographic order for compositions into exactly k parts):

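n = 4  # number of parts (digits)
k = 3  # target sum

x = [0] * n
x[0] = k

while True:
    print(x)
    v = x[-1]
    if k == v:
        # the whole sum sits in the last cell: this was the final composition
        break
    x[-1] = 0
    # find the rightmost non-zero cell before the last one
    j = -2
    while x[j] == 0:
        j -= 1
    # move one unit right, together with whatever was in the last cell
    x[j] -= 1
    x[j+1] = 1 + v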

Output:

[3, 0, 0, 0]
[2, 1, 0, 0]
[2, 0, 1, 0]
[2, 0, 0, 1]
[1, 2, 0, 0]
[1, 1, 1, 0]
[1, 1, 0, 1]
[1, 0, 2, 0]
[1, 0, 1, 1]
[1, 0, 0, 2]
[0, 3, 0, 0]
[0, 2, 1, 0]
[0, 2, 0, 1]
[0, 1, 2, 0]
[0, 1, 1, 1]
[0, 1, 0, 2]
[0, 0, 3, 0]
[0, 0, 2, 1]
[0, 0, 1, 2]
[0, 0, 0, 3]
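For large n and k it is usually impractical to print or store every row, so the same loop can be wrapped in a generator. The following is a minimal sketch of that idea; the function name `compositions` and the choice to yield tuples are assumptions made for illustration, not part of the original answer.

```python
from math import comb  # Python 3.8+


def compositions(n, k):
    """Yield all n-tuples of non-negative integers summing to k,
    in the same order as the loop above (hypothetical helper)."""
    if n < 1 or k < 0:
        raise ValueError("need n >= 1 and k >= 0")
    x = [0] * n
    x[0] = k
    while True:
        yield tuple(x)  # copy, because x is modified in place
        v = x[-1]
        if v == k:
            return  # everything is in the last cell: done
        x[-1] = 0
        j = -2
        while x[j] == 0:  # rightmost non-zero cell before the last
            j -= 1
        x[j] -= 1
        x[j + 1] = v + 1


# Small usage example: 3 parts summing to 2.
for row in compositions(3, 2):
    print(row)

# The number of rows should match the stars-and-bars count C(n+k-1, k).
print(sum(1 for _ in compositions(4, 3)), comb(4 + 3 - 1, 3))
```

Because rows are produced one at a time, memory use stays constant no matter how many compositions exist.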

Number of compositions and time in seconds for plain Python (perhaps numpy arrays are faster) for n = 100 and k = 2, 3, 4, 5 (2.8 GHz Cel-1840):

k   compositions   time (s)
2           5050      0.040
3         171700      0.990
4        4421275     20.022
5       91962520    372.036

I expect the n = 100, k = 6 generation to take about 2 hours.
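The counts in the table agree with the standard stars-and-bars formula C(n+k-1, k) for the number of ways to write k as an ordered sum of n non-negative integers. The sketch below recomputes them and extrapolates the plain-Python running time for k = 6 from the measured k = 5 throughput. The helper name `count_compositions` is made up for this example, and the extrapolation assumes the per-row cost stays roughly constant.

```python
from math import comb  # Python 3.8+


def count_compositions(n, k):
    """Number of n-tuples of non-negative integers summing to k."""
    return comb(n + k - 1, k)


n = 100
counts = {k: count_compositions(n, k) for k in range(2, 7)}
for k, c in counts.items():
    print(k, c)

# Measured plain-Python time for k = 5, taken from the table above.
seconds_k5 = 372.03577995300293
rows_per_second = counts[5] / seconds_k5

# Assumption: the same throughput holds for k = 6.
estimate_hours = counts[6] / rows_per_second / 3600
print(f"estimated time for k = 6: {estimate_hours:.2f} hours")
```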

The same code with numpy arrays (x = np.zeros((n,), dtype=int)) gives worse results, but perhaps that is because I don't know how to use them properly:

k   compositions   time (s)
2           5050      0.080
3         171700      2.390
4        4421275     54.745

Native code (this is Delphi; C/C++ compilers might optimize better) generates the n = 100, k = 6 case in about 21 seconds:

k   compositions   time (s)
3         171700      0.012
4        4421275      0.125
5       91962520      1.544
6     1609344100     20.748

I cannot go to sleep until all the measurements are done :)

MSVS VC++ (O2 optimization): 18 seconds!

k   compositions   time (s)
5       91962520      1.466
6     1609344100     18.283

So that is roughly 100 million variants per second.
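The throughput figures follow from dividing each count by its measured time. As a quick check, the sketch below does that arithmetic for three of the measurements quoted above; the labels and the helper name `millions_per_second` are only illustrative.

```python
def millions_per_second(count, seconds):
    """Throughput in millions of generated rows per second."""
    return count / seconds / 1e6


# (count, seconds) pairs copied from the measurements above, all for n = 100.
measurements = {
    "plain Python, k=5": (91962520, 372.03577995300293),
    "Delphi, k=6": (1609344100, 20.748),
    "VC++ O2, k=6": (1609344100, 18.283),
}

for label, (count, seconds) in measurements.items():
    print(f"{label}: {millions_per_second(count, seconds):.2f} million/s")
```

The native builds come out just under 100 million rows per second, while plain Python is several hundred times slower.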

A lot of time is wasted checking empty cells (because the fill ratio is small). The speed described by Arndt is reached at higher k/n ratios and is about 300-500 million variants per second:

n = 25, k = 15: 25140840660 compositions in 60.981 s (about 400 million per second)
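When k is much smaller than n, as in the n = 100, k = 6 case, one way to avoid scanning empty cells is to enumerate only the positions that hold the k units. The sketch below swaps in a different technique from the answer's algorithm: it uses the standard library's `itertools.combinations_with_replacement`, which emits sorted index tuples in lexicographic order, and this happens to correspond to the same row order as above. The names `sparse_compositions` and `to_dense` are hypothetical, and this variant has not been benchmarked against the figures in this thread.

```python
from itertools import combinations_with_replacement, islice


def sparse_compositions(n, k):
    """Yield sorted k-tuples of cell indices in range(n).
    An index that appears m times means that cell holds the value m."""
    return combinations_with_replacement(range(n), k)


def to_dense(positions, n):
    """Expand a tuple of cell indices into the full n-digit row."""
    x = [0] * n
    for i in positions:
        x[i] += 1
    return x


# First few rows for n = 4, k = 3, shown in both representations.
for pos in islice(sparse_compositions(4, 3), 5):
    print(pos, to_dense(pos, 4))
```

Each item here has length k rather than n, so downstream code that can work with the index tuples directly never needs to touch the empty cells.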