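If you don't need the list itself, I fully agree with @Lattyware's solution using a generator.
If that isn't an option, however, you can compress the data in the list without losing information by storing, for each character, only the positions at which it occurs in the file.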
```python
import random
import string

def track_char(s):
    # Make sure all characters have the same case
    s = s.lower()
    # Map each character to the list of positions where it occurs
    d = dict((k, []) for k in set(s))
    for position, char in enumerate(s):
        d[char].append(position)
    return d

st = ''.join(random.choice(string.ascii_uppercase) for _ in range(50000))
d = track_char(st)
print(len(d["a"]))  # number of occurrences of "a"

# Total number of occurrences of the character at position 2
for char, vals in d.items():
    if 2 in vals:
        print("Character %s has %s occurrences" % (char, len(d[char])))
# Example output (the input is random, so the count varies):
# Character c has 1878 occurrences

# Number of occurrences of the character at position 2, up to and including position 2
for char, vals in d.items():
    if 2 in vals:
        print("Character %s has %s occurrences so far" % (char, len([x for x in d[char] if x <= 2])))
# Example output:
# Character c has 1 occurrences so far
```
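Because `track_char` appends positions in increasing order, every position list is already sorted, so the "so far" count can be found with a binary search from the standard-library `bisect` module instead of scanning the whole list. This is a minimal sketch of that swapped-in technique; the helper name `count_up_to` is hypothetical and not part of the original answer:

```python
import bisect

def track_char(s):
    # Same as above: map each lower-cased character to its sorted positions
    s = s.lower()
    d = {k: [] for k in set(s)}
    for position, char in enumerate(s):
        d[char].append(position)
    return d

def count_up_to(d, position):
    """Hypothetical helper: return (char, count), where char is the character
    at `position` and count is how often it occurs at indices 0..position."""
    for char, positions in d.items():
        # bisect_right returns the number of entries <= position in a sorted list
        idx = bisect.bisect_right(positions, position)
        if idx > 0 and positions[idx - 1] == position:
            return char, idx
    raise ValueError("position out of range")

d = track_char("abcabcab")
print(count_up_to(d, 2))  # → ('c', 1)
print(count_up_to(d, 6))  # → ('a', 3)
```

The outer loop only visits one entry per distinct character (at most 26 here), and each lookup is logarithmic in the number of occurrences rather than linear.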
This way you no longer store a copy of the string with every occurrence, and you still keep the information about all occurrences.
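To illustrate that nothing apart from letter case (which `track_char` deliberately lower-cases) is lost, the string can be rebuilt from the position dictionary. A minimal sketch; the function name `rebuild` is hypothetical:

```python
def track_char(s):
    # Same as above: map each lower-cased character to its positions
    s = s.lower()
    d = {k: [] for k in set(s)}
    for position, char in enumerate(s):
        d[char].append(position)
    return d

def rebuild(d):
    """Hypothetical helper: invert the position dict back into a string."""
    length = sum(len(positions) for positions in d.values())
    chars = [None] * length
    for char, positions in d.items():
        for position in positions:
            chars[position] = char
    return ''.join(chars)

st = "HelloWorld"
print(rebuild(track_char(st)))  # → helloworld
```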
To compare the object sizes of the original list and of this approach, here is a test:
```python
import random
import string
from sys import getsizeof

# Random generation of a string with 50k characters
st = ''.join(random.choice(string.ascii_uppercase) for _ in range(50000))

# Function that returns the original list for this string
def original_track(s):
    l = []
    for position, char in enumerate(s):
        l.append([char, position])
    return l

# Testing sizes (track_char is defined in the previous snippet)
original_list = original_track(st)
dict_format = track_char(st)
print(getsizeof(original_list))  # 406496 in the original run
print(getsizeof(dict_format))    # 1632 in the original run
```
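As you can see, `dict_format` is about 250 times smaller. Keep in mind, though, that `sys.getsizeof` is shallow: it measures only the outer list's array of pointers and the dict's hash table, not the inner `[char, position]` lists or the integers stored in the position lists. The real saving is therefore much smaller than 250x, although the dictionary still wins because it avoids creating a two-element list for every character. The gap in these shallow numbers grows with the string length, because the dict only ever has one key per distinct character.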
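To compare the full memory footprint rather than only the outer containers, you can add up the sizes of each container and everything it holds. A rough sketch, assuming CPython; `deep_getsizeof` is a hypothetical helper that only follows lists, tuples, sets and dicts, and counts each object once by `id` so shared objects (such as cached small integers and one-character strings) are not double-counted:

```python
import random
import string
from sys import getsizeof

def deep_getsizeof(obj, seen=None):
    """Hypothetical helper: approximate size of obj plus everything it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted
    seen.add(id(obj))
    size = getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

def track_char(s):
    # Same as above: map each lower-cased character to its positions
    s = s.lower()
    d = {k: [] for k in set(s)}
    for position, char in enumerate(s):
        d[char].append(position)
    return d

def original_track(s):
    # Equivalent to original_track above
    return [[char, position] for position, char in enumerate(s)]

st = ''.join(random.choice(string.ascii_uppercase) for _ in range(50000))
original_list = original_track(st)
dict_format = track_char(st)
print("shallow:", getsizeof(original_list), getsizeof(dict_format))
print("deep:   ", deep_getsizeof(original_list), deep_getsizeof(dict_format))
```

Both deep totals grow linearly with the length of the string; the dictionary still comes out smaller because it stores one pointer per occurrence instead of one small two-element list per occurrence.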