Python out of memory, dict alternatives: a 2D array representing a huge Python dict, a COOrdinate-like solution to save memory...


简明杰


Tags: python, out of memory, dict alternatives

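Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.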

Article link: https://blog.csdn.net/weixin_31967551/article/details/114395372


I am trying to update dict_with_tuples_key with the data from an array:

import numpy as np

myarray = np.array([[0, 0],  # 0, 1
                    [0, 1],
                    [1, 1],  # 1, 2
                    [1, 2],  # 1, 3
                    [2, 2],
                    [1, 3]]
                   )  # a lot of this with shape~(10e6, 2)

dict_with_tuples_key = {(0, 1): 1,
                        (3, 7): 1}  # ~10e6 keys

Using an array to store the dict values (thanks to @MSeifert), we got this:

import numpy as np
from numba import njit

def convert_dict_to_darray(dict_with_tuples_key, myarray):
    # Largest row/column index found in the array and in the dict keys
    idx_max_array = np.max(myarray, axis=0)
    # list(...) is required on Python 3, where keys() returns a view
    idx_max_dict = np.max(list(dict_with_tuples_key.keys()), axis=0)
    lens = np.max([list(idx_max_array), list(idx_max_dict)], axis=0)
    xlen, ylen = lens[0] + 1, lens[1] + 1
    darray = np.zeros((xlen, ylen))  # Empty array to hold all indexes in myarray
    for key, value in dict_with_tuples_key.items():
        darray[key] = value
    return darray

@njit
def update_darray(darray, myarray):
    # Add 1 to the cell addressed by each (row, col) pair in myarray
    elements = myarray.shape[0]
    for i in range(elements):
        darray[myarray[i, 0], myarray[i, 1]] += 1
    return darray

def darray_to_dict(darray):
    # Keep only the non-zero cells, keyed by their (row, col) position
    updated_dict = {}
    keys = zip(*map(list, np.nonzero(darray)))
    for x, y in keys:
        updated_dict[(x, y)] = darray[x, y]
    return updated_dict

darray = convert_dict_to_darray(dict_with_tuples_key, myarray)
darray = update_darray(darray, myarray)

I get exactly the result I need:

# print(darray_to_dict(darray))
# {(0, 1): 2.0,
#  (0, 0): 1.0,
#  (1, 1): 1.0,
#  (2, 2): 1.0,
#  (1, 2): 1.0,
#  (1, 3): 1.0,
#  (3, 7): 1.0}

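For small matrices this works fine, and @njit can be applied to it, so it is very fast,

but...

the huge empty darray = np.zeros((xlen, ylen)) does not fit in memory. How can we avoid allocating a very sparse array and store only the non-zero values, in a COOrdinate-like format (as in a sparse matrix)?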
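
To make the problem concrete, here is a rough back-of-the-envelope sketch comparing a dense float64 array with a coordinate layout that stores one row index, one column index and one value per entry. The helper names `dense_nbytes` and `coo_nbytes`, the shape and the entry count are all assumptions chosen for illustration; the question does not state the real dimensions.

```python
import numpy as np

def dense_nbytes(shape, dtype=np.float64):
    """Bytes needed by a dense array of the given shape and dtype."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize

def coo_nbytes(nnz, index_dtype=np.int32, value_dtype=np.float64):
    """Approximate bytes for a coordinate layout: row, column and value per entry."""
    per_entry = 2 * np.dtype(index_dtype).itemsize + np.dtype(value_dtype).itemsize
    return nnz * per_entry

# Hypothetical sizes, for illustration only
shape = (100_000, 100_000)
nnz = 10_000_000

print(dense_nbytes(shape) / 1e9, "GB dense")   # 80.0 GB
print(coo_nbytes(nnz) / 1e9, "GB sparse")      # 0.16 GB
```

The estimate ignores container overhead, but the gap of several orders of magnitude is the point.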

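Best answer: use dok_matrix from scipy. A dok_matrix is a Dictionary Of Keys based sparse matrix. It lets you build a sparse matrix incrementally, and it does not allocate the huge empty darray = np.zeros((xlen, ylen)) that will not fit in your machine's memory.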
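
A minimal sketch of what incremental construction looks like. The shape and the variable name `counts` are chosen only for illustration; no dense buffer of that size is created.

```python
import numpy as np
from scipy.sparse import dok_matrix

# Large logical shape, but only assigned entries are actually stored
counts = dok_matrix((1_000_000, 1_000_000), dtype=np.float64)

counts[0, 1] = 1      # set one entry
counts[3, 7] = 1      # set another
counts[0, 1] += 1     # read, add, and write back an existing entry

print(counts.nnz)     # number of stored entries
print(counts[0, 1])   # value at a stored coordinate
print(counts[5, 5])   # unassigned coordinates read back as zero
```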

The only changes required are importing the right module from scipy and changing the definition of darray in the function convert_dict_to_darray.

It looks like this:

import numpy as np
from scipy.sparse import dok_matrix

def convert_dict_to_darray(dict_with_tuples_key, myarray):
    # Largest row/column index found in the array and in the dict keys
    idx_max_array = np.max(myarray, axis=0)
    # list(...) is required on Python 3, where keys() returns a view
    idx_max_dict = np.max(list(dict_with_tuples_key.keys()), axis=0)
    lens = np.max([list(idx_max_array), list(idx_max_dict)], axis=0)
    xlen, ylen = lens[0] + 1, lens[1] + 1
    darray = dok_matrix((xlen, ylen))  # Sparse: only assigned entries are stored
    for key, value in dict_with_tuples_key.items():
        darray[key[0], key[1]] = value
    return darray
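
Since the question asks for a COOrdinate-style layout, here is a minimal sketch of an alternative that avoids a per-element loop. It passes all (row, col) pairs to `scipy.sparse.coo_matrix` and lets `sum_duplicates()` add up repeated coordinates. The function name `update_counts_sparse` is hypothetical and not part of the original answer, and the sketch assumes the dict is non-empty. As far as I know, Numba's nopython mode does not accept SciPy sparse matrices, so the `@njit` version of `update_darray` from the question would probably have to be replaced by something like this, or by a plain Python loop over the dok_matrix.

```python
import numpy as np
from scipy.sparse import coo_matrix

def update_counts_sparse(dict_with_tuples_key, myarray):
    """Return a dict holding the old values plus one count per row of myarray."""
    keys = np.array(list(dict_with_tuples_key.keys()), dtype=np.int64)
    vals = np.array(list(dict_with_tuples_key.values()), dtype=np.float64)

    # Stack the existing dict entries and the new (row, col) pairs together
    rows = np.concatenate([keys[:, 0], myarray[:, 0]])
    cols = np.concatenate([keys[:, 1], myarray[:, 1]])
    data = np.concatenate([vals, np.ones(len(myarray))])

    shape = (int(rows.max()) + 1, int(cols.max()) + 1)
    m = coo_matrix((data, (rows, cols)), shape=shape)
    m.sum_duplicates()  # in place: repeated coordinates are merged by summing

    return {(int(r), int(c)): float(v) for r, c, v in zip(m.row, m.col, m.data)}

myarray = np.array([[0, 0], [0, 1], [1, 1], [1, 2], [2, 2], [1, 3]])
dict_with_tuples_key = {(0, 1): 1, (3, 7): 1}

result = update_counts_sparse(dict_with_tuples_key, myarray)
print(result)
```

On the small example from the question this gives the same counts as the dense version, with (0, 1) ending up at 2.0 and every other key at 1.0.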
