Storing class objects in Python lists: Python objects in a Python list vs. a fixed-length Numpy array

While doing some bioinformatics work, I've been pondering the ramifications of storing object instances in a Numpy array rather than a Python list, but in every test I've run the Numpy array performed worse. I am using CPython. Does anyone know why?

Specifically:

What are the performance impacts of using a fixed-length array numpy.ndarray(dtype=object) vs. a regular Python list? My initial tests showed that accessing the Numpy array's elements was slower than iterating through the Python list, especially when calling object methods.

Why is it faster to instantiate objects using a list comprehension such as [ X() for i in range(n) ] than to create a numpy.empty(n, dtype=object) array and fill it?

What is the memory overhead of each? I was not able to test this. My classes extensively use __slots__, if that has any impact.

Solution

Don't use object arrays in numpy for things like this.

They defeat the basic purpose of a numpy array, and while they're useful in a tiny handful of situations, they're almost always a poor choice.

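Yes, accessing an individual element of a numpy array from Python, or iterating through a numpy array in Python, is slower than the equivalent operation on a list. Every element access goes through NumPy's general-purpose indexing machinery (and, for numeric dtypes, creates a new NumPy scalar object), whereas a list lookup simply returns a stored pointer. (Which is why you should never do something like y = [item * 2 for item in x] when x is a numpy array.)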
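A minimal sketch of that advice, assuming `x` is a one-dimensional float64 array (the size is arbitrary): the comprehension boxes every element into a separate NumPy scalar and collects the results in a Python list, while the whole-array expression `x * 2` runs NumPy's compiled loop and returns a new ndarray.

```python
import numpy as np

x = np.arange(1_000, dtype=np.float64)

# Slow pattern: each `item` is a separate numpy.float64 object,
# multiplied in the interpreter and appended to a Python list.
y_slow = [item * 2 for item in x]

# Vectorized pattern: a single call; the loop runs inside NumPy's C code
# and the result is another float64 ndarray.
y_fast = x * 2

# Same values, different container and very different speed.
assert np.array_equal(np.array(y_slow), y_fast)
print(type(y_slow[0]).__name__, type(y_fast).__name__)  # prints: float64 ndarray
```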

Numpy object arrays will have a slightly lower memory overhead than a list, but if you're storing that many individual python objects, you're going to run into other memory problems first.
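The memory question can be checked directly. A minimal sketch, assuming a 64-bit CPython build and a hypothetical `SlottedPoint` class standing in for the asker's classes: both containers store one pointer per element, so they differ only in their fixed header and in any spare capacity the list has over-allocated. `__slots__` shrinks the objects themselves, and those objects are shared by both containers.

```python
import struct
import sys

import numpy as np

class SlottedPoint:
    # Hypothetical example class. __slots__ removes the per-instance __dict__,
    # shrinking each object; it does not change the container that refers to it.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

n = 100_000
as_list = [SlottedPoint(float(i), 0.0) for i in range(n)]

# Object array: allocate n slots (all None), then copy in the same references.
as_array = np.empty(n, dtype=object)
as_array[:] = as_list

pointer_size = struct.calcsize("P")  # size of one object reference (8 on 64-bit builds)

# sys.getsizeof is shallow: it measures the container, not the objects it points to.
print("list container:    ", sys.getsizeof(as_list), "bytes")
print("object array:      ", sys.getsizeof(as_array), "bytes")
print("array data buffer: ", as_array.nbytes, "bytes =", n, "x", as_array.itemsize)
# Both containers hold the same objects, so this per-object cost is paid once either way.
print("one SlottedPoint:  ", sys.getsizeof(as_list[0]), "bytes")
```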

Numpy is first and foremost a memory-efficient, multidimensional array container for uniform numerical data. If you want to hold arbitrary objects in a numpy array, you probably want a list, instead.
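For uniform numerical data the memory advantage is real, because the numbers are stored inline rather than as separate objects. A minimal sketch comparing 10000 floats held both ways (the count is arbitrary, and the list total adds `sys.getsizeof` of every referenced float, so it only approximates what the list keeps alive):

```python
import sys

import numpy as np

n = 10_000
values = [float(i) for i in range(n)]  # n pointers, each to its own Python float object
packed = np.array(values)              # one contiguous float64 buffer, no per-element objects

# List: the pointer array plus every float object it references.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# Array: the raw 8-byte doubles stored inline.
array_bytes = packed.nbytes

print("list of floats:", list_bytes, "bytes")
print("float64 array: ", array_bytes, "bytes")
```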

My point is that if you want to use numpy effectively, you may need to re-think how you're structuring things.

Instead of storing each object instance in a numpy array, store your numerical data in a numpy array, and if you need separate objects for each row/column/whatever, store an index into that array in each instance.

This way you can operate on the numerical arrays quickly (i.e. using numpy instead of list comprehensions).

As a quick illustration of what I'm talking about, here's a trivial example without numpy:

```python
from random import random

class PointSet(object):
    def __init__(self, numpoints):
        # One Python object per point, each holding its own x and y floats.
        self.points = [Point(random(), random()) for _ in range(numpoints)]

    def update(self):
        # Random-walk step, applied one attribute at a time in the interpreter.
        for point in self.points:
            point.x += random() - 0.5
            point.y += random() - 0.5

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = PointSet(100000)
point = points.points[10]

for _ in range(1000):
    points.update()

print('Position of one point out of 100000:', point.x, point.y)
```

And a similar example using numpy arrays:

```python
import numpy as np

class PointSet(object):
    def __init__(self, numpoints):
        # All coordinates live in a single (numpoints, 2) float64 array.
        self.coords = np.random.random((numpoints, 2))
        # Each Point stores only its row index and a reference to the shared array.
        self.points = [Point(i, self.coords) for i in range(numpoints)]

    def update(self):
        """Update along a random walk."""
        # The "+=" is crucial here... We have to update "coords" in-place, in
        # this case, because every Point holds a reference to this same array;
        # rebinding self.coords to a new array would leave the points with stale data.
        self.coords += np.random.random(self.coords.shape) - 0.5

class Point(object):
    def __init__(self, i, coords):
        self.i = i
        self.coords = coords

    @property
    def x(self):
        return self.coords[self.i, 0]

    @property
    def y(self):
        return self.coords[self.i, 1]

points = PointSet(100000)
point = points.points[10]

for _ in range(1000):
    points.update()

print('Position of one point out of 100000:', point.x, point.y)
```
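To see why the in-place `+=` matters in `update`, here is a minimal sketch with plain arrays (the variable names are illustrative only): each `Point` keeps a reference to the array object created in `__init__`, so only an in-place update is visible through that reference.

```python
import numpy as np

coords = np.zeros((3, 2))
held_by_point = coords  # what each Point stores: a reference to the same array object

# In-place update: writes into the shared buffer, so every holder sees the change.
coords += 1.0
assert held_by_point[0, 0] == 1.0

# Rebinding: builds a NEW array and points the name at it; holders keep the old one.
coords = coords + 1.0
print(coords[0, 0], held_by_point[0, 0])  # prints: 2.0 1.0
```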

There are other ways to do this (you may want to avoid storing a reference to a specific numpy array in each point, for example), but I hope it's a useful example.
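One such alternative, sketched under assumptions (the `_owner` attribute, the `seed` parameter and the use of `numpy.random.default_rng` are illustrative choices, not part of the original answer): each `Point` keeps a reference to its `PointSet` and reads `coords` through it, so `update` may safely replace the array instead of having to modify it in place. `__slots__` keeps the per-point objects small, which is relevant to the question's classes.

```python
import numpy as np

class PointSet:
    def __init__(self, numpoints, seed=None):
        self._rng = np.random.default_rng(seed)
        self.coords = self._rng.random((numpoints, 2))
        self.points = [Point(self, i) for i in range(numpoints)]

    def update(self):
        """Random-walk step. Rebinding is safe: points look coords up via the set."""
        self.coords = self.coords + self._rng.random(self.coords.shape) - 0.5

class Point:
    __slots__ = ("_owner", "i")  # no per-instance __dict__

    def __init__(self, owner, i):
        self._owner = owner  # hypothetical design: reference the PointSet, not the array
        self.i = i

    @property
    def x(self):
        return self._owner.coords[self.i, 0]

    @property
    def y(self):
        return self._owner.coords[self.i, 1]

points = PointSet(1000, seed=0)
point = points.points[10]
before = (point.x, point.y)

for _ in range(100):
    points.update()

print('Position of one point out of 1000:', point.x, point.y)
```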

Note the difference in running time: on my machine, the numpy version takes about 5 seconds, versus about 60 seconds for the pure-Python version.
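To reproduce the comparison on your own machine, a minimal timing sketch follows. It leaves out the classes and times only the two update strategies, with `N` and `STEPS` chosen arbitrarily so it runs quickly; the absolute numbers will not match the 5 s vs 60 s above, which came from the full 100000-point, 1000-step run.

```python
import timeit
from random import random

import numpy as np

N = 10_000  # fewer points than the original 100000, to keep the sketch fast
STEPS = 10

def python_walk():
    # Pure Python: a list of [x, y] pairs, updated element by element.
    pts = [[random(), random()] for _ in range(N)]
    for _ in range(STEPS):
        for p in pts:
            p[0] += random() - 0.5
            p[1] += random() - 0.5
    return pts

def numpy_walk():
    # NumPy: one (N, 2) array, updated with a single vectorized expression per step.
    coords = np.random.random((N, 2))
    for _ in range(STEPS):
        coords += np.random.random(coords.shape) - 0.5
    return coords

t_py = timeit.timeit(python_walk, number=3)
t_np = timeit.timeit(numpy_walk, number=3)
print(f"pure Python: {t_py:.3f} s   NumPy: {t_np:.3f} s")
```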
