Python: NumPy and diff()

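I'm trying to create a diff of my sorted numpy array, so that if I record the values of the first row plus the diff, I can recreate the original table while storing less data.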

Here is an example of the table:

my_array = numpy.array([(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),

(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),

(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2),

(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34),

(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 35),

(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36)

],'uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8')

After running numpy.diff(my_array), I would expect something like this:

[expected output not preserved in the source]

Note: The data above comes from the first and last three rows of the 'real' data, which is much, much larger. With the full dataset, most of the rows after a diff would be 0,0,0,0,0,0,0,0,0,0,0,0,0,1, which can a) be stored in a much smaller struct, and b) compress fantastically well on disk, since most rows contain very similar data.
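The idea in the note (keep the first row plus the row-to-row differences, then rebuild the table) might be sketched as follows. This is only an illustration under some assumptions: it uses Python 3 and a plain 2-D uint8 array rather than the structured dtype from the post, and the names `encode` and `decode` are hypothetical. The difference is taken byte by byte with uint8 wraparound (modulo 256), so it is not the true arithmetic difference of the large numbers, but it is still reversible.

```python
import numpy as np

# Sample rows taken from the table in the post, stored as a plain
# 2-D uint8 array (an assumption; the post uses a structured dtype).
high = [9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73]
rows = np.array(
    [[0] * 13 + [0],
     [0] * 13 + [1],
     [0] * 13 + [2],
     high + [34],
     high + [35],
     high + [36]],
    dtype=np.uint8,
)

def encode(table):
    """Return the first row and the byte-wise row-to-row differences."""
    first = table[0].copy()
    deltas = np.diff(table, axis=0)  # uint8 subtraction wraps modulo 256
    return first, deltas

def decode(first, deltas):
    """Rebuild the table with a running sum that wraps modulo 256."""
    stacked = np.vstack([first[np.newaxis, :], deltas])
    return np.cumsum(stacked, axis=0, dtype=np.uint8)

first, deltas = encode(rows)
restored = decode(first, deltas)
```

Because both the subtraction and the running sum wrap modulo 256 per byte, `restored` should match `rows` exactly, even when a low byte rolls over from 255 to 0.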

I should probably point out that the reason I have a whole bunch of uint8s in the first place is that I needed to store an array of extremely large numbers in the smallest amount of memory possible. The largest number was 185439173519100986733232011757860, which is too big for uint64. In fact, storing it takes at least 108 bits, or 14 bytes (rounded up to the nearest byte). So to fit these large numbers into numpy, I use the following two functions:

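def large_number_to_numpy(number,columns):
    # Split the integer into `columns` bytes, most significant byte first.
    return tuple((number >> (8*x)) & 255 for x in range(columns-1,-1,-1))

def numpy_to_large_number(numbers):
    # Reassemble the integer from bytes given most significant byte first.
    return sum([y << (8*x) for x,y in enumerate(numbers[::-1])])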

Which is used like this:

>>> large_number_to_numpy(185439173519100986733232011757860L,14)

(9L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L)

>>> numpy_to_large_number((9L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L))

185439173519100986733232011757860L
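For readers on Python 3 (the session above is Python 2, hence the L suffixes), the same big-endian split can probably be expressed with the built-in `int.to_bytes` and `int.from_bytes`. The sketch below is an alternative under that assumption, not the code from the post, and the function names are hypothetical.

```python
def large_number_to_bytes(number, columns):
    """Split a non-negative int into `columns` bytes, most significant first."""
    return tuple(number.to_bytes(columns, byteorder="big"))

def bytes_to_large_number(values):
    """Reassemble the int from bytes given most significant byte first."""
    return int.from_bytes(bytes(values), byteorder="big")

n = 185439173519100986733232011757860
packed = large_number_to_bytes(n, 14)
unpacked = bytes_to_large_number(packed)
bits_needed = n.bit_length()
```

One difference worth noting: `int.to_bytes` raises `OverflowError` when the number does not fit in the requested number of bytes, whereas the shift-and-mask version silently drops the high bits.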

With the array created like this:

my_array = numpy.zeros(TOTAL_ROWS,','.join(14*['uint8']))

And then populated with:

my_array[x] = large_number_to_numpy(large_number,14)

But what I actually get is:

>>> my_array

array([(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),

(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),

(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2),

(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34),

(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 35),

(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36)],

dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')])

>>> numpy.diff(my_array)

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1567, in diff

    return a[slice1]-a[slice2]

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')]) dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')]) dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')])
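The error says that subtraction is not defined for the structured dtype as a whole. One possible workaround, sketched here under the assumption that every field is a uint8 with no padding (as in the dtype shown above) and that the array is contiguous, is to view the same memory as a plain 2-D uint8 array and diff along the row axis. The three-row array below is a small stand-in for the real data, and the sketch assumes Python 3 with a recent NumPy.

```python
import numpy as np

# Small stand-in for the structured array in the post: 14 uint8 fields.
dtype = ",".join(14 * ["uint8"])
my_array = np.zeros(3, dtype)
my_array[1] = tuple([0] * 13 + [1])
my_array[2] = tuple([0] * 13 + [2])

# Reinterpret the same bytes as a (rows, 14) uint8 array; no data is copied.
plain = my_array.view(np.uint8).reshape(len(my_array), -1)

# Row-to-row difference; uint8 arithmetic wraps modulo 256 per byte.
row_diff = np.diff(plain, axis=0)
```

Since the subtraction is done per byte and wraps, the result is a byte-wise delta rather than the arithmetic difference of the underlying large numbers.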
