python memoryview: What exactly is the point of memoryview in Python?

Latest recommended article published on 2023-07-17 11:24:44

powerelectricdog

Tags: python memoryview

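This article explores how memoryview objects are used in Python and their advantages when processing large datasets. memoryview allows access to the internal data of objects that support the buffer protocol without copying the data. A performance comparison between memoryview and the plain bytes type shows how efficient memoryview is when performing a large number of slicing operations.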

Checking the documentation on memoryview:

memoryview objects allow Python code to access the internal data of an object that supports the buffer protocol without copying.

class memoryview(obj)

Create a memoryview that references obj. obj must support the buffer protocol. Built-in objects that support the buffer protocol include bytes and bytearray.

Then we are given the sample code:

>>> v = memoryview(b'abcefg')

>>> v[1]

98

>>> v[-1]

103

>>> v[1:4]

<memory at 0x...>

>>> bytes(v[1:4])

b'bce'

Quotation over, now let's take a closer look:

>>> b = b'long bytes stream'

>>> b.startswith(b'long')

True

>>> v = memoryview(b)

>>> vsub = v[5:]

>>> vsub.startswith(b'bytes')

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

AttributeError: 'memoryview' object has no attribute 'startswith'

>>> bytes(vsub).startswith(b'bytes')

True

>>>

So what I gather from the above:

We create a memoryview object to expose the internal data of a buffer object without copying; however, in order to do anything useful with the object (by calling the methods provided by the object), we have to create a copy!

Usually memoryview (or the old buffer object) would be needed when we have a large object, and the slices can be large too. The need for better efficiency arises if we are making large slices, or making small slices a large number of times.

With the above scheme, I don't see how it can be useful for either situation, unless someone can explain to me what I'm missing here.

Edit1:

We have a large chunk of data, and we want to process it by advancing through it from start to end, for example extracting tokens from the start of a string buffer until the buffer is consumed. In C terms, this is advancing a pointer through the buffer, and the pointer can be passed to any function expecting the buffer type. How can something similar be done in Python?

People suggest workarounds; for example, many string and regex functions take position arguments that can be used to emulate advancing a pointer. There are two issues with this. First, it's a workaround: you are forced to change your coding style to overcome the shortcomings. Second, not all functions have position arguments: regex functions and startswith do, but encode()/decode() don't.
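To make the position-argument workaround concrete, here is a minimal Python 3 sketch. The sample data and the `word` pattern are illustrative assumptions, not part of the original question. It shows `startswith()` and a compiled pattern's `match()` accepting a start offset, while `decode()` has no such parameter and needs a slice.

```python
import re

data = b"long bytes stream"
word = re.compile(rb"\w+")  # illustrative pattern, not from the question

# startswith() takes an optional start offset, so no slice is needed.
has_prefix = data.startswith(b"bytes", 5)

# Methods of a compiled pattern take a pos argument; match() anchors there.
m = word.match(data, 5)
token = m.group()

# decode() has no offset parameter, so the data must be sliced (copied) first.
text = data[5:10].decode("ascii")

print(has_prefix, token, text)
```

Note that only the methods of a compiled pattern accept `pos`; the module-level `re.match()` function does not.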

Others might suggest loading the data in chunks, or processing the buffer in small segments larger than the max token. Okay, so we are aware of these possible workarounds, but we are supposed to work in a more natural way in Python without trying to bend the coding style to fit the language, aren't we?

Edit2:

A code sample would make things clearer. This is what I want to do, and what I assumed memoryview would allow me to do at first glance. Let's use pmview (proper memory view) for the functionality I'm looking for:

tokens = []
xlarge_str = get_string()
xlarge_str_view = pmview(xlarge_str)

while True:
    token = get_token(xlarge_str_view)
    if token:
        xlarge_str_view = xlarge_str_view.vslice(len(token))
        # vslice: view slice: default stop parameter at end of buffer
        tokens.append(token)
    else:
        break

Solution

One reason memoryviews are useful is because they can be sliced without copying the underlying data, unlike bytes/str.
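As a small illustration of that claim, the following Python 3 sketch (variable names are illustrative assumptions) writes through a sliced view and observes the change in the underlying bytearray, which could not happen if the slice had copied the data.

```python
buf = bytearray(b"long bytes stream")
view = memoryview(buf)

# Slicing a memoryview yields another view onto the same buffer.
sub = view[5:10]
same_object = sub.obj is buf

# Writing through the slice changes the original bytearray in place.
sub[:] = b"BYTES"
after_write = bytes(buf)

# A view compares equal to a bytes object with the same content,
# so simple checks need no conversion.
matches = sub == b"BYTES"

# Slicing the bytearray itself copies: later writes do not affect the copy.
copied = buf[5:10]
sub[:] = b"bytes"
copy_unchanged = bytes(copied) == b"BYTES"

print(same_object, after_write, matches, copy_unchanged)
```

The view here is over a bytearray because a view over an immutable bytes object is read-only; slicing behaves the same way in both cases.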

For a sense of the performance impact, consider the following toy example.

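import time

# Python 3: repeatedly drop the first byte of a bytes object (each slice copies).
for n in (100000, 200000, 300000, 400000):
    data = b'x' * n
    start = time.time()
    b = data
    while b:
        b = b[1:]
    print('bytes', n, time.time() - start)

# Same loop over a memoryview (each slice is a new view, no copy).
for n in (100000, 200000, 300000, 400000):
    data = b'x' * n
    start = time.time()
    b = memoryview(data)
    while b:
        b = b[1:]
    print('memoryview', n, time.time() - start)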

On my computer, I get

bytes 100000 0.200068950653

bytes 200000 0.938908100128

bytes 300000 2.30898690224

bytes 400000 4.27718806267

memoryview 100000 0.0100269317627

memoryview 200000 0.0208270549774

memoryview 300000 0.0303030014038

memoryview 400000 0.0403470993042

You can clearly see the quadratic complexity of the repeated slicing: every b[1:] on a bytes object copies all of the remaining data. Even with only 400000 iterations, it's already unmanageable. Meanwhile, the memoryview version has linear complexity and is lightning fast.
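The same idea can be applied to the token-extraction loop from the question. What follows is only a sketch under stated assumptions: `TOKEN` and `tokenize` are hypothetical names, and the grammar (whitespace-separated words) is invented for illustration. It relies on the fact that bytes patterns in the `re` module accept any bytes-like object, including a memoryview, so the view can be advanced like a pointer and only each small token is copied out.

```python
import re

# Hypothetical token grammar: optional whitespace followed by a word.
TOKEN = re.compile(rb"\s*(\w+)")


def tokenize(data):
    """Split a bytes-like object into word tokens without copying the tail."""
    view = memoryview(data)
    tokens = []
    while view:
        m = TOKEN.match(view)  # bytes patterns accept a memoryview
        if m is None:
            break  # nothing but non-token data is left
        # Copy only the token itself, not the rest of the buffer.
        tokens.append(bytes(view[m.start(1):m.end(1)]))
        # Advancing is a constant-time re-slice of the view.
        view = view[m.end():]
    return tokens


print(tokenize(b"long bytes stream"))
```

Each iteration does work proportional to the token length rather than to the remaining buffer, which is the behaviour the question's pmview was meant to provide.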