Python HDF5 min_itemsize error: ValueError: Trying to ...

Article link: https://blog.csdn.net/weixin_39611049/article/details/111522927

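This article covers the min_itemsize error that can occur when working with HDF5 files in Python. It explains the cause of the error and shows how to set the min_itemsize parameter so that strings do not exceed a column's length limit. The sample session demonstrates that min_itemsize only takes effect on the first append, and that it cannot widen a column of an object that already exists in the store.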


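Update:

You have misspelled the data_columns parameter: data_column should be data_columns. As a result, you ended up with no data columns in your HDF store, and the store created values_block_X columns instead: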

In [70]: store = pd.HDFStore(r'D:\temp\.data\my_test.h5')

The misspelled parameter is silently ignored (at least in the pandas version used here):

In [71]: store.append('no_idx_wrong_dc', df, data_column=df.columns, index=False)

In [72]: store.get_storer('no_idx_wrong_dc').table
Out[72]:
/no_idx_wrong_dc/table (Table(10,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
  "values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
  "values_block_2": StringCol(itemsize=30, shape=(1,), dflt=b'', pos=3)}
  byteorder := 'little'
  chunkshape := (1213,)

This is the same as not passing it at all:

In [73]: store.append('no_idx_no_dc', df, index=False)

In [74]: store.get_storer('no_idx_no_dc').table
Out[74]:
/no_idx_no_dc/table (Table(10,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
  "values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
  "values_block_2": StringCol(itemsize=30, shape=(1,), dflt=b'', pos=3)}
  byteorder := 'little'
  chunkshape := (1213,)

Now let's spell it correctly:

In [75]: store.append('no_idx_dc', df, data_columns=df.columns, index=False)

In [76]: store.get_storer('no_idx_dc').table
Out[76]:
/no_idx_dc/table (Table(10,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "value": Float64Col(shape=(), dflt=0.0, pos=1),
  "count": Int64Col(shape=(), dflt=0, pos=2),
  "s": StringCol(itemsize=30, shape=(), dflt=b'', pos=3)}
  byteorder := 'little'
  chunkshape := (1213,)
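To reproduce the comparison above without the original data, here is a minimal sketch. It assumes pandas with the PyTables package (`tables`) installed; the DataFrame contents, the key names `no_dc` and `with_dc`, and the temporary file name are made up for illustration. It uses `data_columns=True` (all columns) rather than `df.columns`.

```python
import os
import tempfile

import pandas as pd  # HDFStore also requires the PyTables package ("tables")

# Illustrative data only; the original df from the question is not available.
df = pd.DataFrame({
    "value": [0.5, 1.5, 2.5],
    "count": [1, 2, 3],
    "s": ["a", "bb", "ccc"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # throwaway file
    with pd.HDFStore(path, mode="w") as store:
        # Without data_columns, columns are grouped into values_block_N by dtype.
        store.append("no_dc", df, index=False)
        # With data_columns=True, every column is stored under its own name.
        store.append("with_dc", df, data_columns=True, index=False)

        no_dc_cols = list(store.get_storer("no_dc").table.colnames)
        with_dc_cols = list(store.get_storer("with_dc").table.colnames)

print(no_dc_cols)
print(with_dc_cols)
```

Only columns stored under their own name can be used in `where=` queries or given a per-column `min_itemsize` by name, which is why the spelling of `data_columns` matters here.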

Old answer:

The min_itemsize parameter can only be set effectively on the first append, i.e. when the table is created.

Demo:

In [33]: df
Out[33]:
   num                 s
0   11  aaaaaaaaaaaaaaaa
1   12    bbbbbbbbbbbbbb
2   13     ccccccccccccc
3   14       ddddddddddd

In [34]: store = pd.HDFStore(r'D:\temp\.data\my_test.h5')

In [35]: store.append('test_1', df, data_columns=True)

In [36]: store.get_storer('test_1').table.description
Out[36]:
{
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "num": Int64Col(shape=(), dflt=0, pos=1),
  "s": StringCol(itemsize=16, shape=(), dflt=b'', pos=2)}

In [37]: df.loc[4] = [15, 'X'*200]

In [38]: df
Out[38]:
   num                                                  s
0   11                                   aaaaaaaaaaaaaaaa
1   12                                     bbbbbbbbbbbbbb
2   13                                      ccccccccccccc
3   14                                        ddddddddddd
4   15  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...

In [39]: store.append('test_1', df, data_columns=True)
...
skipped
...
ValueError: Trying to store a string with len [200] in [s] column but
this column has a limit of [16]!
Consider using min_itemsize to preset the sizes on these columns
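Because the limit has to be chosen before the table is created, it can help to estimate it from the data up front. The helper below is a hypothetical sketch (the name `required_itemsize` and the `headroom` parameter are not part of pandas). It assumes the store encodes strings as UTF-8 and that the limit counts encoded bytes, so it measures byte length rather than character count.

```python
def required_itemsize(strings, encoding="utf-8", headroom=0):
    """Return the largest encoded byte length among `strings`, plus headroom.

    Hypothetical helper: the result is meant to be passed as a min_itemsize
    value when the table is first created.
    """
    longest = max((len(s.encode(encoding)) for s in strings), default=0)
    return longest + headroom


values = ["aaaaaaaaaaaaaaaa", "bbbbbbbbbbbbbb", "X" * 200]
size = required_itemsize(values, headroom=50)
print(size)  # 250
```

The headroom is a guess about future data; if later strings may be much longer, a larger fixed value is the safer choice.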

Now pass min_itemsize, but still append to the existing object in the store:

In [40]: store.append('test_1', df, data_columns=True, min_itemsize={'s':250})
...
skipped
...
ValueError: Trying to store a string with len [250] in [s] column but
this column has a limit of [16]!
Consider using min_itemsize to preset the sizes on these columns

If we create a new object in the store instead, it works:

In [41]: store.append('test_2', df, data_columns=True, min_itemsize={'s':250})

Check the column size:

In [42]: store.get_storer('test_2').table.description
Out[42]:
{
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "num": Int64Col(shape=(), dflt=0, pos=1),
  "s": StringCol(itemsize=250, shape=(), dflt=b'', pos=2)}
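Since an existing table cannot be widened, one possible workaround is to copy its rows into a new object that is created with a large enough min_itemsize, and append the long strings there. The sketch below assumes pandas with PyTables installed and data small enough to read back into memory; the key name `test_1_wide`, the sample rows, and the temporary file are illustrative and not from the original answer.

```python
import os
import tempfile

import pandas as pd  # HDFStore also requires the PyTables package ("tables")

df = pd.DataFrame({"num": [11, 12], "s": ["aaaa", "bb"]})
long_rows = pd.DataFrame({"num": [15], "s": ["X" * 200]}, index=[2])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # throwaway file
    with pd.HDFStore(path, mode="w") as store:
        # First append fixes the width of "s" to the longest string seen so far.
        store.append("test_1", df, data_columns=True)
        old_size = store.get_storer("test_1").table.coldtypes["s"].itemsize

        # Copy the existing rows into a new table with a wider string column.
        existing = store.select("test_1")
        store.append("test_1_wide", existing,
                     data_columns=True, min_itemsize={"s": 250})

        # The 200-character string now fits within the 250-byte limit.
        store.append("test_1_wide", long_rows, data_columns=True)
        new_size = store.get_storer("test_1_wide").table.coldtypes["s"].itemsize
        n_rows = len(store.select("test_1_wide"))

print(old_size, new_size, n_rows)
```

For tables too large to load at once, the same idea could be applied chunk by chunk, but that variant is not shown or tested here.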