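UPDATE:
You have misspelled the data_columns parameter: you wrote data_column, but it should be data_columns. As a result, you don't have any indexed columns in your HDF store, and HDFStore added values_block_X columns instead: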
In [70]: store = pd.HDFStore(r'D:\temp\.data\my_test.h5')
The misspelled parameter is ignored:
In [71]: store.append('no_idx_wrong_dc', df, data_column=df.columns, index=False)
In [72]: store.get_storer('no_idx_wrong_dc').table
Out[72]:
/no_idx_wrong_dc/table (Table(10,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
"values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
"values_block_2": StringCol(itemsize=30, shape=(1,), dflt=b'', pos=3)}
byteorder := 'little'
chunkshape := (1213,)
That gives the same result as:
In [73]: store.append('no_idx_no_dc', df, index=False)
In [74]: store.get_storer('no_idx_no_dc').table
Out[74]:
/no_idx_no_dc/table (Table(10,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
"values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
"values_block_2": StringCol(itemsize=30, shape=(1,), dflt=b'', pos=3)}
byteorder := 'little'
chunkshape := (1213,)
Let's spell it correctly:
In [75]: store.append('no_idx_dc', df, data_columns=df.columns, index=False)
In [76]: store.get_storer('no_idx_dc').table
Out[76]:
/no_idx_dc/table (Table(10,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"value": Float64Col(shape=(), dflt=0.0, pos=1),
"count": Int64Col(shape=(), dflt=0, pos=2),
"s": StringCol(itemsize=30, shape=(), dflt=b'', pos=3)}
byteorder := 'little'
chunkshape := (1213,)
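Why the typo raised no error can be illustrated with a minimal sketch. It assumes, as was the case in older pandas releases, that the append call collects unknown keyword arguments in **kwargs. The function append_like below is hypothetical and only mimics that pattern; it is not pandas code.

```python
def append_like(key, value, data_columns=None, index=True, **kwargs):
    """Hypothetical stand-in for an API that accepts arbitrary keyword arguments."""
    # Unknown names such as `data_column` end up in kwargs and are never read,
    # so the call succeeds and the intended option keeps its default.
    return {
        "data_columns": data_columns,
        "index": index,
        "ignored": sorted(kwargs),
    }


cols = ["value", "count", "s"]
wrong = append_like("no_idx_wrong_dc", None, data_column=cols, index=False)
right = append_like("no_idx_dc", None, data_columns=cols, index=False)

print(wrong["data_columns"])  # None
print(wrong["ignored"])       # ['data_column']
print(right["data_columns"])  # ['value', 'count', 's']
```

Checking the table description with get_storer, as done above, is a quick way to confirm that the option actually took effect.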
OLD answer:
The min_itemsize parameter takes effect only on the first append, that is, when the table is created.
Demo:
In [33]: df
Out[33]:
num s
0 11 aaaaaaaaaaaaaaaa
1 12 bbbbbbbbbbbbbb
2 13 ccccccccccccc
3 14 ddddddddddd
In [34]: store = pd.HDFStore(r'D:\temp\.data\my_test.h5')
In [35]: store.append('test_1', df, data_columns=True)
In [36]: store.get_storer('test_1').table.description
Out[36]:
{
"index": Int64Col(shape=(), dflt=0, pos=0),
"num": Int64Col(shape=(), dflt=0, pos=1),
"s": StringCol(itemsize=16, shape=(), dflt=b'', pos=2)}
In [37]: df.loc[4] = [15, 'X'*200]
In [38]: df
Out[38]:
num s
0 11 aaaaaaaaaaaaaaaa
1 12 bbbbbbbbbbbbbb
2 13 ccccccccccccc
3 14 ddddddddddd
4 15 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
In [39]: store.append('test_1', df, data_columns=True)
...
skipped
...
ValueError: Trying to store a string with len [200] in [s] column but
this column has a limit of [16]!
Consider using min_itemsize to preset the sizes on these columns
Now let's use min_itemsize, but still append to the existing store object:
In [40]: store.append('test_1', df, data_columns=True, min_itemsize={'s':250})
...
skipped
...
ValueError: Trying to store a string with len [250] in [s] column but
this column has a limit of [16]!
Consider using min_itemsize to preset the sizes on these columns
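Because the width of a string column is fixed when the table is first created, the size has to be chosen before the first append. Here is a minimal sketch for picking that value. The helper name required_itemsize and its headroom parameter are made up for illustration, and it assumes that the limit is counted in bytes of the encoded string (UTF-8 here).

```python
def required_itemsize(strings, encoding="utf-8", headroom=0):
    """Return the encoded byte length of the longest string, plus headroom.

    Hypothetical helper: not part of pandas or PyTables.
    """
    longest = max((len(str(s).encode(encoding)) for s in strings), default=0)
    return longest + headroom


# Values of the `s` column from the demo, including the 200-character row.
values = [
    "aaaaaaaaaaaaaaaa",
    "bbbbbbbbbbbbbb",
    "ccccccccccccc",
    "ddddddddddd",
    "X" * 200,
]

size = required_itemsize(values, headroom=50)
print(size)  # 250
```

The result can then be passed as min_itemsize={'s': size} on the first append for a new key. Adding headroom leaves room for longer strings in later appends.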
It works if we create a new object in the store:
In [41]: store.append('test_2', df, data_columns=True, min_itemsize={'s':250})
Check the column sizes:
In [42]: store.get_storer('test_2').table.description
Out[42]:
{
"index": Int64Col(shape=(), dflt=0, pos=0),
"num": Int64Col(shape=(), dflt=0, pos=1),
"s": StringCol(itemsize=250, shape=(), dflt=b'', pos=2)}