Reading nested h5 files with Python and saving them as CSV

仌三语

Last modified: 2023-11-18 08:58:19

Tags: python

First published: 2023-11-18 08:57:47

Article link: https://blog.csdn.net/m0_46440994/article/details/134473920

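Define a function, hdf52csv, that walks every group and dataset in an HDF5 file. If the path points to a group, the function recurses into each of the group's children. If the path points to a dataset, the dataset is saved as a CSV file.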
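The group-or-dataset rule can be tried without an HDF5 file at hand. The sketch below is only an illustration: a plain nested dict stands in for the file (a dict value plays the role of a group, anything else the role of a dataset), and csv_targets and sample are made-up names. With h5py, the corresponding check would be isinstance(obj, h5py.Group).

```python
import posixpath

def csv_targets(tree, path="."):
    """Yield (dataset_path, csv_path) pairs for every leaf under `path`.

    `tree` is a nested dict standing in for an HDF5 file (an assumption for
    this sketch): a dict value acts as a group, anything else as a dataset.
    """
    for key, child in tree.items():
        child_path = posixpath.join(path, key)
        if isinstance(child, dict):
            # Group: recurse into its children
            yield from csv_targets(child, child_path)
        else:
            # Dataset: the CSV keeps the same relative path, plus ".csv"
            yield child_path, child_path + ".csv"

# Hypothetical layout: one top-level dataset and two nested groups
sample = {"top": [1, 2, 3], "run1": {"values": [4, 5], "meta": {"labels": ["a"]}}}

if __name__ == "__main__":
    for dataset_path, csv_path in csv_targets(sample):
        print(dataset_path, "->", csv_path)
```

Because each CSV path mirrors the dataset path, nested groups end up as nested directories next to the h5 file.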

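The main function opens the specified HDF5 file and then uses a thread pool executor (ThreadPoolExecutor) to process the file's contents in parallel.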
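As a rough sketch of the thread pool pattern on its own, the example below submits independent jobs from the calling thread and collects them with as_completed. The helper run_all and the jobs in it are hypothetical stand-ins for "write one dataset to CSV". One caveat with this kind of pool: a task that blocks on futures submitted to the same pool can stall once every worker is waiting, so it is usually safer to submit only leaf work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_all(jobs, max_workers=4):
    """Run each zero-argument callable in `jobs` (dict: name -> callable)
    on a thread pool and return a dict: name -> result."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit everything from the calling thread, so no worker waits on another task
        future_to_name = {executor.submit(job): name for name, job in jobs.items()}
        for future in as_completed(future_to_name):
            # result() returns the job's value, or re-raises the exception it raised
            results[future_to_name[future]] = future.result()
    return results

if __name__ == "__main__":
    # Hypothetical jobs standing in for "write one dataset to CSV"
    jobs = {name: (lambda n=name: n + ".csv") for name in ["./top", "./run1/values"]}
    print(run_all(jobs))
```

Calling future.result() matters here: without it, an exception raised inside a worker would pass unnoticed.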

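import os
import h5py
import numpy as np
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed

def save_dataset(f, path):
    # Read the whole dataset; [()] also works for scalar datasets, unlike [:]
    data = np.atleast_1d(f[path][()])
    if data.dtype.char == 'S':  # fixed-length bytes strings
        data = np.char.decode(data, 'utf-8')  # convert to str
    if data.ndim > 2:  # DataFrame takes at most 2-D input, so flatten the trailing axes
        data = data.reshape(data.shape[0], -1)
    # Mirror the group hierarchy as directories; exist_ok avoids a race between threads
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    pd.DataFrame(data).to_csv(path + ".csv", index=False)

def hdf52csv(f, path='.', executor=None):
    print(f[path].name, '\n', f[path])
    # If it is a group, keep traversing
    if isinstance(f[path], h5py.Group):
        # Recurse in the calling thread: a worker that blocks on futures
        # submitted to its own pool can deadlock once every worker is waiting
        futures = []
        for key in f[path].keys():
            futures.extend(hdf52csv(f, path + '/' + key, executor))
        return futures
    # If it is a dataset, save it as a CSV file
    if executor is None:
        save_dataset(f, path)
        return []
    return [executor.submit(save_dataset, f, path)]

def main(file_path):
    # Resolve the path first, so it stays valid after the chdir below
    file_path = os.path.abspath(file_path)
    # Make the directory containing the h5 file the working directory; the CSVs are saved there too
    os.chdir(os.path.dirname(file_path))
    with h5py.File(file_path, 'r') as f:
        with ThreadPoolExecutor() as executor:
            for future in as_completed(hdf52csv(f, executor=executor)):
                future.result()  # re-raise any error from a worker thread

if __name__ == "__main__":
    main(r'path to the h5 file')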
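To try the script without real data, a small nested file can be generated first. This is a minimal sketch, assuming h5py and NumPy are installed; the file name sample.h5 and the group and dataset names are invented for illustration. list_datasets uses h5py's visititems to show which datasets a conversion run should find.

```python
import os
import tempfile

import h5py
import numpy as np

def make_sample_h5(directory):
    """Create a small nested HDF5 file and return its path (all names are made up)."""
    file_path = os.path.join(directory, "sample.h5")
    with h5py.File(file_path, "w") as f:
        f.create_dataset("top", data=np.arange(5))
        run = f.create_group("run1")
        run.create_dataset("values", data=np.arange(6.0).reshape(3, 2))
        # A fixed-length bytes dataset, to exercise the bytes-to-str branch
        run.create_group("meta").create_dataset("labels", data=np.array([b"a", b"b"]))
    return file_path

def list_datasets(file_path):
    """Return the sorted paths of all datasets in the file."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root
        if isinstance(obj, h5py.Dataset):
            found.append(name)

    with h5py.File(file_path, "r") as f:
        f.visititems(visit)
    return sorted(found)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(list_datasets(make_sample_h5(tmp)))
```

Passing the generated path to main should then leave one CSV per listed dataset beside the h5 file, with the run1 and run1/meta groups as directories.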
