The difference between HDF5 and CSV files
In my last article, I discussed the steps to download NASA data from GES DISC. The downloaded files are in HDF5, a file format built for storing and managing very large data collections, which makes it a popular choice for storing data. To get NASA’s data, please check the article below first:
Whenever I work with datasets, I’m most comfortable with CSV files, so once I had the HDF5 files, I looked for a way to convert them to CSV. I found the Python package h5py, which can read HDF5 files. This article walks through the steps to use h5py to convert HDF5 to CSV. You can follow along with the complete notebook at the link below.
Import libraries
For this work, we’ll need three libraries. The first is h5py, which can read and work with HDF5 files (see its documentation). The second is numpy, for working with arrays. Finally, we’ll import pandas so we can create a dataframe and later save it as a CSV file.
import h5py
import numpy as np
import pandas as pd
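To show how the three libraries fit together, here is a minimal sketch of an HDF5-to-CSV round trip. It does not use the NASA file: the file name toy.h5, the dataset name "grid", and the helper hdf5_dataset_to_csv are all hypothetical, chosen only so the example runs on its own. Real GPM files have their own internal layout.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def hdf5_dataset_to_csv(h5_path, dataset_name, csv_path):
    """Read one 2-D dataset from an HDF5 file and save it as a CSV file.

    Hypothetical helper for illustration; not part of h5py or pandas.
    """
    with h5py.File(h5_path, "r") as f:
        # Indexing with [()] copies the whole dataset into a NumPy array.
        values = f[dataset_name][()]
    frame = pd.DataFrame(values)
    frame.to_csv(csv_path, index=False)
    return frame


# Build a tiny stand-in file so the sketch runs without the NASA data.
workdir = tempfile.mkdtemp()
h5_path = os.path.join(workdir, "toy.h5")    # hypothetical file name
csv_path = os.path.join(workdir, "toy.csv")  # hypothetical file name

with h5py.File(h5_path, "w") as f:
    # "grid" is an assumed dataset name, not one from the GPM product.
    f.create_dataset("grid", data=np.arange(6, dtype=np.float64).reshape(2, 3))

frame = hdf5_dataset_to_csv(h5_path, "grid", csv_path)
round_trip = pd.read_csv(csv_path)
```

The division of labour is the same as described above: h5py reads the file, numpy holds the array, and pandas writes the CSV.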
Load dataset
The next step is to load the HDF5 file. For this example, I’m working with GPM data from GES DISC covering the whole world for January 2020. The file, downloaded from the GES DISC website, is in the data folder of the GitHub repo.