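1. Data structure
A NetCDF (Network Common Data Form) file consists of variables, dimensions, and attributes. The flux data file RDMF_2011_L3.nc can be visualized with the Panoply software, as shown in the figure below: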
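2. Reading the data
Most current approaches to reading nc data in Python use the Dataset class from the netCDF4 package. For this flux data file, however, Dataset can list the variables but cannot read out their values.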
from netCDF4 import Dataset

# Open the nc file to inspect its contents
nc = Dataset(r'D:\NC_files\fifth_page\Red Dirt Melon Farm OzFlux tower site\RDMF_2011_L3.nc')
# List all variables in RDMF_2011_L3.nc
var_names = nc.variables.keys()
print(var_names)
Output: odict_keys(['Ah', 'Ah_QCFlag', 'Cc', 'Cc_QCFlag', 'Day', 'Day_QCFlag', 'Fa', 'Fa_QCFlag', 'Fc', 'Fc_QCFlag', 'Fe', 'Fe_QCFlag', 'Fg', 'Fg_QCFlag', 'Fh', 'Fh_QCFlag', 'Fld', 'Fld_QCFlag', 'Flu', 'Flu_QCFlag', 'Fm', 'Fm_QCFlag', 'Fn', 'Fn_QCFlag', 'Fsd', 'Fsd_QCFlag', 'Fsu', 'Fsu_QCFlag', 'Hdh', 'Hdh_QCFlag', 'Hour', 'Hour_QCFlag', 'Minute', 'Minute_QCFlag', 'Month', 'Month_QCFlag', 'Precip', 'Precip_QCFlag', 'Second', 'Second_QCFlag', 'Sws', 'Sws_QCFlag', 'Sws_50cm', 'Sws_50cm_QCFlag', 'Sws_5cm', 'Sws_5cm_QCFlag', 'Ta', 'Ta_QCFlag', 'Ts', 'Ts_QCFlag', 'Wd_CSAT', 'Wd_CSAT_QCFlag', 'Ws_CSAT', 'Ws_CSAT_QCFlag', 'Year', 'Year_QCFlag', 'eta', 'eta_QCFlag', 'ps', 'ps_QCFlag', 'theta', 'theta_QCFlag', 'ustar', 'ustar_QCFlag', 'xlDateTime', 'xlDateTime_QCFlag'])
from netCDF4 import Dataset

# Open the nc file to inspect its contents
nc = Dataset(r'D:\NC_files\fifth_page\Red Dirt Melon Farm OzFlux tower site\RDMF_2011_L3.nc')
# List all variables in RDMF_2011_L3.nc
var_names = nc.variables.keys()
for var in var_names:
    # Read the values of each variable
    var_data = nc.variables[var][:].data
    print(var, var_data)
Output:
Traceback (most recent call last):
  File "E:/05study/Pycode/Python/TSTLProject/test.py", line 52, in <module>
    var_data=nc.variables[var][:].data
ValueError: could not convert string to float: '0,50'
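The code above cannot read the variable values and fails with ValueError: could not convert string to float: '0,50'. My guess is that the cause is the variable attribute valid_range = "0,50", which is stored as a string (see the figure below). If anyone knows the real cause, comments are welcome.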
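The conversion failure itself can be reproduced without the file: Python's float() rejects the string '0,50'. The sketch below shows that, together with a hypothetical helper, parse_valid_range (not part of netCDF4), that splits such a comma-separated attribute into two numbers. It only illustrates the guess above and does not confirm that valid_range is what netCDF4 actually trips over.

```python
def parse_valid_range(text):
    """Split a comma-separated range string such as '0,50' into two floats.

    Hypothetical helper for illustration only; it is not part of netCDF4.
    """
    low, high = (float(part) for part in text.split(","))
    return low, high


# float() cannot parse the whole string at once, which matches the
# wording of the error message shown above.
try:
    float("0,50")
except ValueError as err:
    message = str(err)

print(message)
print(parse_valid_range("0,50"))
```

If the guess is right, the attribute would need to hold two numbers rather than one string for a numeric range check to work.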
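After many attempts, we turned to the gdal package to read the flux data. The code below reads the variable Ah as an example; the other variables are read the same way: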
from osgeo import gdal

# Path to RDMF_2011_L3.nc
nc_path = r'D:\NC_files\fifth_page\Red Dirt Melon Farm OzFlux tower site\RDMF_2011_L3.nc'
# Open the variable Ah in RDMF_2011_L3.nc as a GDAL subdataset
variable = gdal.Open('NETCDF:' + nc_path + ':' + 'Ah')
# Read the values and flatten the multi-dimensional array to 1-D in row-major order
variable_value = variable.ReadAsArray().flatten('C')
print(variable_value)
Output: [ 9.884751e+00 -9.999000e+03 -9.999000e+03 ... 1.499470e+01 1.622357e+01 1.678135e+01]. The result is a NumPy array.
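The string passed to gdal.Open follows the GDAL netCDF driver's subdataset naming, NETCDF:"path":variable. Because this path contains spaces and a drive-letter colon, wrapping it in double quotes is the more defensive form. The helper below, netcdf_subdataset, is a hypothetical name introduced here for illustration; it only builds the string and does not call GDAL.

```python
def netcdf_subdataset(path, variable):
    """Build a GDAL netCDF subdataset name of the form NETCDF:"path":variable.

    Hypothetical helper; the double quotes keep the path intact even when
    it contains spaces or colons.
    """
    return 'NETCDF:"{}":{}'.format(path, variable)


nc_path = r'D:\NC_files\fifth_page\Red Dirt Melon Farm OzFlux tower site\RDMF_2011_L3.nc'
name = netcdf_subdataset(nc_path, 'Ah')
print(name)
```

The resulting name can be passed to gdal.Open in place of the concatenated string.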
3. Writing variable values to CSV
import pandas as pd
from osgeo import gdal
from netCDF4 import Dataset

# Path to RDMF_2011_L3.nc
nc_path = r'D:\NC_files\fifth_page\Red Dirt Melon Farm OzFlux tower site\RDMF_2011_L3.nc'
# Open the nc file with netCDF4 to get its variable names
nc = Dataset(nc_path)
# DataFrame that collects the variable values
df = pd.DataFrame()
# Loop over the variables in the nc file and read each one's values with gdal
for var in nc.variables.keys():
    variable = gdal.Open('NETCDF:' + nc_path + ':' + var)
    # Read the values and flatten the multi-dimensional array to 1-D in row-major order
    variable_value = variable.ReadAsArray().flatten('C')
    # Store the values in the DataFrame under the variable's name
    df[var] = pd.Series(variable_value)
# Write the DataFrame to test.csv
df.to_csv('test.csv', encoding='utf-8', index=False)
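The Ah output earlier contains values of -9.999000e+03. The source does not say what they mean; assuming they mark missing data, it may be worth blanking them before export. The sketch below uses only the standard library and made-up sample values. MISSING and blank_missing are hypothetical names, and the marker value should be checked against the file's attributes.

```python
import csv
import io

MISSING = -9999.0  # assumed missing-value marker; verify against the file


def blank_missing(values, missing=MISSING):
    """Return the values with the assumed missing marker replaced by None."""
    return [None if v == missing else v for v in values]


# Made-up sample values, not taken from RDMF_2011_L3.nc.
columns = {
    "Ah": blank_missing([9.5, -9999.0, 15.0]),
    "Ta": blank_missing([-9999.0, 21.5, 22.0]),
}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(columns.keys())           # header row with variable names
writer.writerows(zip(*columns.values()))  # one row per time step
csv_text = buffer.getvalue()
print(csv_text)
```

csv.writer writes None as an empty cell, so missing values show up as blanks. With the pandas DataFrame above, replacing the marker with NaN before calling to_csv should have a similar effect.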
4. Results