Python | 处理海洋2C 数据 | 非标准时间格式

简朴-ocean

于 2024-08-26 21:07:28 发布

阅读量1k

点赞数 13

分类专栏： python-海洋数据处理文章标签： python 服务器开发语言

本文链接：https://blog.csdn.net/weixin_44237337/article/details/141573341

版权

python-海洋数据处理专栏收录该内容

79 篇文章 236 订阅

订阅专栏

写在前面

最近，师弟在用Python读取某海洋2C数据时，突然冒出一个报错：“时间单位的参考日期无效，当前日期 00:00:00.0”。这让我回想起，似乎在很久很久以前，我处理SMAP和Argo数据时也遇到过类似的问题。为了不让未来的师弟师妹们再次被这种问题困扰，我决定在这里记录一下这个坑。

希望未来再看到这条记录时，能少叹气、多微笑，毕竟已经知道怎么解决了嘛！

问题复现

在正常使用xarray读取nc文件时，会出现以下报错：

import xarray as xr
path= r'I:/H2C_*.nc'
ds  = xr.open_dataset(path)

ValueError: invalid reference date for time units: current day 00:00:00.0

ValueError: unable to decode time units 'secs since current day 00:00:00.0' with 'the default calendar'. 

Try opening your dataset with decode_times=False or installing cftime if it is not installed.`

遇到问题就解决问题嘛，上面的报错信息给的也挺详细，包括解决方法，说让你在打开dataset的时候可以加上’decode_times=False’.试试加上确实可以成功打开了

ds  = xr.open_dataset(path,decode_times=False)

此外，我发现可以直接实现netCDF4进行打开，就不会有上面的错误提示。

import netCDF4 as nc
ds  = nc.Dataset(path)

好了，这是遇到的第一个问题，文件打开时出现的错误。

下面是第二个问题：时间转换异常。

下面我希望将数据的时间转化为datetime的格式，方面后续的处理。运行下面的代码进行转换

path= r'I:/H2C_*.nc'
ds  = xr.open_dataset(path,decode_times=False)
time = ds['time'].values
time_c = pd.to_datetime(time)

虽然代码没有报错，但是他的转换结果是有问题的，因为我已知读取的这个文件时间应该是在2021年。所以这是什么原因导致的转换的时间出现异常呢？

问题解释

在一般的netcdf文件中，对于该文件的时间，会给定一个参考的时间标准，然后从标准进行计算，得到相应的时间序列。

在测绘里面有点类似于85国家高程基准的概念。

怎么找到这个时间从参考日期呢，可以直接在time的变量的属性信息里面找到（一般来说都会写在里面，没写的除非特殊说明）

ds.time
Out[139]: 
<xarray.DataArray 'time' (time: 2355)> Size: 19kB
array([6.818707e+08, 6.818707e+08, 6.818707e+08, ..., 6.818738e+08,
       6.818738e+08, 6.818738e+08])
Coordinates:
  * time     (time) float64 19kB 6.819e+08 6.819e+08 ... 6.819e+08 6.819e+08
    lat      (time) float64 19kB ...
    lon      (time) float64 19kB ...
Attributes:
    long_name:           time (sec since 2000-01-01)
    standard_name:       time
    calendar:            gregorian
    tai_utc_difference:  -37.0
    leap_second:         0000-00-00 00:00:00
    units:               seconds since 2000-01-01 00:00:00.0
    comment:             [tai_utc_difference] is the difference between TAI a...

在Attributes-units可以发现，其参考时间为’seconds since 2000-01-01 00:00:00.0’.下面就可以将这个作为基准时间，然后将原本的时间进行转换，主要使用netCDF4.num2date（）函数

path= r'I:/H2C_*.nc'
ds  = xr.open_dataset(path,decode_times=False)
time = ds['time']
time_array = nc.num2date(time,units='sec since 2000-01-01',calendar=time.calendar)
time_array

转换后时间就正常了：

time_array
Out[150]: 
array([cftime.DatetimeGregorian(2021, 8, 10, 0, 31, 55, 472829, has_year_zero=False),
       cftime.DatetimeGregorian(2021, 8, 10, 0, 31, 56, 727732, has_year_zero=False),
       cftime.DatetimeGregorian(2021, 8, 10, 0, 31, 57, 477053, has_year_zero=False),
       ...,
       cftime.DatetimeGregorian(2021, 8, 10, 1, 23, 57, 42110, has_year_zero=False),
       cftime.DatetimeGregorian(2021, 8, 10, 1, 23, 58, 42587, has_year_zero=False),
       cftime.DatetimeGregorian(2021, 8, 10, 1, 23, 59, 43059, has_year_zero=False)],
      dtype=object)

然后再用pandas.to_datetime()进行相应转换

datetime_array = np.array([datetime.datetime(t.year, t.month, t.day, t.hour, t.minute, t.second, t.microsecond) for t in time_array])
time_convert = pd.to_datetime(datetime_array)

time_convert
Out[154]: 
              ['2021-08-10 00:31:55.472829', '2021-08-10 00:31:56.727732',
               '2021-08-10 00:31:57.477053', '2021-08-10 00:31:59.762333',
               '2021-08-10 00:32:06.855991', '2021-08-10 00:32:07.055824',
               '2021-08-10 00:32:10.808600', '2021-08-10 00:32:11.058392',
               '2021-08-10 00:32:12.207441', '2021-08-10 00:32:15.635650',
               ...
               '2021-08-10 01:23:32.429303', '2021-08-10 01:23:33.429862',
               '2021-08-10 01:23:45.886553', '2021-08-10 01:23:46.086657',
               '2021-08-10 01:23:47.937610', '2021-08-10 01:23:55.791510',
               '2021-08-10 01:23:56.041632', '2021-08-10 01:23:57.042110',
               '2021-08-10 01:23:58.042587', '2021-08-10 01:23:59.043059'],
              dtype='datetime64[ns]', length=2355, freq=None)

可以将上述步骤封装为一个函数，方便对于多个文件进行批量处理：

import xarray as xr
import netCDF4 as nc
import numpy as np
import pandas as pd
import datetime
def convert_nc_time_to_datetime(path):
    """
    将NetCDF文件中的时间变量转换为pandas的Datetime对象。

    参数:
        path (str): NetCDF文件的路径。

    返回:
        pd.DatetimeIndex: 转换后的时间数组。
    """
    # 打开NetCDF文件并读取时间变量
    ds = xr.open_dataset(path, decode_times=False)
    time = ds['time']
    
    # 将时间变量转换为datetime对象数组
    time_array = nc.num2date(time, units=time.units, calendar=time.calendar)
    
    # 将datetime对象数组转换为np.datetime64对象
    datetime_array = np.array([datetime.datetime(t.year, t.month, t.day, t.hour, t.minute, t.second, t.microsecond) for t in time_array])
    
    # 将datetime对象数组转换为pandas的DatetimeIndex
    time_convert = pd.to_datetime(datetime_array)
    
    return time_convert
# 使用示例
path = r'I:/H2C_*.nc'
time_convert = convert_nc_time_to_datetime(path)
print(time_convert)

其他场景应用

了解了nc.num2date这个函数后，可以干什么呢？对于两组数据的units不一致时，如果你要将两组数据进行误差比较，就可以用到这个函数，比如我现在有一组SMAP盐度卫星数据以及一组Argo盐度数据，希望将两组数据进行时间匹配以及误差比较。就可以利用这个函数，将Argo的时间标准('days since 1950-01-01 00:00:00')转换为SMAP的时间标准('seconds since 2000-1-1 00:00:00')：

dt=nc.num2date(time,units='days since 1950-01-01 00:00:00')     # Argo 时间标准
argo_dt=nc.date2num(dt,units='seconds since 2000-1-1 00:00:00') #与smap中的时间对应

关于相关函数的使用可以参考引用的链接

https://www.programcreek.com/python/example/89488/netCDF4.num2date

https://unidata.github.io/netcdf4-python/

https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html