I have two decades of spatially varying wind data recorded at six-hourly intervals. I need to average those two decades across each six-hourly time step of the year, so I end up with 365 * 4 time steps. The data is in netCDF format.
Here's what the data looks like:
import xarray as xr
filename = 'V-01011999-01012019.nc'
ds = xr.open_dataset(filename)
print(ds)
Dimensions: (lat: 8, lon: 7, time: 29221)
Coordinates:
* lat (lat) float32 -2.5 -5.0 -7.5 -10.0 -12.5 -15.0 -17.5 -20.0
* lon (lon) float32 130.0 132.5 135.0 137.5 140.0 142.5 145.0
* time (time) datetime64[ns] 1999-01-01 1999-01-01T06:00:00 .. 2019-01-01
Data variables:
vwnd (time, lat, lon) float32 ...
# Remove Feb 29 so every year contributes the same 365 days
ds = ds.sel(time=~((ds.time.dt.month == 2) & (ds.time.dt.day == 29)))
I have been able to group by day of year to get a 20-year average for each day of the year.
tsavg = ds.groupby('time.dayofyear').mean('time')
print(tsavg)
Dimensions: (dayofyear: 366, lat: 8, lon: 7)
Coordinates:
* lat (lat) float32 -2.5 -5.0 -7.5 -10.0 -12.5 -15.0 -17.5 -20.0
* lon (lon) float32 130.0 132.5 135.0 137.5 140.0 142.5 145.0
* dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 ... 360 361 362 363 364 365 366
Data variables:
vwnd (dayofyear, lat, lon) float32 -2.61605 -1.49012 ... -0.959997
What I really want is a time coordinate of length 365 * 4 (four 6-hour intervals per day), with each time step being an average over the past 20 years for that time step.
Also, for some reason tsavg.dayofyear still has length 366 even though I deleted Feb 29th.
I couldn't apply or follow the answers from this post.
I have studied the groupby resources extensively and tried many things, but I can't figure it out. I'm looking for some help with the coding.
Solution
Indeed there is not a very well documented way of doing this. Note also that dayofyear may not be what you expect: it is computed from each date's position within its own calendar year, so in leap years every day after Feb 28 is shifted by one (Dec 31 is day 366). Removing Feb 29 does not renumber the remaining days, which is why your grouped result still has 366 entries.
xarray does not yet support groupby with multiple levels (see this answer for how to do something similar in pandas), so a reasonably clean workaround is to define a new coordinate to group on that represents the "time of year" of each timestamp in your Dataset.
In your case you want to group by the "hour of the year", i.e. matching month, day, and hour. For this you can create an array of strings: the string representations of the dates in the time coordinate, with the year dropped:
# Build a "MM-DD HH" label for each timestamp, then average over matching labels
ds['hourofyear'] = xr.DataArray(
    ds.indexes['time'].strftime('%m-%d %H'), coords=ds.time.coords)
result = ds.groupby('hourofyear').mean('time')
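As a quick sanity check, here is the whole approach sketched end-to-end on synthetic data. The grid, years, and random values below are made up for illustration; only the variable name vwnd matches the question. With Feb 29 dropped, the grouped result should have exactly 365 * 4 = 1460 entries:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real file: 4 years of 6-hourly data on a 2x2 grid
time = pd.date_range('2001-01-01', '2004-12-31T18:00', freq='6H')
ds = xr.Dataset(
    {'vwnd': (('time', 'lat', 'lon'),
              np.random.rand(time.size, 2, 2).astype('float32'))},
    coords={'time': time, 'lat': [-2.5, -5.0], 'lon': [130.0, 132.5]},
)

# Drop Feb 29 so every year contributes the same 365 days
ds = ds.sel(time=~((ds.time.dt.month == 2) & (ds.time.dt.day == 29)))

# Group by "hour of year": month-day-hour strings with the year removed
ds['hourofyear'] = xr.DataArray(
    ds.indexes['time'].strftime('%m-%d %H'), coords=ds.time.coords)
result = ds.groupby('hourofyear').mean('time')

print(result.sizes['hourofyear'])  # 1460 == 365 * 4
```

Because the labels sort lexicographically in calendar order, the resulting hourofyear coordinate is already ordered from "01-01 00" through "12-31 18".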