Xarray in Atmospheric Science (The Very Basics)

一航亦航

Last modified 2024-06-02 10:51:30

Tags: python

First published 2024-06-01 20:32:26

Article link: https://blog.csdn.net/lvyihang200411/article/details/139370654

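Introduction to Xarray

I. Processing gridded meteorological data with Xarray

Xarray is very well suited to reading nc (NetCDF) files, which makes it an important tool in atmospheric science research. Compared with pandas and NumPy, Xarray is more object-oriented and user-friendly: arrays carry named dimensions, coordinates and attributes. This article walks through its basic features.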

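II. DataArray and Dataset in Xarray

A Dataset is a collection of DataArray objects with aligned dimensions. It contains data variables, dimensions, coordinates and attributes.

A DataArray is a labeled data array and the basic building block of a Dataset.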

DataArray

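I. Creating a DataArray

xr.DataArray(data, coords=None, dims=None, name=None, attrs=None)

# data is the multi-dimensional array that holds the values; coords holds the coordinates, given as a list (one entry per dimension) or a dict; dims is a list with the name of each axis; name is the name of the instance; attrs is a dict of attributes to attach.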

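Note: Only the data argument is required; all the others are optional. If only data is given, the dimensions get the default names dim_0, dim_1, ... and have no coordinates, which the output reports as "Dimensions without coordinates".

import xarray as xr
import pandas as pd
import numpy as np
# 4 x 3 array of random values
data = np.random.rand(4, 3)
# Labels for the "space" dimension and dates for the "time" dimension
locs = ["level", "latitude", "longitude"]
times = pd.date_range("2000-01-01", periods=4)
# coords are matched to dims by position: times -> "time", locs -> "space"
foo1 = xr.DataArray(data, coords=[times, locs], dims=["time", "space"], name="tea")
# Only data given: dimensions are named dim_0, dim_1 and have no coordinates
foo2 = xr.DataArray(data)
print(foo1)
print(foo2)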


output:
<xarray.DataArray 'tea' (time: 4, space: 3)> Size: 96B
array([[0.31743583, 0.92905186, 0.81402472],
       [0.62443588, 0.46985903, 0.36383099],
       [0.53800429, 0.20190095, 0.16732536],
       [0.86612849, 0.9333842 , 0.0027154 ]])
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U9 108B 'level' 'latitude' 'longitude'

<xarray.DataArray (dim_0: 4, dim_1: 3)> Size: 96B
array([[0.5889453 , 0.9485167 , 0.4536665 ],
       [0.22158899, 0.8757486 , 0.11454141],
       [0.33387963, 0.58135396, 0.65170388],
       [0.14057696, 0.39856906, 0.4988272 ]])
Dimensions without coordinates: dim_0, dim_1

II. Accessing and modifying DataArray coordinates

1. foo1["time"] / foo1["space"]  # accesses a coordinate of the DataArray

2. foo1["time"] = pd.date_range("1999-01-01", periods=4)  # modifies an existing coordinate or adds a new one
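
As a minimal sketch of the two operations above, the snippet below builds a small array, reads a coordinate, replaces it, and adds a new one. The array contents, the labels "a", "b", "c" and the coordinate name station_id are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 4 x 3 array with a time axis and a labeled "space" axis
times = pd.date_range("2000-01-01", periods=4)
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    coords=[times, ["a", "b", "c"]],
    dims=["time", "space"],
)

# Reading a coordinate returns a DataArray
time_coord = da["time"]
print(type(time_coord).__name__)

# Replacing an existing coordinate: same length, new labels
da["time"] = pd.date_range("1999-01-01", periods=4)
print(da["time"].values[0])

# Adding a new coordinate along an existing dimension
# ("station_id" is a hypothetical name)
da["station_id"] = ("space", [101, 102, 103])
print(list(da.coords))
```

Assigning to a string key on a DataArray always targets its coordinates, which is why the same syntax both replaces "time" and creates "station_id".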

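III. Some DataArray attributes

1. .values  # the values as an np.ndarray

2. .dims  # the name of each axis, e.g. x, y, z

3. .coords  # the coordinates, i.e. the "Coordinates:" block shown by print(foo1), e.g. * time ...

4. .attrs  # the attribute dictionary (metadata) of the DataArray

Notes:

1. The values can be modified, e.g. foo1.values = 2 * foo1.values

2. Missing attributes can be added, e.g. foo1.attrs["units"] = "meters"

3. The DataArray can be renamed, e.g. foo1 = foo1.rename("lv"). rename returns a new object, so the result has to be assigned; alternatively, set foo1.name = "lv".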
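
The attributes and the three notes above can be tried out on a tiny array. This is only a sketch: the array, its dimension names and the name "height" are invented for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 2 array named "height"
da = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    dims=["x", "y"],
    name="height",
)

print(da.dims)    # names of the axes
print(da.values)  # underlying numpy array

# Overwrite the values in place (the shape must stay the same)
da.values = 2 * da.values

# Attach metadata
da.attrs["units"] = "meters"

# rename returns a new DataArray; the original keeps its name
renamed = da.rename("lv")
print(da.name, renamed.name)
```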

Dataset

xr.Dataset(data_vars, coords, attrs)

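# data_vars is a dict that maps variable names to values. A value can be a DataArray or a Variable, or a tuple (dims, data) such as (["x", "y", "z"], array_or_list), where dims lists the name of each dimension. A DataArray brings its own coordinates, which are merged into the Dataset (in the example below, space comes from foo1 although it is not listed in coords). With a (dims, data) tuple, the dimensions have no coordinates unless coords supplies them.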
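
The (dims, data) tuple form can be sketched as follows. The variable names, station labels and constant values are placeholders chosen for the example, not data from the article.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=4)
stations = ["s1", "s2", "s3"]  # hypothetical station labels

ds = xr.Dataset(
    data_vars={
        # (dims, data) tuples: no DataArray needed
        "temperature": (["time", "station"], np.full((4, 3), 15.0)),
        "pressure": (["time", "station"], np.full((4, 3), 1013.0)),
    },
    coords={"time": times, "station": stations},
    attrs={"source": "synthetic example"},
)

print(ds)
# Both access styles return the same DataArray
print(ds["temperature"].equals(ds.temperature))
```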

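Notes:

1. In a Dataset, a coordinate's name can differ from the name of its dimension (e.g. locs on dimension usb below). When a DataArray is created with coords given as a list, as above, each coordinate takes the name of its dimension; passing coords as a dict allows different names there as well.

2. Both ds.variable_name and ds["variable_name"] return a DataArray object.

import xarray as xr
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
locs = ["level", "latitude", "longitude"]
times = pd.date_range("2000-01-01", periods=4)
foo1 = xr.DataArray(data, coords=[times, locs], dims=["time", "space"], name="tea")
# "locs" is an extra coordinate along a new dimension "usb"; "space" is carried over from foo1
ds = xr.Dataset({"temperature": foo1}, coords={"time": times, "locs": (["usb"], locs)})
print(ds)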


output:
<xarray.Dataset> Size: 344B
Dimensions:      (time: 4, space: 3, usb: 3)
Coordinates:
  * time         (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space        (space) <U9 108B 'level' 'latitude' 'longitude'
    locs         (usb) <U9 108B 'level' 'latitude' 'longitude'
Dimensions without coordinates: usb
Data variables:
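    temperature  (time, space) float64 96B 0.8683 0.7089 ... 0.09413 0.08929

Updating a Dataset

Examples:

1. ds["temperature"] = (["x", "y", "temp"], temp)  # adds or replaces a data variable; temp is an array with one axis per listed dimension

2. ds.coords["lat"] = (["x", "y"], lat)  # adds or replaces a coordinate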
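
A runnable version of these two assignments might look like the sketch below. The grid sizes, the values and the Kelvin conversion are assumptions made for the example; only the assignment syntax comes from the text above.

```python
import numpy as np
import xarray as xr

# Hypothetical empty Dataset with two dimension coordinates
ds = xr.Dataset(coords={"x": [0, 1], "y": [10, 20, 30]})

# Add a data variable from a (dims, data) tuple
temp = np.arange(6.0).reshape(2, 3)
ds["temperature"] = (["x", "y"], temp)

# Add a 2-D coordinate that is not a dimension coordinate
lat = np.array([[40.0, 40.5, 41.0], [42.0, 42.5, 43.0]])
ds.coords["lat"] = (["x", "y"], lat)

# Assigning to an existing name replaces the variable
ds["temperature"] = (["x", "y"], temp + 273.15)

print(ds)
```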

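Indexing DataArray and Dataset

The following example, which reads an nc file, illustrates the indexing methods:

Note: Both DataArray and Dataset can be filtered with where and indexed by label with sel. A DataArray additionally supports loc with labels given in dimension order, whereas Dataset.loc only accepts a dict of the form {dim: labels}. Unlike np.where, the where method keeps the original shape and sets the entries that fail the condition to NaN; with drop=True, coordinate labels for which the condition is False everywhere are removed.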
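
To make the difference from np.where concrete, here is a small sketch on a made-up 2 x 3 array (the lat/lon labels and the values are invented for the example):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0, 120.0]},
    dims=["lat", "lon"],
)

# xarray: same shape, NaN where the condition is False
masked = da.where(da > 2)
print(masked.values)

# drop=True trims labels where the condition is False everywhere
trimmed = da.where(da.lon > 105, drop=True)
print(trimmed.lon.values)

# numpy, one argument: returns the indices of the True entries
rows, cols = np.where(da.values > 2)
print(rows, cols)

# numpy, three arguments: picks element-wise between two inputs
filled = np.where(da.values > 2, da.values, -1.0)
print(filled)
```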

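# -*- coding: utf-8 -*-
import xarray as xr
data = xr.open_dataset(r"C:\Users\lvyih\Desktop\stipple.nc")
# Label-based region selection on the whole Dataset
print(data.sel(latitude=slice(40, 30), longitude=slice(20, 180)))
# Keep only latitudes below 60; drop=True removes the masked rows
print(data.where(data.latitude < 60, drop=True))
climatology = data.climatology
# Print the name of the climatology variable
print(climatology.name)
# A coordinate accessed this way is also a DataArray
print(climatology.longitude)
# isel: indexing by integer position
print(climatology.isel(longitude=0, latitude=0))
# sel: indexing by label; method="nearest" picks the grid point closest to the requested label
print(climatology.sel(longitude=0, latitude=23, method="nearest"))
# Indexing with several labels at once: pass DataArray objects as indexers.
# These two indexers use different dimensions, so the result is a 3 x 3 grid (orthogonal indexing);
# to select individual points, give both indexers the same new dimension, e.g. dims="points".
lat_points = xr.DataArray([0, 30, 60], dims="latitude")
lon_points = xr.DataArray([138, 138, 138], dims="longitude")
print(climatology.sel(latitude=lat_points, longitude=lon_points, method="nearest"))
# Region selection with sel and slice: both bounds are inclusive and follow the stored coordinate order
# (latitude is stored from north to south here, hence slice(40, 30))
print(climatology.sel(latitude=slice(40, 30), longitude=slice(20, 180)))
# Region selection with loc: label-based like pandas .loc, with labels given in dimension order
print(climatology.loc[40:30, 20:180])
# where applies a boolean mask, similar to boolean indexing in NumPy
ceshi = climatology.loc[40:30, 130:180]
print(ceshi)
print(ceshi.where(ceshi.longitude > 175, drop=True))
# Dataset.loc only accepts a dict of the form {dim: labels}; sel is the usual way to select a label range on a Dataset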



output:
<xarray.Dataset> Size: 2kB
Dimensions:      (latitude: 5, longitude: 43)
Coordinates:
  * longitude    (longitude) float32 172B 22.5 26.25 30.0 ... 172.5 176.2 180.0
  * latitude     (latitude) float32 20B 40.0 37.5 35.0 32.5 30.0
Data variables:
    climatology  (latitude, longitude) float32 860B ...
    jan1963      (latitude, longitude) float32 860B ...
<xarray.Dataset> Size: 47kB
Dimensions:      (latitude: 60, longitude: 96)
Coordinates:
  * longitude    (longitude) float32 384B 0.0 3.75 7.5 ... 348.8 352.5 356.2
  * latitude     (latitude) float32 240B 57.5 55.0 52.5 ... -85.0 -87.5 -90.0
Data variables:
    climatology  (latitude, longitude) float32 23kB 5.68 4.955 ... -22.77 -22.78
    jan1963      (latitude, longitude) float32 23kB 4.252 2.848 ... -24.16
climatology
<xarray.DataArray 'longitude' (longitude: 96)> Size: 384B
array([  0.  ,   3.75,   7.5 ,  11.25,  15.  ,  18.75,  22.5 ,  26.25,  30.  ,
        33.75,  37.5 ,  41.25,  45.  ,  48.75,  52.5 ,  56.25,  60.  ,  63.75,
        67.5 ,  71.25,  75.  ,  78.75,  82.5 ,  86.25,  90.  ,  93.75,  97.5 ,
       101.25, 105.  , 108.75, 112.5 , 116.25, 120.  , 123.75, 127.5 , 131.25,
       135.  , 138.75, 142.5 , 146.25, 150.  , 153.75, 157.5 , 161.25, 165.  ,
       168.75, 172.5 , 176.25, 180.  , 183.75, 187.5 , 191.25, 195.  , 198.75,
       202.5 , 206.25, 210.  , 213.75, 217.5 , 221.25, 225.  , 228.75, 232.5 ,
       236.25, 240.  , 243.75, 247.5 , 251.25, 255.  , 258.75, 262.5 , 266.25,
       270.  , 273.75, 277.5 , 281.25, 285.  , 288.75, 292.5 , 296.25, 300.  ,
       303.75, 307.5 , 311.25, 315.  , 318.75, 322.5 , 326.25, 330.  , 333.75,
       337.5 , 341.25, 345.  , 348.75, 352.5 , 356.25], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 384B 0.0 3.75 7.5 11.25 ... 348.8 352.5 356.2
<xarray.DataArray 'climatology' ()> Size: 4B
[1 values with dtype=float32]
Coordinates:
    longitude  float32 4B 0.0
    latitude   float32 4B 90.0
<xarray.DataArray 'climatology' ()> Size: 4B
[1 values with dtype=float32]
Coordinates:
    longitude  float32 4B 0.0
    latitude   float32 4B 22.5
<xarray.DataArray 'climatology' (latitude: 3, longitude: 3)> Size: 36B
[9 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 12B 138.8 138.8 138.8
  * latitude   (latitude) float32 12B 0.0 30.0 60.0
<xarray.DataArray 'climatology' (latitude: 5, longitude: 43)> Size: 860B
[215 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 172B 22.5 26.25 30.0 ... 172.5 176.2 180.0
  * latitude   (latitude) float32 20B 40.0 37.5 35.0 32.5 30.0
<xarray.DataArray 'climatology' (latitude: 5, longitude: 43)> Size: 860B
[215 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 172B 22.5 26.25 30.0 ... 172.5 176.2 180.0
  * latitude   (latitude) float32 20B 40.0 37.5 35.0 32.5 30.0
<xarray.DataArray 'climatology' (latitude: 5, longitude: 14)> Size: 280B
[70 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 56B 131.2 135.0 138.8 ... 172.5 176.2 180.0
  * latitude   (latitude) float32 20B 40.0 37.5 35.0 32.5 30.0
<xarray.DataArray 'climatology' (latitude: 5, longitude: 2)> Size: 40B
array([[ 8.610901 ,  8.928772 ],
       [11.1875305, 11.346649 ],
       [13.661987 , 13.666504 ],
       [15.794495 , 15.681885 ],
       [17.774475 , 17.680023 ]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 8B 176.2 180.0
  * latitude   (latitude) float32 20B 40.0 37.5 35.0 32.5 30.0
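
In the listing above, lat_points and lon_points sit on different dimensions, which is why the corresponding output has shape (latitude: 3, longitude: 3). The sketch below contrasts that with pointwise selection and also shows how slice direction interacts with a descending latitude axis. It uses a synthetic random field on a grid that only mimics the spacing seen in the output; the dimension name "points" is an arbitrary choice.

```python
import numpy as np
import xarray as xr

# Hypothetical 2.5 x 3.75 degree grid, latitude stored north to south
lat = np.arange(90.0, -90.1, -2.5)
lon = np.arange(0.0, 360.0, 3.75)
field = xr.DataArray(
    np.random.rand(lat.size, lon.size),
    coords={"latitude": lat, "longitude": lon},
    dims=["latitude", "longitude"],
)

# Indexers on different dimensions: outer product, a 3 x 3 grid
lat_idx = xr.DataArray([0, 30, 60], dims="latitude")
lon_idx = xr.DataArray([138, 138, 138], dims="longitude")
grid = field.sel(latitude=lat_idx, longitude=lon_idx, method="nearest")
print(grid.shape)

# Indexers sharing a new dimension: one value per (lat, lon) pair
lat_pts = xr.DataArray([0, 30, 60], dims="points")
lon_pts = xr.DataArray([138, 138, 138], dims="points")
points = field.sel(latitude=lat_pts, longitude=lon_pts, method="nearest")
print(points.shape)

# Slices follow the stored order: descending latitude needs slice(40, 30)
north_to_south = field.sel(latitude=slice(40, 30))
print(north_to_south.latitude.values)

# The reversed slice matches nothing on a descending axis
south_to_north = field.sel(latitude=slice(30, 40))
print(south_to_north.latitude.size)
```

If an empty selection shows up on real data, checking whether the latitude coordinate is ascending or descending is usually the first thing to look at.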