rasterio Python Quickstart

来源:https://rasterio.readthedocs.io/en/latest/quickstart.html

Python Quickstart

Reading and writing data files is a spatial data programmer’s bread and butter. This document explains how to use Rasterio to read existing files and to create new files. Some advanced topics are glossed over to be covered in more detail elsewhere in Rasterio’s documentation. Only the GeoTIFF format is used here, but the examples do apply to other raster data formats. It is presumed that Rasterio has been installed.
读写数据文件是空间数据程序员的谋生之道。本文档解释了如何使用Rasterio读取现有文件和创建新文件。Rasterio的文档中的其他地方对一些高级主题进行了更详细的介绍。这里只使用GeoTIFF格式,但这些示例也适用于其他栅格数据格式。假定已经安装了Rasterio。

Opening a dataset in reading mode

Consider a GeoTIFF file named example.tif with 16-bit Landsat 8 imagery covering a part of the United States’s Colorado Plateau 1. Because the imagery is large (70 MB) and has a wide dynamic range it is difficult to display it in a browser. A rescaled and dynamically squashed version is shown below.
考虑一个名为example.tif的16位的Landsat 8影像的GeoTIFF文件,覆盖了美国科罗拉多州高原的一部分。因为图像很大(70 MB)并且具有很宽的动态范围,所以很难在浏览器中显示。下面显示了一个重新缩放和动态挤压的版本。

import rasterio

Next, open the file.

data_dir = "E:/MyStudy/segmentation_models.pytorch-master/data_li/"
fp = os.path.join(data_dir, "L2A20190813N0213R113.tif")

dataset = rasterio.open(fp)

Rasterio’s open() function takes a path string or path-like object and returns an opened dataset object. The path may point to a file of any supported raster format. Rasterio will open it using the proper GDAL format driver. Dataset objects have some of the same attributes as Python file objects.
Rasterio的open()函数接受一个路径字符串或类似路径的对象,并返回一个打开的数据集对象。该路径可以指向任何支持的栅格格式的文件。Rasterio将使用正确的GDAL格式驱动程序打开它。数据集对象与Python文件对象具有一些相同的属性。

dataset.name
# out:'D:/IrrigationLand/Sentinel_Washington/L2A20190813N0213R113/L2A20190813N0213R113.tif'
dataset.mode
# out:'r'
dataset.closed
# out:False

Dataset attributes

Properties of the raster data stored in the example GeoTIFF can be accessed through attributes of the opened dataset object. Dataset objects have bands and this example has a band count of 1.
存储在示例GeoTIFF中的栅格数据的属性可以通过打开的数据集对象的属性来访问。数据集对象有波段,本示例的波段数为1。

dataset.count
# out:3

A dataset band is an array of values representing the partial distribution of a single variable in 2-dimensional (2D) space. All band arrays of a dataset have the same number of rows and columns. The variable represented by the example dataset’s sole band is Level-1 digital numbers (DN) for the Landsat 8 Operational Land Imager (OLI) band 4 (wavelengths between 640-670 nanometers). These values can be scaled to radiance or reflectance values. The array of DN values is 7731 columns wide and 7871 rows high.
数据集波段是一个值数组,表示二维(2D)空间中单个变量的部分分布。数据集的所有波段数组都具有相同的行数和列数。
示例数据集的唯一波段表示的变量是Landsat 8 Operational Land Imager (OLI) 波段4(波长在640-670纳米之间)的1级数字(DN)。
这些值可以缩放到辐射或反射值。DN值数组宽(列数)7731列,高(行数)7871行。

dataset.width,dataset.height
# 数组的宽和高 对应数组的列数和行数。或者说就是数组的列数和行数
# out:(10980, 10980)

Some dataset attributes expose the properties of all dataset bands via a tuple of values, one per band. To get a mapping of band indexes to variable data types, apply a dictionary comprehension to the zip() product of a dataset’s indexes and dtypes attributes.
一些数据集属性通过 元组 a tuple of values公开所有数据集波段的属性,每个波段一个值。要获得波段索引到可变数据类型的映射,请对数据集indexes和dtype属性的zip()产品应用字典理解。

dataset.indexes,dataset.dtypes
# out: ((1, 2, 3), ('uint16', 'uint16', 'uint16'))
{i: dtype for i, dtype in zip(dataset.indexes, dataset.dtypes)}
# out:{1: 'uint16', 2: 'uint16', 3: 'uint16'}

The example file’s sole band contains unsigned 16-bit integer values. The GeoTIFF format also supports signed integers and floats of different size.
示例文件的唯一波段包含无符号16位整数值。GeoTIFF格式还支持不同大小的有符号整数和浮点数。

Dataset georeferencing数据集地理参考

A GIS raster dataset is different from an ordinary image; its elements (or “pixels”) are mapped to regions on the earth’s surface. Every pixels of a dataset is contained within a spatial bounding box.
GIS栅格数据集不同于普通图像;它的元素(或“像素”)被映射到地球表面的区域。 数据集的每个像素都包含在一个空间边界框中。

dataset.bounds
# out:BoundingBox(left=699960.0, bottom=5090220.0, right=809760.0, top=5200020.0)

Our example covers the world from 358485 meters (in this case) to 590415 meters, left to right, and 4028985 meters to 4265115 meters bottom to top. It covers a region 231.93 kilometers wide by 236.13 kilometers high.
我们的示例覆盖了从358485米(在本例中)到590415米(从左到右)以及4028985米到4265115米(从下到上)的世界。它占地231.93公里宽,236.13公里高。
The value of bounds attribute is derived from a more fundamental attribute: the dataset’s geospatial transform.
bounds边界属性值是从一个更基本的属性派生出来的:数据集的地理空间转换。

dataset.transform
# out:Affine(10.0, 0.0, 699960.0,
#            0.0, -10.0, 5200020.0)

# 第一行表示宽度(列)属性——空间坐标X,是栅格左上角像素的X
# 二行表示高度(行)属性——空间坐标Y,是栅格左上角像素的Y
# 我们一般用的X(数学坐标/空间坐标)对应的就是数组/栅格像素中的列/宽度(数组第二个维度),
# Y(数学坐标/空间坐标)对应的就是数组/栅格像素中的行/高度(数组第一个维度)。
# 一个栅格数据集的坐标定义为:左上角为原点。宽度为从左到右,高度为从上到下(但是上面的值大于下面的)
# 区分:pixel locations in (row, col),spatial positions(x, y)。row对应y,col对应x。
# 空间坐标是(X,Y),数组/栅格 坐标是(row,col)。X对应col,Y对应row。
# 每个仿射变换矩阵affine transformation matrix 都表示该栅格的分辨率和栅格左上角像素的空间位置。

A dataset’s transform is an affine transformation matrix that maps pixel locations in (row, col) coordinates to (x, y) spatial positions. The product of this matrix and (0, 0) , the row and column coordinates of the upper left corner of the dataset, is the spatial position of the upper left corner.
数据集的变换transform是一个仿射变换矩阵affine transformation matrix ,它将(行,列)坐标中的像素位置映射到(x,y)空间位置。该矩阵与数据集左上角的行和列坐标(0,0)的乘积是左上角的空间位置。

# dataset.transform * (row,col) = (x-col, y-row)

dataset.transform * (0, 0)
# out:(699960.0, 5200020.0)

# 总之就是dataset.transform(仿射变换矩阵affine transformation matrix)* 数据集的位置坐标(宽-col,高-row) 就得到该像素所代表的空间位置坐标。
# 这里的位置是空间位置(X宽-col,Y高-row) 
# 如果是数组/栅格 位置,则必须是中括号[],且[row,col]

# 任意位置
import random
dataset.transform * (random.randint(0,array.shape[1]),random.randint(0,array.shape[2]))
# out:(745820.0, 5140370.0)

The position of the lower right corner is obtained similarly.
右下角的位置也是类似得到的。

dataset.transform * (dataset.width, dataset.height)
# out:(809760.0, 5090220.0)

But what do these numbers mean? 4028985 meters from where? These coordinate values are relative to the origin of the dataset’s coordinate reference system (CRS).
但是这些数字意味着什么呢?离哪里4028985米?这些坐标值相对于数据集的坐标参考系(coordinate reference system(CRS))的原点。

dataset.crs
# out:CRS.from_epsg(32610)

“EPSG 32612” identifies a particular coordinate reference system: UTM zone 12N. This system is used for mapping areas in the Northern Hemisphere between 108 and 114 degrees west. The upper left corner of the example dataset, (358485.0, 4265115.0), is 141.5 kilometers west of zone 12’s central meridian (111 degrees west) and 4265 kilometers north of the equator.
EPSG 32612”标识了一个特定的坐标参考系统 coordinate reference system:UTM 12N区 UTM zone 12N。该系统用于绘制北半球西经108至114度之间的区域。示例数据集的左上角(358485.0,4265115.0)位于12区中央子午线(西经111度)以西141.5公里和赤道以北4265公里处。在crs属性和transform之间,描述了栅格数据集的地理参考,并且可以将该数据集与其他GIS数据集进行比较。
Between the crs attribute and transform the georeferencing of a raster dataset is described and the dataset can compared to other GIS datasets.
在crs属性和transform之间,描述了栅格数据集的地理参考,并且可以将该数据集与其他GIS数据集进行比较。

Reading raster data

Data from a raster band can be accessed by the band’s index number. Following the GDAL convention, bands are indexed from 1.
栅格波段的数据可以通过波段的索引号来访问。按照GDAL惯例,波段从1开始索引。

dataset.indexes
# out:(1, 2, 3)
band1 = dataset.read(1) 
print(band1.shape)
# out:(10980, 10980)
# 可以单波段读取也可以所有波段一起读取
array = dataset.read()
print(array.shape)
# out:(3, 10980, 10980)

The read() method returns a Numpy N-D array.
read()方法返回一个Numpy N-D数组。

band1

array([[ 312,  292,  310, ..., 1102, 1102, 1112],
       [ 292,  293,  310, ..., 1124, 1136, 1132],
       [ 334,  322,  307, ..., 1116, 1102, 1126],
       ...,
       [ 593,  505,  698, ...,  585,  583,  580],
       [ 637,  580,  641, ...,  611,  578,  585],
       [ 687,  491,  595, ...,  604,  571,  582]], dtype=uint16)

Values from the array can be addressed by their row, column index.
数组中的值可以通过它们的行、列索引来寻址。

band1[dataset.height // 2, dataset.width // 2]
# out:817

Spatial indexing

Datasets have an index() method for getting the array indices corresponding to points in georeferenced space. To get the value for the pixel 100 kilometers east and 50 kilometers south of the dataset’s upper left corner, do the following.
数据集有一个index()方法,用于获取与地理参考空间中的点相对应的数组索引。就是,用空间坐标(x,y),来获取数组中对应位置[row,col]用空间坐标来获取数组位置要获取数据集左上角以东100公里和以南50公里处的像素值,请执行以下操作。

x, y = (dataset.bounds.left + 100000, dataset.bounds.top - 50000)
row, col = dataset.index(x, y)
print(row, col)
print(band1[row, col])
# out:5000 10000
# out:249

To get the spatial coordinates of a pixel, use the dataset’s xy() method. The coordinates of the center of the image can be computed like this.
要获取像素的空间坐标,请使用数据集的xy()方法。图像中心的坐标可以这样计算。用栅格对象的(宽-col,高-row)来获取空间坐标

dataset.xy(dataset.height // 2, dataset.width // 2)
# out:(754865.0, 5145115.0)

Creating data

Reading data is only half the story. Using Rasterio dataset objects, arrays of values can be written to a raster data file and thus shared with other GIS applications such as QGIS.
读取数据只是故事的一半。使用Rasterio数据集对象,值数组可以写入栅格数据文件,从而与其他 GIS 应用程序(如QGIS)共享。
As an example, consider an array of floating point values representing, e.g., a temperature or pressure anomaly field measured or modeled on a regular grid, 240 columns by 180 rows. The first and last grid points on the horizontal axis are located at 4.0 degrees west and 4.0 degrees east longitude, the first and last grid points on the vertical axis are located at 3 degrees south and 3 degrees north latitude.
作为示例,考虑代表例如在规则网格上测量或建模的温度或压力异常场的浮点值数组,240列×180行。横轴上的第一个和最后一个网格点位于西经4.0度和东经4.0度,纵轴上的第一个和最后一个网格点位于南纬3度和北纬3度。

import numpy as np
x = np.linspace(-4.0, 4.0, 240)
y = np.linspace(-3.0, 3.0, 180)
X, Y = np.meshgrid(x, y)
Z1 = np.exp(-2 * np.log(2) * ((X - 0.5) ** 2 + (Y - 0.5) ** 2) / 1 ** 2)
Z2 = np.exp(-3 * np.log(2) * ((X + 0.5) ** 2 + (Y + 0.5) ** 2) / 2.5 ** 2)
Z = 10.0 * (Z2 - Z1)

print(x.shape)
print(y.shape)
# out:(240,)
# out:(180,)
print(X.shape)
print(Y.shape)
# out:(180, 240)
# out:(180, 240)
print(Z1.shape)
# out:(180, 240)
print(Z.shape)
# out:(180, 240)

The fictional field for this example consists of the difference of two Gaussian distributions and is represented by the array Z. Its contours are shown below.
本例中的虚拟场由两个高斯分布的差值组成,由数组Z表示。其轮廓如下所示。

Opening a dataset in writing mode

To save this array along with georeferencing information to a new raster data file, call rasterio.open() with a path to the new file to be created, ‘w’ to specify writing mode, and several keyword arguments.
要将此数组与地理参考信息一起保存到新的栅格数据文件中,请调用rasterio.open(),其中包含要创建的新文件的路径、指定写入模式的“w”以及几个关键字参数。

  • driver: the name of the desired format driver 所需格式驱动程序的名称
  • width: the number of columns of the dataset 数据集的列数
  • height: the number of rows of the dataset 数据集的行数
  • count: a count of the dataset bands 数据集波段的计数
  • dtype: the data type of the dataset 数据集的数据类型
  • crs: a coordinate reference system identifier or description 坐标参考系统标识符或描述
  • transform: an affine transformation matrix 仿射变换矩阵
  • nodata: a “nodata” value

The first 5 of these keyword arguments parametrize fixed, format-specific properties of the data file and are required when opening a file to write. The last 3 are optional.
这些关键字参数的前5个 参数化了数据文件的固定的、格式特定的属性,并且在打开文件进行写入时是必需的。最后3个是可选的。
In this example the coordinate reference system will be ‘+proj=latlong’, which describes an equirectangular coordinate reference system with units of decimal degrees. The proper affine transformation matrix can be computed from the matrix product of a translation and a scaling.
在本例中,坐标参考系统将是“+proj=latlong”,它描述了一个十进制单位的等矩形坐标参考系统。合适的仿射变换矩阵可以由平移和缩放的矩阵乘积来计算。

from rasterio.transform import Affine
res = (x[-1] - x[0]) / 240.0
transform = Affine.translation(x[0] - res / 2, y[0] - res / 2) * Affine.scale(res, res)
transform

The upper left point in the example grid is at 3 degrees west and 2 degrees north. The raster pixel centered on this grid point extends res / 2 , or 1/60 degrees, in each direction, hence the shift in the expression above.
示例网格中的左上角点位于西经3度和北纬2度。以该网格点为中心的栅格像素在每个方向上延伸res / 2或1/60度,因此出现了上述表达式中的偏移。
A dataset for storing the example grid is opened like so
用于存储示例网格的数据集是这样打开的

new_dataset = rasterio.open(
     './new.tif',
     'w',
     driver='GTiff',
     height=Z.shape[0],
     width=Z.shape[1],
     count=1,
     dtype=Z.dtype,
     crs='+proj=latlong',
     transform=transform, )

Values for the height, width, and dtype keyword arguments are taken directly from attributes of the 2-D array, Z. Not all raster formats can support the 64-bit float values in Z, but the GeoTIFF format can.
高度、宽度和数据类型关键字参数的值直接取自二维数组Z的属性。并非所有栅格格式都支持Z中的64位浮点值,但GeoTIFF格式可以。

Saving raster data

To copy the grid to the opened dataset, call the new dataset’s write() method with the grid and target band number as arguments.
要将网格复制到打开的数据集,请使用网格和目标波段数目(the grid and target band number)作为参数调用新数据集的write()方法。

new_dataset.write(Z, 1)

Then call the close() method to sync data to disk and finish.
然后调用close()方法将数据同步到磁盘并完成。

new_dataset.close()

Because Rasterio’s dataset objects mimic Python’s file objects and implement Python’s context manager protocol, it is possible to do the following instead.
因为Rasterio的数据集对象模仿Python的文件对象并实现Python的上下文管理器协议,所以可以改为执行以下操作。

with rasterio.open(
    './new_1.tif',
    'w',
    driver='GTiff',
    height=Z.shape[0],
    width=Z.shape[1],
    count=3,
    dtype=Z.dtype,
    crs='+proj=latlong',
    transform=transform,
) as dst:
    dst.write(Z, 3)

These are the basics of reading and writing raster data files. More features and examples are contained in the advanced topics section.
这些是读写栅格数据文件的基础。更多功能和示例包含在高级主题部分。

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值