Extracting Raster Values Pixel by Pixel and Plotting a Histogram

Aaron Hill
Published 2024-07-18 21:50:32
Category: Python Spatial Analysis    Tags: arcgis
Article link: https://blog.csdn.net/alensmithing/article/details/140534040
from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

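# Open the GeoTIFF file
dataset = gdal.Open(r"D:\pythonProject\python_GIS_study\VPM_EBF_2000.tif")
if dataset is None:  # gdal.Open returns None if the file cannot be opened (when GDAL exceptions are off)
    raise FileNotFoundError("Could not open VPM_EBF_2000.tif")
img_width = dataset.RasterXSize    # number of raster columns
img_height = dataset.RasterYSize   # number of raster rows
img_data = dataset.ReadAsArray(0, 0, img_width, img_height)  # read the raster into a NumPy array (single-band file assumed; a multi-band file gives shape (bands, rows, cols))
# img_data.shape is a tuple of the raster's dimensions: img_data.shape[0] is the number of rows, img_data.shape[1] the number of columns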
print(img_data.shape)

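hist = []   # empty list to collect the valid pixel values
for i in range(img_data.shape[0]):          # loop over rows
    for j in range(img_data.shape[1]):      # loop over columns
        if img_data[i][j] >= 0:     # keep values >= 0 as valid; negative values are skipped
            hist.append(img_data[i][j])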

hist.sort()  # sort the pixel values in ascending order. Signature: list.sort(*, key=None, reverse=False); reverse=True sorts in descending order, reverse=False (the default) in ascending order

hist = np.array(hist)    # convert the list to a NumPy array

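# np.linspace(start, stop, num) returns num evenly spaced values from start to stop (both inclusive); num must be an integer.
# The values are used as bin edges: max - min + 1 edges give max - min bins, each exactly 1 unit wide when the pixel values are integers
bins = np.linspace(hist.min(), hist.max(), int(hist.max()-hist.min()+1))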

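# Plot the histogram
'''
plt.hist has many parameters; the seven most commonly used are listed below. Only the first (x) is required, the rest are optional.
x: the data to plot, normally a 1-D array. A 2-D array is treated as several datasets (one per column), so flatten a raster first
bins: the number of bins (default 10), or a sequence of bin edges as used here
density: if True, plot a probability density instead of counts, so that the total area of the bars (height × bin width) is 1. With the unit-width bins used here, the bar heights therefore also sum to 1
facecolor: fill color of the bars
edgecolor: color of the bar edges
alpha: transparency
histtype: histogram type, one of 'bar', 'barstacked', 'step', 'stepfilled'

Returns:
n: the histogram values. With density=False (the default) these are the counts in each bin; with density=True they are the density values (older Matplotlib versions used the now-removed normed parameter for this)
bins: the bin edges
patches: the graphical patch objects (one per bar) used to draw the histogram, not the data values themselves
'''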
plt.hist(hist, bins, facecolor='gray', density=True, histtype='stepfilled')
plt.xlabel("像元值", fontproperties='SimSun', size=18)   # x-axis label: "pixel value" (SimSun is needed to render the Chinese text)
plt.ylabel("频率", fontproperties='SimSun', size=18)     # y-axis label: "frequency"
plt.yticks(fontproperties='Times New Roman', size=14)   # tick labels in Times New Roman
plt.xticks(fontproperties='Times New Roman', size=14)
plt.title("直方图", fontproperties='SimSun', size=18)    # title: "histogram"
plt.tight_layout()  # automatically adjust subplot parameters so that axis labels and the title neither overlap nor extend beyond the figure
plt.savefig(r"D:\pythonProject\python_GIS_study\直方图.jpg")
plt.show()

# Store each pixel value with a 1-based sequence number in a DataFrame and save it as a CSV file
df = pd.DataFrame({'number': range(1, len(hist) + 1), 'value': hist})
df.to_csv(r"D:\pythonProject\python_GIS_study\VPM_EBF_2000.csv", index=False, encoding='utf-8')
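The nested loop above visits every pixel in Python, which becomes slow on large rasters. The same filtering can be written with NumPy boolean indexing. The sketch below assumes a single-band raster already read into a 2-D array; the helper names `valid_pixel_values` and `unit_bin_edges` and the optional `nodata` argument are hypothetical additions, not part of the original script. In GDAL, a band's declared nodata value can be read with `dataset.GetRasterBand(1).GetNoDataValue()`, which returns `None` when none is set.

```python
import numpy as np


def valid_pixel_values(img_data, nodata=None):
    """Hypothetical helper: sorted 1-D array of the valid pixel values.

    Applies the same rule as the loop in the script (keep values >= 0) and,
    if a nodata value is given, also drops pixels equal to it.
    """
    mask = img_data >= 0                # same validity rule as the loop
    if nodata is not None:
        mask &= img_data != nodata      # exclude the declared nodata value
    return np.sort(img_data[mask])      # boolean indexing returns a flat 1-D array


def unit_bin_edges(values):
    """Hypothetical helper: the script's bin edges, int(max - min + 1) of them."""
    lo, hi = values.min(), values.max()
    return np.linspace(lo, hi, int(hi - lo + 1))


# Small synthetic raster: -1 is skipped by the >= 0 rule, 255 is treated as nodata
img = np.array([[3, -1, 0],
                [2, 5, 255]], dtype=float)
vals = valid_pixel_values(img, nodata=255)
print(vals.tolist())                  # [0.0, 2.0, 3.0, 5.0]
print(unit_bin_edges(vals).tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

Called without `nodata`, the helper keeps exactly what the loop-and-sort in the script keeps, so for this toy raster it would also return 255. Passing the band's nodata value avoids a fill value silently stretching the histogram's range.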


'''
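Official parameter reference (the docstring of matplotlib.pyplot.hist from an older Matplotlib release; the normed parameter it mentions has since been removed in favor of density)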
Parameters
----------
x : (n,) array or sequence of (n,) arrays
    Input values, this takes either a single array or a sequence of
    arrays which are not required to be of the same length.

bins : int or sequence or str, optional
    If an integer is given, ``bins + 1`` bin edges are calculated and
    returned, consistent with `numpy.histogram`.

    If `bins` is a sequence, gives bin edges, including left edge of
    first bin and right edge of last bin.  In this case, `bins` is
    returned unmodified.

    All but the last (righthand-most) bin is half-open.  In other
    words, if `bins` is::

        [1, 2, 3, 4]

    then the first bin is ``[1, 2)`` (including 1, but excluding 2) and
    the second ``[2, 3)``.  The last bin, however, is ``[3, 4]``, which
    *includes* 4.

    Unequally spaced bins are supported if *bins* is a sequence.

    With Numpy 1.11 or newer, you can alternatively provide a string
    describing a binning strategy, such as 'auto', 'sturges', 'fd',
    'doane', 'scott', 'rice' or 'sqrt', see
    `numpy.histogram`.

    The default is taken from :rc:`hist.bins`.

range : tuple or None, optional
    The lower and upper range of the bins. Lower and upper outliers
    are ignored. If not provided, *range* is ``(x.min(), x.max())``.
    Range has no effect if *bins* is a sequence.

    If *bins* is a sequence or *range* is specified, autoscaling
    is based on the specified bin range instead of the
    range of x.

    Default is ``None``

density : bool, optional
    If ``True``, the first element of the return tuple will
    be the counts normalized to form a probability density, i.e.,
    the area (or integral) under the histogram will sum to 1.
    This is achieved by dividing the count by the number of
    observations times the bin width and not dividing by the total
    number of observations. If *stacked* is also ``True``, the sum of
    the histograms is normalized to 1.

    Default is ``None`` for both *normed* and *density*. If either is
    set, then that value will be used. If neither are set, then the
    args will be treated as ``False``.

    If both *density* and *normed* are set an error is raised.

weights : (n, ) array_like or None, optional
    An array of weights, of the same shape as *x*.  Each value in *x*
    only contributes its associated weight towards the bin count
    (instead of 1).  If *normed* or *density* is ``True``,
    the weights are normalized, so that the integral of the density
    over the range remains 1.

    Default is ``None``.

    This parameter can be used to draw a histogram of data that has
    already been binned, e.g. using `np.histogram` (by treating each
    bin as a single point with a weight equal to its count) ::

        counts, bins = np.histogram(data)
        plt.hist(bins[:-1], bins, weights=counts)

    (or you may alternatively use `~.bar()`).

cumulative : bool, optional
    If ``True``, then a histogram is computed where each bin gives the
    counts in that bin plus all bins for smaller values. The last bin
    gives the total number of datapoints. If *normed* or *density*
    is also ``True`` then the histogram is normalized such that the
    last bin equals 1. If *cumulative* evaluates to less than 0
    (e.g., -1), the direction of accumulation is reversed.
    In this case, if *normed* and/or *density* is also ``True``, then
    the histogram is normalized such that the first bin equals 1.

    Default is ``False``

bottom : array_like, scalar, or None
    Location of the bottom baseline of each bin.  If a scalar,
    the base line for each bin is shifted by the same amount.
    If an array, each bin is shifted independently and the length
    of bottom must match the number of bins.  If None, defaults to 0.

    Default is ``None``

histtype : {'bar', 'barstacked', 'step',  'stepfilled'}, optional
    The type of histogram to draw.

    - 'bar' is a traditional bar-type histogram.  If multiple data
      are given the bars are arranged side by side.

    - 'barstacked' is a bar-type histogram where multiple
      data are stacked on top of each other.

    - 'step' generates a lineplot that is by default
      unfilled.

    - 'stepfilled' generates a lineplot that is by default
      filled.

    Default is 'bar'

align : {'left', 'mid', 'right'}, optional
    Controls how the histogram is plotted.

        - 'left': bars are centered on the left bin edges.

        - 'mid': bars are centered between the bin edges.

        - 'right': bars are centered on the right bin edges.

    Default is 'mid'

orientation : {'horizontal', 'vertical'}, optional
    If 'horizontal', `~matplotlib.pyplot.barh` will be used for
    bar-type histograms and the *bottom* kwarg will be the left edges.

rwidth : scalar or None, optional
    The relative width of the bars as a fraction of the bin width.  If
    ``None``, automatically compute the width.

    Ignored if *histtype* is 'step' or 'stepfilled'.

    Default is ``None``

log : bool, optional
    If ``True``, the histogram axis will be set to a log scale. If
    *log* is ``True`` and *x* is a 1D array, empty bins will be
    filtered out and only the non-empty ``(n, bins, patches)``
    will be returned.

    Default is ``False``

color : color or array_like of colors or None, optional
    Color spec or sequence of color specs, one per dataset.  Default
    (``None``) uses the standard line color sequence.

    Default is ``None``

label : str or None, optional
    String, or sequence of strings to match multiple datasets.  Bar
    charts yield multiple patches per dataset, but only the first gets
    the label, so that the legend command will work as expected.

    Default is ``None``

stacked : bool, optional
    If ``True``, multiple data are stacked on top of each other. If
    ``False`` multiple data are arranged side by side if histtype is
    'bar' or on top of each other if histtype is 'step'

    Default is ``False``

normed : bool, optional
    Deprecated; use the density keyword argument instead.

Returns
-------
n : array or list of arrays
    The values of the histogram bins. See *density* and *weights* for a
    description of the possible semantics.  If input *x* is an array,
    then this is an array of length *nbins*. If input is a sequence of
    arrays ``[data1, data2,..]``, then this is a list of arrays with
    the values of the histograms for each of the arrays in the same
    order.  The dtype of the array *n* (or of its element arrays) will
    always be float even if no weighting or normalization is used.

bins : array
    The edges of the bins. Length nbins + 1 (nbins left edges and right
    edge of last bin).  Always a single array even when multiple data
    sets are passed in.

patches : list or list of lists
    Silent list of individual patches used to create the histogram
    or list of such list if multiple input datasets.

Other Parameters
----------------
**kwargs : `~matplotlib.patches.Patch` properties

See also
--------
hist2d : 2D histograms

Notes
-----


.. note::
    In addition to the above described arguments, this function can take a
    **data** keyword argument. If such a **data** argument is given, the
    following arguments are replaced by **data[<arg>]**:

    * All arguments with the following names: 'weights', 'x'.

    Objects passed as **data** must support item access (``data[<arg>]``) and
    membership test (``<arg> in data``).


'''
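Two points in the reference above are easy to misread: every bin except the last is half-open, and `density=True` normalizes the area under the histogram, not the sum of the bar heights. Since the reference states that the binning is consistent with `numpy.histogram`, both can be checked with NumPy alone. The toy data and bin edges below are arbitrary illustrations, not values from the raster.

```python
import numpy as np

# Half-open bins: [1, 2) and [2, 3); only the last bin [3, 4] includes its right edge
data = np.array([1, 2, 2, 3, 4])
counts, edges = np.histogram(data, bins=[1, 2, 3, 4])
print(counts.tolist())  # [1, 2, 2]

# density=True divides each count by (number of observations * bin width)
data2 = np.array([1, 1, 3, 5])
dens, edges2 = np.histogram(data2, bins=[0, 2, 6], density=True)
print(dens.tolist())                           # [0.25, 0.125]
print(float(dens.sum()))                       # 0.375 (bar heights do not sum to 1)
print(float(np.sum(dens * np.diff(edges2))))   # 1.0 (the area does)
```

In the script the bins are one unit wide for integer-valued pixels, so height × width equals height and the bar heights of the `density=True` plot sum to 1 as well. With wider or unequal bins only the area stays at 1.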