Forecast Verification Methods - Part 1
预测验证方法-第一部分

最新推荐文章于 2022-11-12 15:55:32 发布

遥远的星辰

Category: Study (学习) | Tags: forecasting (预测), forecast verification (预测验证)

http://www.cawcr.gov.au/projects/verification/?spm=5176.11409106.555.15.4a101e8b5HOXT5#Methods_for_spatial_forecasts
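This article is a translation of the web page linked above, made for personal study and research. Corrections are welcome.
本文为以上链接网页的翻译，用于个人学习研究。如有不妥之处请指正。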
Introduction
介绍
This web site:
本网站
Describes methods for forecast verification, including their characteristics, pros and cons. The methods range from simple traditional statistics and scores, to methods for more detailed diagnostic and scientific verification.
描述了预测验证的方法，包括它们的特点、利弊。这些方法包含了从简单的传统统计和评分到更详细的诊断和科学验证的方法。
Gives examples for each method, with links and references for further information. The examples are all drawn from the meteorological world (since the people creating this web site are themselves meteorologists or work with meteorologists), but the verification methods can easily be applied in other fields. They are appropriate for verifying estimates as well as forecasts.
给出了每种方法的示例，并附有链接和参考文献以便进一步了解。这些例子都来自气象领域（因为创建这个网站的人本身就是气象学家或与气象学家一起工作），但这些验证方法可以很容易地应用于其他领域。它们既适用于验证预测，也适用于验证估计。
Demonstrates the verification techniques on a handful of forecast examples. These data will be available for download if you want to try out some of the techniques.
演示了一些预测实例上的验证技术。如果你想尝试的话，这些数据也可下载。
Does not provide source code (sorry, but what language would we use?). However, the simple methods are relatively easy to code, and the complex ones give references to people who have developed them or are working on them.
不提供源代码（抱歉，但我们该用哪种语言呢？）。不过，简单的方法相对容易编程实现，复杂的方法则给出了相关参考信息，指向已开发或正在研究这些方法的人员。
Is a dynamic site - please contribute your ideas and verification methods, also suggestions for making the site better.
是一个动态的站点-请贡献你的想法和验证方法，同时请给予你的建议使本网站更好。
Issues
问题
What is forecast verification?
什么是预测验证？
If we take the term forecast to mean a prediction of the future state (of the weather, stock market prices, or whatever), then forecast verification is the process of assessing the quality of a forecast.
如果我们认为术语“预测”意味着对未来状态的预测（天气、股票市场价格等），那么预测验证就是评估预测质量的过程。
The forecast is compared, or verified, against a corresponding observation of what actually occurred, or some good estimate of the true outcome. The verification can be qualitative (“does it look right?”) or quantitative (“how accurate was it?”). In either case it should give you information about the nature of the forecast errors.
预测会与实际发生的观测值或真实值的良好估计进行比较，或验证。验证可以是定性的（“它看起来正确吗？”）也可以是定量（“它有多精确？”）。在任何一种情况下，验证都应当告诉你预测误差的性质。
Why verify?
为什么要验证？
A forecast is like an experiment – given a set of conditions, you make a hypothesis that a certain outcome will occur. You wouldn’t consider an experiment to be complete until you determined its outcome. In the same way, you shouldn’t consider a forecast experiment to be complete until you find out whether the forecast was successful.
一个预测就像一个实验：给定一组条件，你作出一个假设，认为某个特定的结果将会发生。在确定实验结果之前，你不会认为这个实验已经完成。同样，在弄清预测是否成功之前，你也不应该认为这个预测实验已经完成。
The three most important reasons to verify forecasts are:
要验证预测的三个最重要的原因是：
to monitor forecast quality - how accurate are the forecasts and are they improving over time?
监测预测质量–预测有多准确，它们是否随着时间的推移而改善？
to improve forecast quality - the first step toward getting better is discovering what you’re doing wrong.
提高预测质量–迈向更好的第一步是发现你做错了什么。
to compare the quality of different forecast systems - to what extent does one forecast system give better forecasts than another, and in what ways is that system better?
比较不同预报系统的质量–一个预报系统在多大程度上比另一个预报系统提供更好的预报，以及该系统在什么方面更好？
Types of forecasts and verifications
预测和验证的类型
There are many types of forecasts, each of which calls for slightly different methods of verification. The table below lists one way of distinguishing forecasts, along with verification methods that are appropriate for that type of forecast. David Stephenson has proposed a classification scheme for forecasts. It is often possible to convert from one type of forecast to another simply by rearranging, categorizing, or thresholding the data.
有多种类型的预测，每一种都需要稍微不同的验证方法。下表列出了区分预测的一种方法，以及适合这种预测的验证方法。David Stephenson提出了一个预测分类方案。通常可以通过简单地对数据进行重新排列、分类或阈值化来从一种类型的预测转换为另一种类型的预测。
Nature of forecast: 预测的性质 | Example(s) 实例 | Verification methods 验证方法
Deterministic (non-probabilistic) 确定性（非概率） | quantitative precipitation forecast 定量降水预测 | visual, dichotomous, multi-category, continuous, spatial 可视化、二分类、多类别、连续、空间
Probabilistic 概率 | probability of precipitation, ensemble forecast 降水概率，集合预报 | visual, probabilistic, ensemble 可视化、概率、集合
Qualitative (worded) 定性（文字描述） | 5-day outlook 5天展望 | visual, dichotomous, multi-category 可视化、二分类、多类别
Space-time domain: 时空域
Time series 时间序列 | daily maximum temperature forecasts for a city 城市日最高温度预测 | visual, dichotomous, multi-category, continuous, probabilistic 可视化、二分类、多类别、连续、概率
Spatial distribution 空间分布 | map of geopotential height, rainfall chart 位势高度图，雨量图 | visual, dichotomous, multi-category, continuous, probabilistic, spatial, ensemble 可视化、二分类、多类别、连续、概率、空间、集合
Pooled space and time 时空汇集 | monthly average global temperature anomaly 全球月平均温度距平 | dichotomous, multi-category, continuous, probabilistic, ensemble 二分类、多类别、连续、概率、集合
Specificity of forecast: 预测的具体程度
Dichotomous (yes/no) 二分类（是/否） | occurrence of fog 雾的出现 | visual, dichotomous, probabilistic, spatial, ensemble 可视化、二分类、概率、空间、集合
Multi-category 多类别 | cold, normal, or warm conditions 冷、正常或暖 | visual, multi-category, probabilistic, spatial, ensemble 可视化、多类别、概率、空间、集合
Continuous 连续 | maximum temperature 最高温度 | visual, continuous, probabilistic, spatial, ensemble 可视化、连续、概率、空间、集合
Object- or event-oriented 面向对象或事件 | tropical cyclone motion and intensity 热带气旋运动和强度 | visual, dichotomous, multi-category, continuous, probabilistic, spatial 可视化、二分类、多类别、连续、概率、空间
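The remark above that one forecast type can often be converted to another by categorizing or thresholding can be sketched as follows. This is only an illustration: the function names, the 1 mm and 10 mm thresholds, and the category labels are assumptions made for the example, not values from the original page.

```python
def to_dichotomous(values, threshold):
    """Turn continuous values into yes/no events (True when value >= threshold)."""
    return [v >= threshold for v in values]


def to_categories(values, edges, labels):
    """Assign each value to a category, given ascending bin edges.

    A value falls in category i when it is >= exactly i of the edges.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[sum(v >= e for e in edges)] for v in values]


# Hypothetical continuous rainfall forecasts in mm.
rain_forecast_mm = [0.0, 0.4, 1.2, 7.5, 22.0]

# Continuous -> dichotomous ("rain" means at least 1 mm, an assumed threshold).
rain_events = to_dichotomous(rain_forecast_mm, threshold=1.0)

# Continuous -> multi-category (assumed class boundaries at 1 mm and 10 mm).
rain_classes = to_categories(
    rain_forecast_mm, edges=[1.0, 10.0], labels=["dry", "light", "heavy"]
)

print(rain_events)
print(rain_classes)
```

The same conversion would be applied to the observations before a dichotomous or multi-category verification method is used.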

What makes a forecast “good”?
什么样的预测是“好”预测？
Allan Murphy, a pioneer in the field of forecast verification, wrote an essay on what makes a forecast “good” (Murphy, 1993). He distinguished three types of “goodness”:
Allan Murphy是预测验证领域的先驱，他写过一篇文章讨论什么样的预测是“好”预测（Murphy，1993）。他区分了三种类型的“好”：
Consistency - the degree to which the forecast corresponds to the forecaster’s best judgement about the situation, based upon his/her knowledge base
一致性–预测在多大程度上符合预测者根据其知识基础对情况作出的最佳判断
Quality - the degree to which the forecast corresponds to what actually happened
质量—衡量预测与实际情况相符的程度
Value - the degree to which the forecast helps a decision maker to realize some incremental economic and/or other benefit
价值-衡量预测帮助决策者实现某种增量经济和/或其他利益的程度
Since we’re interested in forecast verification, let’s look a bit closer at the forecast quality. Murphy described nine aspects (called “attributes”) that contribute to the quality of a forecast. These are:
由于我们关注的是预测验证，让我们更仔细地看看预测质量。Murphy描述了影响预测质量的九个方面（称为“属性”）。它们是：
Bias - the correspondence between the mean forecast and mean observation.
偏差——平均预测和平均观测之间的对应关系。
Association - the strength of the linear relationship between the forecasts and observations (for example, the correlation coefficient measures this linear relationship)
关联性——预测和观测之间的线性关系的强度（例如，相关系数度量这种线性关系）
Accuracy - the level of agreement between the forecast and the truth (as represented by observations). The difference between the forecast and the observation is the error. The lower the errors, the greater the accuracy.
准确性——预测与真值(以观测值替代)之间的一致程度。预测和观测的差值为误差。误差越低，准确性就越高。
Skill - the relative accuracy of the forecast over some reference forecast. The reference forecast is generally an unskilled forecast such as random chance, persistence (defined as the most recent set of observations, “persistence” implies no change in condition), or climatology. Skill refers to the increase in accuracy due purely to the “smarts” of the forecast system. Weather forecasts may be more accurate simply because the weather is easier to forecast – skill takes this into account.
技巧-预测相对于某个参考预测的相对准确性。参考预测通常是一种无技巧的预测，如随机猜测、持续性预测（定义为最近的一组观测结果，“持续性”意味着状况不变）或气候态。技巧指的是纯粹由于预测系统的“聪明”而增加的准确性。天气预报更准确可能只是因为天气更容易预测，而技巧考虑了这一点。
Reliability - the average agreement between the forecast values and the observed values. If all forecasts are considered together, then the overall reliability is the same as the bias. If the forecasts are stratified into different ranges or categories, then the reliability is the same as the conditional bias, i.e., it has a different value for each category.
可靠性-预测值与观测值之间的平均一致性。如果所有的预测一起考虑，那么整体的可靠性与偏差是相同的。如果将预测分层为不同的范围或类别，那么可靠性与条件偏差相同，即，对于每个类别具有不同的值。
Resolution - the ability of the forecast to sort or resolve the set of events into subsets with different frequency distributions. This means that the distribution of outcomes when “A” was forecast is different from the distribution of outcomes when “B” is forecast. Even if the forecasts are wrong, the forecast system has resolution if it can successfully separate one type of outcome from another.
分辨率-预测将事件集合排序或解析为具有不同频率分布的子集的能力。这意味着预测“A”时的结果分布与预测“B”时的结果分布不同。即使预测是错误的，但如果预测系统能够成功地将一种结果与另一种结果分离，那么它就具有分辨率。
Sharpness - the tendency of the forecast to predict extreme values. To use a counter-example, a forecast of “climatology” has no sharpness. Sharpness is a property of the forecast only, and like resolution, a forecast can have this attribute even if it’s wrong (in this case it would have poor reliability).
锐度-预测倾向于给出极端值的程度。举一个反例，“气候态”预测没有锐度。锐度只是预测本身的属性，和分辨率一样，即使预测是错误的，它也可以具有这一属性（在这种情况下，它的可靠性会很差）。
Discrimination - ability of the forecast to discriminate among observations, that is, to have a higher prediction frequency for an outcome whenever that outcome occurs.
区分——预测在观测结果之间进行区分的能力，即在结果发生时对结果有较高的预测频率。
Uncertainty - the variability of the observations. The greater the uncertainty, the more difficult the forecast will tend to be.
不确定性——观测结果的可变性。不确定性越大，预测就越难。
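Several of the attributes listed above reduce to short formulas. The sketch below shows one common way to measure bias (as mean error), association (Pearson correlation), accuracy (mean absolute error and mean squared error) and skill (an MSE-based skill score against a persistence reference). The choice of these particular measures, the function names and the temperature values are assumptions for illustration; the original page does not prescribe them here.

```python
import math


def mean(values):
    return sum(values) / len(values)


def bias(forecasts, observations):
    """Mean error: mean forecast minus mean observation (0 means unbiased)."""
    return mean(forecasts) - mean(observations)


def correlation(forecasts, observations):
    """Pearson correlation coefficient, a measure of linear association."""
    f_mean, o_mean = mean(forecasts), mean(observations)
    cov = sum((f - f_mean) * (o - o_mean) for f, o in zip(forecasts, observations))
    f_var = sum((f - f_mean) ** 2 for f in forecasts)
    o_var = sum((o - o_mean) ** 2 for o in observations)
    return cov / math.sqrt(f_var * o_var)


def mean_absolute_error(forecasts, observations):
    return mean([abs(f - o) for f, o in zip(forecasts, observations)])


def mean_squared_error(forecasts, observations):
    return mean([(f - o) ** 2 for f, o in zip(forecasts, observations)])


def mse_skill_score(forecasts, reference, observations):
    """1 is perfect, 0 is no better than the reference, negative is worse."""
    return 1.0 - (
        mean_squared_error(forecasts, observations)
        / mean_squared_error(reference, observations)
    )


# Hypothetical daily maximum temperatures (deg C); the first value is only
# used to build the persistence reference (yesterday's observation).
all_observations = [18.0, 20.0, 22.0, 19.0, 24.0, 25.0]
observed = all_observations[1:]
persistence = all_observations[:-1]
forecast = [21.0, 23.0, 19.0, 22.0, 26.0]

print("bias:", bias(forecast, observed))
print("correlation:", correlation(forecast, observed))
print("MAE:", mean_absolute_error(forecast, observed))
print("skill vs persistence:", mse_skill_score(forecast, persistence, observed))
```

Climatology or random chance could be substituted for persistence as the reference by passing a different `reference` list.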
Traditionally, forecast verification has emphasized accuracy and skill. It’s important to note that the other attributes of forecast performance also have a strong influence on the value of the forecast.
传统上，预测验证强调准确性和技巧。值得注意的是，预测性能的其他属性也对预测的价值有很强的影响。
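Two of the less traditional attributes, reliability and uncertainty, can be illustrated for probability forecasts of a yes/no event. The sketch stratifies the forecasts by the probability issued and compares each probability with the observed frequency in that group (the conditional bias described above). Measuring uncertainty as base rate times one minus base rate, which is the variance of the 0/1 observations, is an assumption of this example, as are the function names and data.

```python
from collections import defaultdict


def reliability_table(probabilities, outcomes):
    """Map each issued probability to (sample count, observed frequency)."""
    groups = defaultdict(list)
    for p, o in zip(probabilities, outcomes):
        groups[p].append(o)
    return {p: (len(obs), sum(obs) / len(obs)) for p, obs in sorted(groups.items())}


def uncertainty(outcomes):
    """Variance of binary observations: base_rate * (1 - base_rate)."""
    base_rate = sum(outcomes) / len(outcomes)
    return base_rate * (1.0 - base_rate)


# Hypothetical probability-of-precipitation forecasts and outcomes (1 = rain).
probabilities = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]

table = reliability_table(probabilities, outcomes)
for p, (count, frequency) in table.items():
    # Conditional bias: issued probability minus observed frequency.
    print(f"forecast {p:.1f}: n={count}, observed {frequency:.2f}, "
          f"conditional bias {p - frequency:+.2f}")
print("uncertainty:", uncertainty(outcomes))
```

With only a few samples per group, as here, the observed frequencies are themselves very uncertain, so a real reliability table needs far more data.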
Forecast quality vs. value
预测质量与价值
Forecast quality is not the same as forecast value. A forecast has high quality if it predicts the observed conditions well according to some objective or subjective criteria. It has value if it helps the user to make a better decision.
预测质量与预测价值不同。如果根据某种客观或主观的标准对观测到的条件进行良好的预测，则预测具有很高的质量。如果预测能帮助用户做出更好的决定，预测就有价值。
Imagine a situation in which a high resolution numerical weather prediction model predicts the development of isolated thunderstorms in a particular region, and thunderstorms are indeed observed in the region but not in the particular spots suggested by the model. According to most standard verification measures this forecast would have poor quality, yet it might be very valuable to the forecaster in issuing a public weather forecast.
设想这样一种情况：一个高分辨率数值天气预报模式预测某一区域将有孤立雷暴发展，该区域内确实观测到了雷暴，但并不在模式给出的具体位置。按照大多数标准验证指标，这次预测的质量很差，但它对预报员发布公众天气预报可能非常有价值。
An example of a forecast with high quality but little value is a forecast of clear skies over the Sahara Desert during the dry season.
一个高质量但价值很小的预测例子是干旱季节撒哈拉沙漠上空晴空预报。
When the cost of a missed event is high, the deliberate over-forecasting of a rare event may be justified, even though a large number of false alarms may also result. An example of such a circumstance is the occurrence of fog at airports. In this case quadratic scoring rules (those involving squared errors) will tend to penalize such forecasts harshly, and a positively oriented score such as “hit rate” may be more useful.
当漏报事件的代价很高时，有意对罕见事件多报可能是合理的，尽管这也可能造成大量空报。这种情况的一个例子是机场出现的雾。在这种情况下，二次评分规则（那些涉及平方误差的规则）往往会严厉地惩罚这种预测，而正向评分，如“命中率”，可能更有用。
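The fog example can be made concrete with a small, invented data set: 20 days with fog on 2 of them, one forecaster who deliberately over-forecasts and one who rarely forecasts fog. The sketch assumes yes/no forecasts scored as 1/0 for the squared-error measure; the false alarm ratio and all names and numbers are illustrative assumptions, not taken from the original page.

```python
def contingency_counts(forecast_yes, observed_yes):
    """Return (hits, misses, false_alarms, correct_negatives)."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast_yes, observed_yes):
        if f and o:
            hits += 1
        elif o:
            misses += 1
        elif f:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def hit_rate(hits, misses):
    """Fraction of observed events that were forecast."""
    return hits / (hits + misses)


def false_alarm_ratio(hits, false_alarms):
    """Fraction of 'yes' forecasts that did not verify."""
    return false_alarms / (hits + false_alarms)


def mean_squared_error(forecast_yes, observed_yes):
    """Quadratic score with yes/no treated as 1/0."""
    return sum((f - o) ** 2 for f, o in zip(forecast_yes, observed_yes)) / len(observed_yes)


observed = [1, 1] + [0] * 18                    # fog on 2 of 20 days
over_forecast = [1, 1, 1, 1, 1, 1] + [0] * 14   # catches both, 4 false alarms
under_forecast = [1, 0] + [0] * 18              # misses one, no false alarms

for name, forecast in [("over", over_forecast), ("under", under_forecast)]:
    hits, misses, false_alarms, _ = contingency_counts(forecast, observed)
    print(name,
          "hit rate:", hit_rate(hits, misses),
          "false alarm ratio:", false_alarm_ratio(hits, false_alarms),
          "MSE:", mean_squared_error(forecast, observed))
```

In this made-up case the quadratic score favours the forecaster who missed a fog event, while the hit rate favours the one who caught both, which is the tension the paragraph above describes.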
Katz and Murphy (1997), Thornes and Stephenson (2001) and Wilks (2001) describe methods for assessing the value of weather forecasts. The relative value plot is sometimes used as a verification diagnostic.
Katz和Murphy（1997），Thornes和Stephenson（2001）以及Wilks（2001）描述了评估天气预报价值的方法。相对价值图有时被用作验证诊断工具。
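The page does not define relative value, so the following is only a sketch under a stated assumption: the simple static cost/loss model often used in the literature, in which a user pays a cost C to protect whenever an event is forecast and suffers a loss L when an unprotected event occurs. Relative value then compares the forecast's mean expense with that of climatology (always or never protecting) and of a perfect forecast. The function name and the example numbers are hypothetical.

```python
def relative_value(hit_rate, false_alarm_rate, base_rate, cost_loss_ratio):
    """Relative economic value under an assumed static cost/loss model.

    hit_rate:         hits / (hits + misses)
    false_alarm_rate: false alarms / (false alarms + correct negatives)
    base_rate:        observed frequency of the event
    cost_loss_ratio:  C / L, strictly between 0 and 1
    Expenses are expressed per unit loss L.
    """
    if not 0.0 < cost_loss_ratio < 1.0:
        raise ValueError("cost_loss_ratio must be strictly between 0 and 1")
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    # Pay the cost on hits and false alarms, suffer the loss on misses.
    expense_forecast = (
        (hit_rate * base_rate + false_alarm_rate * (1.0 - base_rate)) * cost_loss_ratio
        + (1.0 - hit_rate) * base_rate
    )
    # Best of "always protect" and "never protect".
    expense_climate = min(cost_loss_ratio, base_rate)
    # Protect only when the event occurs.
    expense_perfect = base_rate * cost_loss_ratio
    return (expense_climate - expense_forecast) / (expense_climate - expense_perfect)


# Points of a relative value curve for one hypothetical forecast system.
for ratio in (0.02, 0.05, 0.1, 0.3, 0.6):
    value = relative_value(hit_rate=1.0, false_alarm_rate=4 / 18,
                           base_rate=0.1, cost_loss_ratio=ratio)
    print(f"C/L = {ratio:.2f}: relative value = {value:.3f}")
```

Plotting these values against the cost/loss ratio gives a curve of the kind the relative value plot shows; a value of 1 matches a perfect forecast and 0 matches climatology.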
What is “truth” when verifying a forecast?
验证预测时“真值”是什么？
The “truth” data that we use to verify a forecast generally comes from observational data. These could be rain gauge measurements, temperature observations, satellite-derived cloud cover, geopotential height analyses, and so on.
我们用来验证预测的“真值”数据通常来自观测数据。这些可以是雨量计测量、温度观测、卫星反演的云量、位势高度分析等等。
In many cases it is difficult to know the exact truth because there are errors in the observations. Sources of uncertainty include random and bias errors in the measurements themselves, sampling error and other errors of representativeness, and analysis error when the observational data are analyzed or otherwise altered to match the scale of the forecast.
在许多情况下，很难知道确切的真值，因为观测中存在误差。不确定性的来源包括测量本身的随机误差和偏差、采样误差和其他代表性误差，以及为匹配预测尺度而对观测数据进行分析或其他处理时产生的分析误差。
Rightly or wrongly, most of the time we ignore the errors in the observational data. We can get away with this if the errors in the observations are much smaller than the expected error in the forecast (high signal to noise ratio). Even skewed or under-sampled verification data can give us a good idea of which forecast products are better than others when intercomparing different forecast methods. Methods to account for errors in the verification data are currently being researched.
不管是对是错，大多数时候我们忽略了观测数据中的误差。如果观测误差远小于预测的预期误差（高信噪比），这样做是可以接受的。在相互比较不同的预测方法时，即使是有偏或抽样不足的验证数据，也能让我们大致了解哪些预测产品更好。考虑验证数据误差的方法目前正在研究之中。
Validity of verification results
验证结果的有效性
The verification results are naturally more trustworthy when the quantity and quality of the verification data are high. It is always a good idea to put some error bounds on the verification results themselves. It is especially important (a) for rare events where the sample size is typically small, (b) when the data shows a lot of variability, and (c) when you want to know whether one forecast product is significantly better (in a statistical sense) than another.
当验证数据的数量和质量高时，验证结果自然更可信。在验证结果本身上设置一些误差界限总是一个好主意。（a）对于样本量通常较小的罕见事件，(b)当数据显示出许多可变性时，(c)当您想知道一个预测产品是否显著优于另一个预测产品时(在统计意义上)，这一点尤其重要。
The usual approach is to determine confidence intervals for the verification scores using analytic, approximate, or bootstrapping methods (depending on the score). Some good meteorological references on this subject are Seaman et al. (1996), Wilks (2011, ch.5), Hamill (1999), and Kane and Brown (2000).
通常的方法是使用解析、近似或自举（bootstrap）方法（取决于评分）确定验证评分的置信区间。关于这一主题较好的气象学参考文献有Seaman等（1996），Wilks（2011，第5章），Hamill（1999），Kane和Brown（2000）。
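A bootstrap confidence interval for a verification score can be sketched as below, using the simple percentile method on the mean absolute error. Resampling forecast/observation pairs independently assumes the pairs are independent, which serially or spatially correlated weather data may violate; the function names, the 2000 resamples, the seed and the data are all assumptions made for the example.

```python
import random


def mean_absolute_error(pairs):
    return sum(abs(f - o) for f, o in pairs) / len(pairs)


def bootstrap_interval(pairs, score, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for score(pairs).

    Pairs are resampled with replacement; the interval is read off the
    sorted resampled scores at the alpha/2 and 1 - alpha/2 positions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(pairs)
    scores = []
    for _ in range(n_resamples):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        scores.append(score(resample))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical temperature forecasts and observations (deg C).
forecasts = [21.0, 23.5, 19.0, 22.0, 26.0, 18.5, 20.0, 24.5, 25.0, 17.0, 22.5, 21.5]
observations = [20.0, 22.0, 19.5, 24.0, 25.0, 18.0, 21.5, 24.0, 27.0, 17.5, 22.0, 20.0]
pairs = list(zip(forecasts, observations))

point_estimate = mean_absolute_error(pairs)
lower, upper = bootstrap_interval(pairs, mean_absolute_error)
print(f"MAE = {point_estimate:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

To compare two forecast products, the same resampled days would normally be applied to both and the interval computed on the score difference.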
Pooling versus stratifying results
汇集vs分层结果
To get reliable verification statistics, a large number of forecast/observation pairs (samples) may be pooled over time and/or space. The larger the number of samples, the more reliable the verification results. You can also get pooled results by aggregating verification statistics over a longer time period, but be careful to handle non-linear scores properly.
为了获得可靠的验证统计量，可以在时间和/或空间上汇集大量的预测/观测对（样本）。样本数量越多，验证结果越可靠。您还可以通过汇总较长时段内的验证统计量来获得汇集结果，但要注意正确处理非线性评分。
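The warning about non-linear scores can be illustrated with the root mean squared error. Averaging the RMSE values of individual periods generally does not give the RMSE of the pooled samples; one way to aggregate correctly is to carry the sum of squared errors and the sample count for each period and combine those. The periods and error values below are invented for the example.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


def aggregate_rmse(period_summaries):
    """Combine per-period (sum_of_squared_errors, sample_count) into one RMSE."""
    total_squared_error = sum(sq for sq, _ in period_summaries)
    total_count = sum(count for _, count in period_summaries)
    return math.sqrt(total_squared_error / total_count)


# Hypothetical forecast errors for two periods of unequal size.
month_a_errors = [1.0, 1.0, 1.0, 1.0]
month_b_errors = [3.0, 3.0]

naive_average = (rmse(month_a_errors) + rmse(month_b_errors)) / 2

summaries = [
    (sum(e ** 2 for e in month_a_errors), len(month_a_errors)),
    (sum(e ** 2 for e in month_b_errors), len(month_b_errors)),
]
aggregated = aggregate_rmse(summaries)
pooled = rmse(month_a_errors + month_b_errors)

print("average of monthly RMSE:", naive_average)
print("aggregated RMSE:", aggregated)
print("RMSE of pooled samples:", pooled)
```

The same idea applies to categorical scores: summing the contingency table counts over periods and then computing the score avoids averaging ratios with different denominators.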
The danger with pooling samples, however, is that it can mask variations in forecast performance when the data are not homogeneous. It can bias the results toward the most commonly sampled regime (for example, regions with higher station density, or days with no severe weather). Non-homogeneous samples can lead to overestimates of forecast skill using some commonly used metrics - Hamill and Juras (2005) provide some clear examples of how this can occur.
然而，汇集样本的危险在于，当数据不均匀时，它会掩盖预测性能的变化。它会使结果偏向于最常被采样的情形（例如，站点密度较高的区域，或没有恶劣天气的日子）。使用某些常用指标时，非均匀样本可能导致对预测技巧的高估，Hamill和Juras（2005）给出了一些清楚的例子说明这种情况是如何发生的。
Stratifying the samples into quasi-homogeneous subsets (by season, by geographical region, by intensity of the observations, etc.) helps to tease out forecast behavior in particular regimes. When doing this, be sure that the subsets contain enough samples to give trustworthy verification results.
将样本分层为准均匀的子集（按季节、按地理区域、按观测强度等）有助于揭示特定情形下的预测行为。这样做时，要确保各子集包含足够的样本，以给出可信的验证结果。
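Stratification can be sketched as a group-by followed by the same score in each subset, with a check on the subset size. In this invented example the pooled mean absolute error is dominated by the well-sampled summer regime and hides much larger winter errors. The function names, the minimum of 5 samples and the data are assumptions for illustration only.

```python
from collections import defaultdict


def mean_absolute_error(pairs):
    return sum(abs(f - o) for f, o in pairs) / len(pairs)


def stratified_scores(samples, min_samples=5):
    """Score each stratum separately.

    samples: iterable of (stratum, forecast, observation).
    Returns {stratum: {"n", "mae", "enough_samples"}}.
    """
    groups = defaultdict(list)
    for stratum, forecast, observation in samples:
        groups[stratum].append((forecast, observation))
    return {
        stratum: {
            "n": len(pairs),
            "mae": mean_absolute_error(pairs),
            "enough_samples": len(pairs) >= min_samples,
        }
        for stratum, pairs in groups.items()
    }


# Hypothetical temperature forecasts: many summer samples, few winter ones.
samples = [
    ("summer", 25.5, 25.0), ("summer", 27.5, 28.0), ("summer", 30.0, 29.5),
    ("summer", 26.0, 26.5), ("summer", 24.5, 24.0), ("summer", 29.0, 29.5),
    ("winter", 5.0, 1.0), ("winter", -2.0, 2.0),
]

pooled_mae = mean_absolute_error([(f, o) for _, f, o in samples])
by_season = stratified_scores(samples)

print("pooled MAE:", pooled_mae)
for season, result in by_season.items():
    print(season, result)
```

The `enough_samples` flag marks the winter subset as too small to trust here, which is the caution raised in the paragraph above.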