State Tagging for Improved Earth and Environmental Data Quality Assurance

[This paper attempts to use "state", an open-ended label, to add contextual information to monitoring data.]

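Environmental data let us monitor the constantly changing environment we live in.

They let us study trends and help us build better models of environmental processes, and those models in turn can inform better management practices.

To ensure the data are reliable when analysed and interpreted, they must go through quality assurance procedures.

These procedures generally include standard operating procedures during sampling and laboratory measurement (where applicable), plus data validation on entry to a database.

The latter usually involves compliance (i.e., format) and conformity (i.e., value) checks, most often in the form of single-parameter range tests.

Such tests ignore the system state in which each measurement was made, and they give the user little contextual information about what may have caused a measurement to fall out of range.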
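To make the limitation concrete, here is a minimal sketch of a single-parameter range test. The function name `range_test`, the sample temperatures, and the limits are all hypothetical and not taken from the paper.

```python
def range_test(values, lower, upper):
    """Return True for each value inside [lower, upper], False otherwise."""
    return [lower <= v <= upper for v in values]


# Hypothetical water temperatures (degrees C) and hypothetical fixed limits.
temps = [4.2, 11.8, 19.5, 27.3]
flags = range_test(temps, lower=0.0, upper=25.0)

for value, ok in zip(temps, flags):
    print(value, "ok" if ok else "out of range")
```

The flag is a bare yes/no against one fixed pair of limits: it says nothing about the season or system state in which the value was recorded.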

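We propose using data science techniques to tag each measurement with an identified system state.

The term "state" is defined loosely here; the states are identified by k-means clustering, an unsupervised machine learning method.

What each state means is left open to specialist interpretation.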
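The tagging step can be sketched with a plain k-means (Lloyd's algorithm) written against the standard library only. This is an illustrative sketch, not the authors' implementation: the function `kmeans`, the two-variable observations, and the choice of `k=2` are assumptions, and variable scaling, which usually matters for k-means, is omitted for brevity.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Basic Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct observations.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical observations: (water temperature in degrees C, dissolved oxygen in mg/L).
observations = [
    (4.0, 12.1), (5.1, 11.8), (4.6, 12.4),
    (18.2, 8.3), (19.0, 8.0), (17.5, 8.6),
]
state_tags, centres = kmeans(observations, k=2)
print(state_tags)
```

Each observation now carries an integer state tag. The tag numbers are arbitrary; deciding whether a cluster corresponds to, say, a winter or a summer regime is the specialist interpretation the text refers to.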

Once the states are identified, a state-dependent prediction interval can be computed for each observed variable.
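One way to picture state-dependent intervals is to group a variable by its state tag and take empirical percentiles within each group. The sketch below does exactly that; the percentile-based interval, the helper names `state_intervals` and `in_interval`, and the sample values are assumptions made for illustration, and the paper may compute its intervals differently.

```python
from collections import defaultdict
from statistics import quantiles


def state_intervals(values, states):
    """Empirical 2.5%-97.5% interval of `values` within each state."""
    grouped = defaultdict(list)
    for value, state in zip(values, states):
        grouped[state].append(value)
    intervals = {}
    for state, vals in grouped.items():
        # n=40 yields cut points every 2.5%; the first and last bound the central 95%.
        cuts = quantiles(vals, n=40, method="inclusive")
        intervals[state] = (cuts[0], cuts[-1])
    return intervals


def in_interval(value, state, intervals):
    """Check a value against the interval of the state it is tagged with."""
    lower, upper = intervals[state]
    return lower <= value <= upper


# Hypothetical temperatures (degrees C) with hypothetical state tags.
temperature = [4.0, 4.5, 5.0, 5.5, 6.0, 17.0, 18.0, 19.0, 20.0, 21.0]
tags = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
intervals = state_intervals(temperature, tags)

print(in_interval(18.0, 1, intervals))  # judged against state 1
print(in_interval(18.0, 0, intervals))  # judged against state 0
```

The same reading of 18.0 is unremarkable in one state and anomalous in the other, which is the extra context a single global range cannot provide.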

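This approach gives users more contextual information for resolving out-of-range flags, and yields prediction intervals for observed variables that account for changes in system state.

Users can then apply further analysis and filtering as needed.

We illustrate the approach with two well-established long-term monitoring datasets from the UK: moth and butterfly data from the UK Environmental Change Network (ECN), and the UK CEH Cumbrian Lakes monitoring scheme.

Our work contributes to the ongoing development of a better data science framework, one that lets researchers and other stakeholders find and use the data they need more easily.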

Environmental data allows us to monitor the constantly changing environment that we live in. It allows us to study trends and helps us to develop better models to describe processes in our environment and they, in turn, can provide information to improve management practices. To ensure that the data are reliable for analysis and interpretation, they must undergo quality assurance procedures. Such procedures generally include standard operating procedures during sampling and laboratory measurement (if applicable), as well as data validation upon entry to databases. The latter usually involves compliance (i.e., format) and conformity (i.e., value) checks that are most likely to be in the form of single parameter range tests. Such tests take no consideration of the system state at which each measurement is made, and provide the user with little contextual information on the probable cause for a measurement to be flagged out of range. We propose the use of data science techniques to tag each measurement with an identified system state. The term “state” here is defined loosely and they are identified using k-means clustering, an unsupervised machine learning method. The meaning of the states is open to specialist interpretation. Once the states are identified, state-dependent prediction intervals can be calculated for each observational variable. This approach provides the user with more contextual information to resolve out-of-range flags and derive prediction intervals for observational variables that considers the changes in system states. The users can then apply further analysis and filtering as they see fit. We illustrate our approach with two well-established long-term monitoring datasets in the UK: moth and butterfly data from the UK Environmental Change Network (ECN), and the UK CEH Cumbrian Lakes monitoring scheme. Our work contributes to the ongoing development of a better data science framework that allows researchers and other stakeholders to find and use the data they need more readily.

[1]. Tso, C.M., et al., State Tagging for Improved Earth and Environmental Data Quality Assurance. Frontiers in Environmental Science, 2020. 8: p. 46. [not SCI-indexed]
