The SpaceNet Change and Object Tracking (SCOT) Metric

Daniel Hogan and Adam Van Etten

Preface: SpaceNet LLC is a nonprofit organization dedicated to accelerating open source, artificial intelligence applied research for geospatial applications, specifically foundational mapping (i.e. building footprint and road network detection). SpaceNet is solely managed by co-founder, In-Q-Tel CosmiQ Works, in collaboration with co-founder and co-chair, Maxar Technologies, and the other partners: Amazon Web Services (AWS), Capella Space, Topcoder, Institute of Electrical and Electronics Engineers (IEEE) Geoscience and Remote Sensing Society (GRSS), the National Geospatial-Intelligence Agency (NGA) and Planet.

The SpaceNet 7 Multi-Temporal Urban Development Challenge has the ambitious goal of tracking precise building addresses and urban change from satellite imagery. As detailed in our announcement blog, the goal of SpaceNet 7 is relevant to numerous human development and disaster response applications. Furthermore, the unique SpaceNet 7 dataset poses a challenge from a computer vision standpoint because of the small pixel area of each object, the high object density within images, and the dramatic image-to-image difference compared to frame-to-frame variation in video object tracking.

How exactly one should measure performance for the SpaceNet 7 task is a tricky question given the multiple dimensions of the dataset. First, each building footprint is assigned a unique identifier (i.e. address), which we would like to track over time. Second, there is significant construction activity in the SpaceNet 7 data cubes, with new buildings appearing (or disappearing) throughout the time series. Ideally, we would like to quantify the ability of machine learning algorithms to correctly predict when and where change takes place. These two competing priorities (tracking and change) are not fully captured by existing metrics, which led us to develop a new metric: the SpaceNet Change and Object Tracking (SCOT) metric. This blog details the advantages and specifics of this new metric, which will be used to score the upcoming SpaceNet 7 challenge.

1. The SCOT Metric

For SpaceNet 7, the ground truth and the model-generated proposals both consist of a set of building footprints for each month. Each footprint is assigned an ID number, with the idea being that each reappearance of the same ID in subsequent months corresponds to a new observation of the same building. Measuring the concordance between the ground truth and proposals is the job for an evaluation metric.

We originally intended to use the Multiple Object Tracking Accuracy (MOTA) metric for SpaceNet 7, which is a commonly used metric for object tracking in video. Yet this metric is not a good choice for difficult sequences, since errors are compounded additively (rather than being “averaged” out via a harmonic mean, as in the historical SpaceNet metric). The end result is that for difficult scenes with few true positives and many false positives, it is possible to achieve a negative MOTA score. Since we are tackling a very hard problem in SpaceNet 7, MOTA is therefore a poor choice. In fact, all of the existing metrics that we investigated proved to be a poor fit for our dataset and challenge task, leading us to develop our own metric.
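To make the negative-score problem concrete, here is a minimal sketch that compares MOTA with an F1 score on the same counts. It assumes the usual CLEAR MOT form of MOTA, 1 - (FN + FP + ID switches) / (number of ground truth objects); the function names and the counts are hypothetical and chosen only for illustration.

```python
def mota(fn, fp, id_switches, num_gt):
    """MOTA in its usual CLEAR MOT form: errors are summed, then divided by the ground truth count."""
    return 1.0 - (fn + fp + id_switches) / num_gt


def f1(tp, fp, fn):
    """F1 score from raw counts; defined here as 0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical hard scene: 100 ground truth buildings, only 10 found,
# plus 150 spurious proposals and no ID switches.
tp, fp, fn = 10, 150, 90
hard_scene_mota = mota(fn, fp, 0, tp + fn)   # the summed errors exceed the ground truth count
hard_scene_f1 = f1(tp, fp, fn)               # stays within [0, 1]
print(hard_scene_mota, hard_scene_f1)
```

Because the error counts are added before dividing, MOTA drops below zero as soon as the errors outnumber the ground truth objects, whereas the F1 score is bounded below by zero.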

The SpaceNet Change and Object Tracking (SCOT) metric combines two terms: a tracking term and a change detection term. The tracking term evaluates how often the proposal correctly tracks the same buildings from month to month with consistent ID numbers. In other words, it measures the model’s ability to characterize what stays the same as time goes by. The change detection term evaluates how often the proposal correctly picks up on the construction of new buildings. In other words, it measures the model’s ability to characterize what changes as time goes by.

1.1 Matching Footprints

For both terms (tracking and change detection) in the SCOT metric, we start by finding “matches” between ground truth and proposal footprints for each month, just like in the original SpaceNet Metric used for previous SpaceNet building footprint challenges. To be matched, a ground truth footprint and a proposal footprint must have an intersection-over-union (IOU) exceeding a given threshold. In previous challenges, a threshold of 0.5 was used, but for SpaceNet 7 we relax the threshold to 0.25 due to the extra difficulty of interpreting lower-resolution imagery. Using a threshold below 0.5 makes it possible for more than one proposal footprint to qualify to be matched to the same ground truth footprint, or vice versa. To resolve that ambiguity, the SCOT metric introduces an optimized way of selecting matches when such choices arise. The metric uses the set of matches that minimizes the number of unmatched ground truth footprints plus the number of unmatched proposal footprints. If there’s more than one way to achieve that goal, then as a tiebreaker the set of matches with the highest sum of IOUs is selected. This is an example of a long-studied problem in combinatorics (the “unbalanced linear assignment problem”), and an algorithmic solution is available in SciPy and other places.
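The matching objective described above can be sketched in a few lines. This is not the challenge's scoring code: it is a brute-force illustration that makes the objective explicit, assuming axis-aligned boxes `(xmin, ymin, xmax, ymax)` in place of real footprint polygons, and the names `iou` and `best_matches` are hypothetical. Minimizing the number of unmatched footprints is equivalent to maximizing the number of matched pairs, so the search ranks candidate matchings by pair count first and by IOU sum second.

```python
def iou(a, b):
    """IOU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_matches(truth, proposals, threshold=0.25):
    """Exhaustive search: most matched pairs first, highest IOU sum as tiebreaker.

    Returns a list of (truth_index, proposal_index) pairs. Only practical for
    a handful of footprints; it exists to show the objective, not to scale.
    """
    best = (0, 0.0, [])

    def search(i, used, pairs, total):
        nonlocal best
        if i == len(truth):
            if (len(pairs), total) > best[:2]:
                best = (len(pairs), total, list(pairs))
            return
        search(i + 1, used, pairs, total)          # leave truth[i] unmatched
        for j, prop in enumerate(proposals):
            if j in used:
                continue
            value = iou(truth[i], prop)
            if value > threshold:                  # pair is eligible to be matched
                used.add(j)
                pairs.append((i, j))
                search(i + 1, used, pairs, total + value)
                pairs.pop()
                used.remove(j)

    search(0, set(), [], 0.0)
    return best[2]


# Hypothetical scene: proposal 0 overlaps both buildings, proposal 1 only the first.
truth = [(0, 0, 10, 10), (10, 0, 20, 10)]
proposals = [(5, 0, 15, 10), (-4, 0, 6, 10)]
print(best_matches(truth, proposals))
```

For realistic numbers of footprints, the same objective would be handed to an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment`, with ineligible pairs filtered out of the result.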

For the original SpaceNet Metric, just one step remains after finding the matches. That’s calculating the F1 score, treating every match as a true positive:

$$F_1 = \frac{2\,tp}{2\,tp + fp + fn}$$
F1 = F1 score, tp = true positives, fp = false positives, fn = false negatives

The SCOT metric consists of two terms (tracking and change detection) that are both F1 scores that follow this same general procedure, but with one small (though significant) tweak in each case.

1.2 The Tracking Term

To measure how well a proposed set of footprints tracks IDs from month to month, the tracking term calculation uses the same formula as above but applies a more stringent definition of what counts as a true positive.

A match between a ground truth footprint and a proposal footprint is considered a “mismatch” if the ground truth footprint was most recently matched with a different proposal footprint ID or if the proposal footprint was most recently matched with a different ground truth ID. (This is inspired by, but slightly different from, the MOTA definition of a mismatch.) When calculating the F1 score for tracking, only those matches that are not mismatches count as true positives. If a ground truth footprint is matched to a proposal footprint but it’s a mismatch, then the ground truth footprint is considered a false negative and the proposal footprint is considered a false positive, the same outcome as if they’d never been matched at all. The result is an F1 score that penalizes even correctly-located proposal footprints if their ID numbers are not consistent across time.

Figure 1: Tracking. An example time series of four buildings over five months. Solid brown polygons are ground truth building footprints and outlines are proposal footprints. ID numbers for ground truth are centered over the buildings, and ID numbers for the proposals are to their upper right.

Figure 1 shows a made-up example of how this works. In this example, a small area with four buildings (ground truth IDs 1–4) is imaged each month for five months. The outlines show proposal footprints, labeled by their ID numbers. Buildings 2 and 4 are newly constructed during this time, and buildings 3 and 4 temporarily disappear in April, which can happen when an image is partially occluded by clouds. The model in this example does a good job of generating proposal footprints that are matched with ground truth building footprints, but it makes some mistakes with the proposals’ ID numbers, causing some of the matches to be mismatches. Notice that in April, proposal 12 shifts from being matched with building 2 to being matched with building 1. That leads to two mismatches in April: The match of building 1 and proposal 12 is a mismatch because proposal 12 had been most recently matched with a different building, while the match of building 2 and proposal 14 is a mismatch because building 2 had been most recently matched with a different proposal. On the other hand, the match of building 3 and proposal 11 in May is not a mismatch, because the last time this building and proposal were matched with anything — even though it was several months earlier — it was with each other.

What all these details have in common is that proposals that stay on target with the same building are not mismatches, while ID mistakes count as mismatches during the specific months when those mistakes occur, lowering the value of the tracking term.
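The mismatch rule can be written down as a short sketch. It rests on several assumptions that the text does not spell out: the per-month matches have already been computed, true positives, false positives, and false negatives are pooled over all months before the F1 score is taken, and the "most recently matched" record is updated after every match, including mismatches. The function name and the input layout are hypothetical.

```python
def tracking_f1(months):
    """Tracking-term sketch.

    months: list of (gt_ids, prop_ids, matches) tuples, one per month, where
    matches is a list of (gt_id, prop_id) pairs found by footprint matching.
    """
    last_prop_for_gt = {}   # ground truth ID -> proposal ID it was last matched with
    last_gt_for_prop = {}   # proposal ID -> ground truth ID it was last matched with
    tp = fp = fn = 0
    for gt_ids, prop_ids, matches in months:
        good = 0
        for g, p in matches:
            # A match is a mismatch if either side was last matched with a different partner.
            mismatch = (last_prop_for_gt.get(g, p) != p
                        or last_gt_for_prop.get(p, g) != g)
            if not mismatch:
                good += 1
        # Mismatched pairs are scored exactly like unmatched footprints.
        tp += good
        fn += len(gt_ids) - good
        fp += len(prop_ids) - good
        # Assumption: history is updated after every match, mismatched or not.
        for g, p in matches:
            last_prop_for_gt[g] = p
            last_gt_for_prop[p] = g
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical series (not the data behind Figure 1): the proposal IDs shift in month 3.
months = [
    ({1}, {11}, [(1, 11)]),
    ({1, 2}, {11, 12}, [(1, 11), (2, 12)]),
    ({1, 2}, {12, 14}, [(1, 12), (2, 14)]),   # both matches are mismatches
    ({1, 2}, {12, 14}, [(1, 12), (2, 14)]),   # same pairing again: no longer mismatches
]
print(tracking_f1(months))
```

Under these assumptions the ID error costs two true positives in the month where it happens and nothing afterwards, which matches the behavior described above.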

1.3 The Change Detection Term

Like the tracking term, the change detection term is an F1 score that’s similar to the SpaceNet Metric with one small modification. This time, we don’t worry about mismatches. Instead, once the matching is complete, we simply ignore every building and proposal that isn’t making its chronological first appearance. That amounts to dropping any footprint (ground truth or proposal) with an ID number that has appeared in any previous month. Rather than measuring performance on all buildings, the result is an F1 score concerned only with new buildings (i.e., changes), requiring the model not only to find them but to identify them as such.

Figure 2: Change detection. The same time series as Figure 1, showing new ground truth building footprints in brown, new proposal footprints in blue, and graying out all ground truth and proposal footprints with previously-seen IDs.

Figure 2 shows how this works for the same made-up example considered above. There are two appearances of new buildings: building 2 in February and building 4 in March. (Buildings already present in the first month of data don’t count.) Building 2’s February arrival is matched with proposal 12, which is also making its first appearance, so that counts as a true positive for the change detection term. Building 4’s arrival in March is matched with proposal 13, but that proposal ID already appeared in February so its subsequent appearance in March is ignored. Building 4’s arrival is therefore a false negative.

To summarize the change detection term, only new footprints are used for the purpose of calculating the F1 score. One important property of this term is that a set of static proposals that do not vary from one month to another will receive a change detection score of 0, even for a data cube with very little new construction.
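The change detection term can be sketched the same way. The assumptions here are that matches are computed on all footprints before any are dropped, that counts are pooled over months, that the first month only registers IDs as seen, and that the score is taken to be 0 when there is nothing to count. The function name and the example data are hypothetical and only loosely follow the situation described for Figure 2.

```python
def change_f1(months):
    """Change-detection sketch.

    months: list of (gt_ids, prop_ids, matches) tuples, one per month, where
    matches is a list of (gt_id, prop_id) pairs found by footprint matching.
    """
    seen_gt, seen_prop = set(), set()
    tp = fp = fn = 0
    for k, (gt_ids, prop_ids, matches) in enumerate(months):
        new_gt = set(gt_ids) - seen_gt          # buildings making their first appearance
        new_prop = set(prop_ids) - seen_prop    # proposal IDs making their first appearance
        if k > 0:                               # footprints in the first month don't count
            hits = sum(1 for g, p in matches if g in new_gt and p in new_prop)
            tp += hits
            fn += len(new_gt) - hits            # new buildings without a new matched proposal
            fp += len(new_prop) - hits          # new proposals without a new matched building
        seen_gt |= set(gt_ids)
        seen_prop |= set(prop_ids)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical series: building 2 arrives with a new proposal ID (true positive),
# building 4 arrives but is matched to an already-seen proposal ID (false negative),
# and proposal 13 first appears without a matching new building (false positive).
months = [
    ({1, 3}, {10, 11}, [(1, 10), (3, 11)]),
    ({1, 2, 3}, {10, 11, 12, 13}, [(1, 10), (2, 12), (3, 11)]),
    ({1, 2, 3, 4}, {10, 11, 12, 13}, [(1, 10), (2, 12), (3, 11), (4, 13)]),
]
print(change_f1(months))

# Static proposals: the same IDs every month, so no proposal is ever "new".
static = [
    ({1}, {10, 11}, [(1, 10)]),
    ({1, 2}, {10, 11}, [(1, 10), (2, 11)]),
]
print(change_f1(static))
```

The second series illustrates the property noted above: proposals that never change cannot earn a true positive for the change detection term.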

1.4 Putting It All Together

To combine the tracking term with the change detection term, a weighted harmonic mean of these two F1 scores is used:

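$$\mathrm{SCOT} = (1 + \beta^2)\,\frac{F_{\mathrm{change}} \cdot F_{\mathrm{track}}}{\beta^2\,F_{\mathrm{change}} + F_{\mathrm{track}}}$$
where F_track is the tracking term and F_change is the change detection term.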

This makes it possible to tune the relative weight of tracking and change detection by setting a value for β. For SpaceNet 7 we use a value of β=2, which emphasizes the tracking term.

In SpaceNet 7 we have multiple locations (a.k.a. areas of interest), and so the aggregate reported score is a simple mean of each unique location’s SCOT score.
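As a minimal sketch of this last step, the weighted harmonic mean can be written in the F-beta style, with the tracking term in the position that receives weight β² so that β=2 emphasizes tracking. The exact arrangement of the formula, the zero-denominator convention, and the names `scot` and `aggregate` are assumptions made for illustration.

```python
def scot(change, track, beta=2.0):
    """Weighted harmonic mean of the two F1 terms; beta > 1 weights tracking more."""
    denom = beta ** 2 * change + track
    if denom == 0:
        return 0.0          # assumed convention when both terms are zero
    return (1 + beta ** 2) * change * track / denom


def aggregate(per_location_scores):
    """Simple mean of the per-location SCOT scores."""
    return sum(per_location_scores) / len(per_location_scores)


# Hypothetical term values: the same two numbers, swapped between the terms.
strong_tracking = scot(change=0.2, track=0.8)
weak_tracking = scot(change=0.8, track=0.2)
print(strong_tracking, weak_tracking)
print(aggregate([strong_tracking, weak_tracking]))
```

Swapping the two values shows the effect of β=2: a weak tracking term pulls the combined score down much more than an equally weak change detection term does.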

2. Conclusions

The unique features of the SpaceNet 7 dataset (namely, dynamic objects and geolocated addresses) demand careful consideration of an appropriate metric for the Multi-Temporal Urban Development Challenge. Existing metrics (e.g. MOTA) are not adequate for our proposed task of tracking uniquely identified building footprints through a dynamic time series. Accordingly, we created the SpaceNet Change and Object Tracking (SCOT) metric, which combines an object tracking term and a change detection term into a single score. One benefit of this metric is its continuity with past SpaceNet challenges: the SCOT tracking term reduces to the original SpaceNet Metric when evaluated at a single time step. Another benefit is that the metric explicitly quantifies both change detection and object tracking performance. An upcoming post will provide further details on the implementation of SCOT, including the complete code, via the scoring of the SpaceNet 7 baseline algorithm.

Source: https://medium.com/the-downlinq/the-spacenet-change-and-object-tracking-scot-metric-5339b0c904c
