Evaluating the Performance and Forecasting Outcomes of the Universal Community Testing Programme

weixin_26745985

Original article: https://medium.com/swlh/evaluating-the-performance-and-forecasting-outcomes-of-the-universal-community-testing-programme-4a831e3600ba

Dr Catrina Ko (Twitter: @dr_CatKo), Hong Kong Global Connect (Twitter: @HKGlobalConnect)

6th September 2020

Abstract

Controversy over the efficacy and numerous other aspects of Hong Kong's mass screening programme for COVID-19 has prompted interest in whether the scheme is succeeding in its aim of accurately identifying silent carriers in the population. This study retrospectively evaluated the test's current performance by feeding the programme's data to date into a confusion matrix and comparing them against the model outputs. Simple predictive analytics was also used to forecast the outputs that the testing procedure would continue to generate. In both parts of the study, evaluation measures were then used to examine the models, which simulated how the test would be expected to perform in real life. Results indicated that, owing to the low prevalence of the disease in Hong Kong, overwhelmingly more results would be generated for healthy participants than for actual carriers. The study concluded that the low prevalence has knocked the class sizes of the confusion matrices out of balance and compromised the test's ability to classify subjects accurately and assign them the correct outcomes. Given this verdict, it remains unclear from a mathematical standpoint whether the testing programme would help or harm the COVID-19 situation in Hong Kong.

Introduction

The Universal Community Testing Programme for COVID-19 is a free, voluntary mass screening scheme in Hong Kong. It is led by the HKSAR government in collaboration with the central government of the People’s Republic of China, with scientific expertise, technology and personnel supplied by mainland laboratories and institutions such as the BGI, which also manufactures the test kits used in the programme. The programme began on 1st September 2020 and has attracted controversy both before and after its commencement. Scientific and medical experts have questioned the programme’s efficacy and value in quelling the outbreak (University of Hong Kong, 2020), particularly as the scheme is not accompanied by a mandatory post-test quarantine for individuals awaiting their results (Zhou, Pang, Zaharia & Fernandez, 2020), and have asked whether the base rate fallacy arising from the low prevalence means that the scheme could instead endanger the population further (Hamlett, 2020). Members of the public concerned about privacy (Liu & Woodhouse, 2020) and about the cost-effectiveness of the scheme have also expressed doubts. The authorities maintain that mass testing is necessary to ‘break the chain of transmission’ and will effectively identify the silent carriers within the community, people who are spreading the virus while remaining asymptomatic themselves, and they have encouraged the entire population to take part (Hong Kong leader chides critics of universal coronavirus test, 2020; Kwan, 2020).

As of 20:00 HKT on 6th September 2020, approximately 1,132,000 people had signed up for the scheme, and about 675,000 samples had been PCR-tested for SARS-CoV-2 by the mainland experts (預約普及檢測人數增至逾113萬, 2020). The programme has identified 15 positive cases of COVID-19 so far (港增21確診7冇源頭外傭疑傳染同住九旬夫婦另添3死, 2020), 12 of which were asymptomatic at the time of the test. At this midpoint of the programme, which is scheduled to finish on 11th September, and in light of the public debate, it may be of value to interrogate these data statistically and evaluate the programme's performance so far. This study also aims to produce a mathematical prediction of the outcomes for the 1.1 million participants using simple predictive analytics commonly adopted in medical settings, and to observe whether these forecasts match what the authorities envisioned the situation to be (林鄭月娥：4成個案源頭未明社區隱形患者傳播力高, 2020) and what they expected of the scheme.

Method

The objective of the study was to contextualise the data of the mass testing programme at a point where the numbers, especially the number of hits, are large enough for meaningful analyses and evaluation. These data were obtained from daily news reports, with the latest reported figures before midnight on the day of writing (6th September 2020) being used as inputs for this study. The numbers taken were: the total number of sign-ups for the scheme, a; the number of samples tested in the laboratories, b; the number of positive cases yielded (the number of hits) therein, c; the total number of confirmed cases (both on and off the scheme combined), d; the number of deaths, e; and the number of recoveries, f.

The study was divided into two parts: the first was a retrospective evaluation of the screening of samples already tested, and the second was a forecast of test outcomes for all participants of the testing programme. Both parts used a confusion matrix to organise the numbers and to describe the proportions of outcomes that the test would be expected to generate given the existing data.

In the first part, the ‘test probability’, (c/b)*100%, was first calculated for reference and for comparison with the existing data. The prevalence of COVID-19 in Hong Kong, ((d-(e+f))/7 500 000)*100%, was also determined. The prevalence was then used to compute the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) in a confusion matrix (Fig. 1), together with the positive predictive value (PPV) and the false omission rate (FOR). Finally, the Matthews correlation coefficient (MCC) was calculated from the confusion matrix as a measure of the quality of the classifier's predictions, the classifier in this context being the medical test itself. A few aspects of the performance on the samples already analysed in the screening programme were then evaluated qualitatively from these metrics.

In the second part, the prevalence was fed into a predictive model of the screening outcomes for all a participants. A confusion matrix (Fig. 2) was again drawn to compute the same set of test outcomes and metrics as in the first part. The F1 score was calculated as an additional measure of the test’s performance, as the test here would be applied to subjects whose samples are yet to be analysed. In this part of the study, the number of ‘silent carriers’ in the population implied by the model's outcomes, (TP+FN)*7 500 000/a, was also calculated. This figure was then compared with the authorities’ initial estimate of the number of silent carriers, which presumably alarmed them and prompted them to launch the scheme quickly without public consultation. A qualitative evaluation of the test's performance, inferred from the calculated metrics, is also discussed within the context of the current COVID-19 situation in Hong Kong.

Analysis

There may well be better statistical methods for scrutinising the performance and efficacy of this mass screening programme and the kit it uses to test for SARS-CoV-2, perhaps even tracking them over time as the data continue to evolve. However, this study was limited by the lack of detailed information on the kit's specification and on any competing kits for comparison. The MCC and the F1 score are crude performance measures, but they were the ones that could conveniently be computed with the information at hand. The confusion matrix is also the simplest and most common technique for at least a preliminary evaluation of a medical screening test, requiring only easily obtained or derived information such as the number of samples tested, the prevalence of the disease, and the sensitivity and specificity of the test kit. Thus, although this study was not strictly a statistical interrogation, it did use statistical techniques and predictive analytics to achieve its objectives and lay out the mathematical anatomy of the data in an organised way.

Results

As of 20:00 HKT on 6th September 2020, approximately 675,000 samples, b, had been PCR-tested for SARS-CoV-2, out of which 15, c, were confirmed positive. Using these numbers, the ‘test probability’ to be used for comparison and evaluating the test’s performance to date was calculated as:

$$\text{Test probability} = \frac{c}{b} \times 100\% = \frac{15}{675\,000} \times 100\% \approx 0.0022\% \qquad \text{(Eq. 1)}$$

where c is the number of hits and b is the number of samples that had undergone PCR analysis.

The prevalence of COVID-19 in Hong Kong, which serves as the input to both models (Figs. 1 and 2), was calculated as:

$$\text{Prevalence} = \frac{\text{No. of active cases}}{\text{Total population}} \times 100\% = \frac{d-(e+f)}{7\,500\,000} \times 100\% = \frac{271}{7\,500\,000} \times 100\% \approx 0.0036\% \qquad \text{(Eq. 2)}$$

where d is the total number of confirmed cases, e is the number of deaths, and f is the number of recoveries, all as of the end of 6th September 2020.
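
A minimal Python sketch of Eqs. (1) and (2), for readers who want to reproduce the arithmetic. Only the active-case count 271 = d − (e + f) is stated in the text, so the individual values of d, e and f are not used; the variable names are illustrative.

```python
# Figures as of 20:00 HKT, 6 September 2020, taken from the text.
b = 675_000              # samples PCR-tested so far
c = 15                   # positive cases (hits) among those samples
active_cases = 271       # d - (e + f): confirmed minus deaths and recoveries
population = 7_500_000   # population figure used in Eq. (2)

test_probability = c / b                 # Eq. (1), as a fraction
prevalence = active_cases / population   # Eq. (2), as a fraction

print(f"Test probability: {test_probability:.4%}")  # 0.0022%
print(f"Prevalence:       {prevalence:.4%}")        # 0.0036%
```

The test probability comes out below the population prevalence, which is the pattern the Conclusion flags as puzzling.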

The test outcomes, i.e. the numbers of TP, FP, FN, and TN, were then computed in a confusion matrix (Fig. 1) from the prevalence in Eq. (2), the number of samples analysed (b), and the kit specification given by the BGI, namely 99% for both sensitivity and specificity. The PPV and the FOR were then derived from these outcomes. The total number of actual positives, TP+FN, was 24.30 (675,000 × 0.0036%). Figure 1 summarises the results for this part of the study:

Figure 1: Confusion matrix for evaluating test performance on the samples that had already undergone the PCR test
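
Figure 1 is not reproduced here. As a hedged reconstruction, the Python sketch below fills a confusion matrix with expected counts in the standard way, assuming (as the stated 24.30 implies) the prevalence rounded to 0.0036% and 99% sensitivity and specificity. The helper name `expected_confusion_matrix` is illustrative, not from the original study.

```python
def expected_confusion_matrix(n, prevalence, sensitivity, specificity):
    """Expected TP, FP, FN, TN when n people are screened."""
    actual_pos = n * prevalence          # TP + FN
    actual_neg = n - actual_pos          # FP + TN
    tp = sensitivity * actual_pos
    fn = actual_pos - tp
    tn = specificity * actual_neg
    fp = actual_neg - tn
    return tp, fp, fn, tn

tp, fp, fn, tn = expected_confusion_matrix(675_000, 0.000036, 0.99, 0.99)
ppv = tp / (tp + fp)               # positive predictive value
false_omission = fn / (fn + tn)    # false omission rate (FOR)

print(f"TP={tp:.3f} FP={fp:.3f} FN={fn:.3f} TN={tn:.3f}")
print(f"TP+FN={tp + fn:.2f}  PPV={ppv:.4f}  FOR={false_omission:.2e}")
```

Under these assumptions TP+FN is 24.30, as stated above, while roughly 6,750 of the 675,000 samples would be expected to test falsely positive on a single laboratory run.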

The quality of the test’s classification was given by the MCC, calculated using the formula below (Matthews, 1975) and the outcomes from Fig. 1:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \approx 0.059 \qquad \text{(Eq. 3)}$$
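
The MCC formula translates directly into code. This Python sketch plugs in expected Fig. 1 counts obtained from the standard construction (675,000 samples, prevalence 0.0036%, 99% sensitivity and specificity), rounded to three decimals; these counts are derived under that assumption, not read off the original figure.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0  # 0 by convention

# Expected Fig. 1 counts: 675,000 samples, prevalence 0.0036%, 99%/99% kit.
print(round(mcc(tp=24.057, fp=6749.757, fn=0.243, tn=668225.943), 3))  # 0.059
```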

Taking the prevalence from Eq. (2), the total number of sign-ups for the mass screening programme as of 20:00 HKT on 6th September 2020 (a = 1,132,000), and the kit specification provided by the BGI, the predicted test outcomes for all a participants were computed in a confusion matrix (Fig. 2), from which the PPV and the FOR were derived. The sum of TP and FN, and thus the total predicted number of ‘silent carriers’ that the testing procedure would discover once all a participants had been tested for COVID-19, was 40.75 (1,132,000 × 0.0036%). These results are summarised in Fig. 2:

Figure 2: Confusion matrix for predicting the outcomes for all participants of the mass testing programme
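
Repeating the construction for all a = 1,132,000 sign-ups gives the Fig. 2 quantities. A self-contained Python sketch, under the same assumptions as for Fig. 1 (prevalence rounded to 0.0036%, 99% sensitivity and specificity):

```python
a = 1_132_000                  # sign-ups as of 20:00 HKT, 6 September 2020
prevalence = 0.000036          # Eq. (2), rounded as in the text
sensitivity = specificity = 0.99

actual_pos = a * prevalence    # TP + FN
actual_neg = a - actual_pos    # FP + TN
tp = sensitivity * actual_pos
fn = actual_pos - tp
tn = specificity * actual_neg
fp = actual_neg - tn

print(f"TP+FN = {actual_pos:.2f}")                  # 40.75
print(f"Expected FP on a single run = {fp:,.0f}")
```

Under these assumptions, roughly 11,300 false positives would be expected on a single laboratory run before any retesting, which gives a sense of the extra confirmation work discussed later.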

The F1 score for the model in this part of the study was calculated to be:

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} = 2 \times \frac{0.0036 \times 0.99}{0.0036 + 0.99} \approx 0.0072 \qquad \text{(Eq. 4)}$$
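
Eq. (4) can be checked in Python. The sketch below also computes the PPV from Bayes' rule instead of taking the rounded 0.0036; carrying the unrounded value through gives about 0.0071 rather than 0.0072, a rounding difference that does not change the conclusion.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

sensitivity = specificity = 0.99
prevalence = 0.000036

# Precision (PPV) from Bayes' rule, without intermediate rounding.
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)

print(round(f1_score(0.0036, sensitivity), 4))  # 0.0072, as in Eq. (4)
print(round(f1_score(ppv, sensitivity), 4))     # 0.0071
```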

Taking the outcomes from Fig. 2, the MCC for this part of the study was:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \approx 0.059 \qquad \text{(Eq. 5)}$$

Given the outcomes generated by the confusion matrix (Fig. 2), this model predicted the number of ‘silent carriers’ in the population, i.e. the model-predicted prevalence of COVID-19 expressed as a number of people, to be:

$$\text{Predicted no. of silent carriers in the population} = \frac{(TP+FN) \times 7\,500\,000}{1\,132\,000} = \frac{40.75 \times 7\,500\,000}{1\,132\,000} \approx 269.99 \qquad \text{(Eq. 6)}$$
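
A short Python check of the silent-carrier extrapolation. It may be worth noting that, because TP + FN = a × prevalence, the sign-up count a cancels and the formula reduces algebraically to prevalence × 7,500,000: that is 270 with the rounded prevalence, or 271 (the active-case count from the prevalence calculation) without rounding. The figure 269.99 differs only because TP + FN was rounded to 40.75.

```python
population = 7_500_000
a = 1_132_000
prevalence = 0.000036    # Eq. (2), rounded as in the text
tp_plus_fn = 40.75       # sum of TP and FN from Fig. 2, as rounded in the text

silent_carriers = tp_plus_fn * population / a          # Eq. (6)
print(round(silent_carriers, 2))                        # 269.99

# Without rounding TP + FN, a cancels out: (a * p) * N / a == p * N.
print(round((a * prevalence) * population / a, 2))      # 270.0
```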

Discussion

The purpose of this study was to evaluate the performance of the Universal Community Testing Programme for COVID-19 using the data already released from the scheme. Predictions were also made for test subjects who have yet to have their samples analysed or to receive an outcome. Both were achieved through simple mathematical computations. The models, driven purely by algorithmic rules and pre-existing data, thus produced highly objective numerical results and evaluation measures, unmoderated by any built-in qualitative terms or ecological (real-world) variables.

The first part of the study examined, in a mathematical setting, the data for the samples that had already been PCR-tested by the end of the 6th day of the programme by placing them into a confusion matrix (Fig. 1). The number of actual positives in the model, TP+FN ≈ 24.30, did not match the 15 positive cases that the programme had reported as of the end of 6th September 2020, a shortfall of about 9. One possible explanation is that the sensitivity of the test kit may not be as high as 99% in practice, or that external factors during sample collection or laboratory analysis affected the results. The predicted false positives (FP) did not materialise in the reported figures, presumably because the authorities retested the positive outcomes from the first laboratory run to confirm positivity before publishing the results (普及檢測驗出六宗確診個案, 2020). This study presumes that this practice will continue throughout the programme.
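
The size of the gap can be made concrete. The Python sketch below is a hypothetical back-of-envelope check, not part of the original analysis: it asks what single change would reconcile the model with the 15 reported cases, assuming the reported cases are the true positives and holding the other inputs fixed.

```python
b = 675_000
prevalence = 0.000036        # Eq. (2), rounded as in the text
sensitivity = 0.99           # BGI specification
reported_positives = 15      # confirmed hits as of 6 September 2020

expected_actual_pos = b * prevalence                  # TP + FN in Fig. 1
shortfall = expected_actual_pos - reported_positives

# Hypothetical 1: prevalence is right, so the kit's sensitivity must be lower.
implied_sensitivity = reported_positives / expected_actual_pos
# Hypothetical 2: sensitivity is right, so prevalence among participants is lower.
implied_prevalence = reported_positives / (sensitivity * b)

print(f"shortfall: {shortfall:.1f}")                          # 9.3
print(f"implied sensitivity: {implied_sensitivity:.0%}")      # 62%
print(f"implied participant prevalence: {implied_prevalence:.4%}")
```

Either reading, or a mix of both, is consistent with the explanations offered above; the data at hand cannot distinguish between them.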

The second part of the study attempted to forecast the test outcomes for all participants (i.e. including those whose samples are yet to be analysed), again using a confusion matrix (Fig. 2). It is worth noting that the evaluation measures, e.g. the MCC, the PPV, and the FOR, were essentially identical between the two sets of results. This is because both computations used the same prevalence (Eq. (2)) and differed only in the number of samples, and these ratio-based measures do not depend on sample size. The model predicted that 40.75 actual positives (the sum of TP and FN in Fig. 2), asymptomatic at the time of sample collection, would be identified once all of these subjects (a, as of the end of 6th September 2020) had had their samples analysed. This forecast can be evaluated when the corresponding real-life data become available in the coming days.
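
The point that the ratio-based measures do not depend on sample size can be verified directly: with prevalence, sensitivity and specificity fixed, every cell of the confusion matrix scales in proportion to n, so the scale factor cancels in the PPV, the FOR and the MCC. A Python sketch under the same assumptions as the figures (the function name is illustrative):

```python
import math

def ratio_metrics(n, prevalence=0.000036, sensitivity=0.99, specificity=0.99):
    """Return (PPV, FOR, MCC) for the expected confusion matrix of n tests."""
    pos = n * prevalence
    neg = n - pos
    tp, fn = sensitivity * pos, (1 - sensitivity) * pos
    tn, fp = specificity * neg, (1 - specificity) * neg
    ppv = tp / (tp + fp)
    false_omission = fn / (fn + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return ppv, false_omission, mcc

for n in (675_000, 1_132_000):     # Fig. 1 and Fig. 2 sample sizes
    print(n, [f"{m:.4g}" for m in ratio_metrics(n)])
```

Both rows print the same values; only floating-point noise separates them.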

The ability of the testing procedure to identify true conditions was measured with the MCC (Eqs. (3) and (5)), which is essentially a correlation coefficient measuring the strength and direction of the association between the actual conditions (TP+FN and FP+TN) and the predicted outcomes (TP+FP and TN+FN), for both parts of the study. The MCCs for both models were close to zero, suggesting that the test assigned (Fig. 1), and would assign (Fig. 2), results to subjects almost at random. This must, however, be interpreted with caution: although the MCC copes well with scenarios in which the class sizes of the confusion matrix are highly unbalanced (Boughorbel, Jarray & El-Anbari, 2017), there are cases in which one cell is so small (such as the near-zero FN in both figures) that it short-circuits the measurement. The difficulty of measuring the models' performance meaningfully with the MCC ultimately stems from the fact that the total number of actual positives (TP and FN) was disproportionately small compared with the overwhelmingly large total of actual negatives (TN and FP). This was to be expected given the low prevalence (Eq. (2)).

The low prevalence (Eq. (2)) gave rise to the phenomenon that the models seemed to perform better at classifying negative test outcomes than positive ones. The probability of infection is very low, so the vast majority of cases would be actual negatives, and there is an overwhelmingly high chance that the test would successfully identify true negatives (TN). The contrast between this large number of TN and the very small number of false negatives (FN) is reflected in the negligibly low FOR (Figs. 1 and 2), which is desirable. However, a contrast of the same magnitude also existed between the overall numbers of actual positives and actual negatives, again owing to the low prevalence (Eq. (2)), and it pushed the positive predictive power of the test in the opposite direction.

The low prevalence (Eq. (2)) affected the PPV (Figs. 1 and 2) negatively by raising the expected proportion of false positives (FP) among all positive test outcomes. The PPV is very low for both models (Figs. 1 and 2), at roughly 0.36%, indicating that the vast majority of positive outcomes from this testing procedure would be false positives (FP), despite the supposedly high specificity (99%) of the kit. Expressed as a rate or a percentage, the number of false results gives the likelihood that a group of any size would receive false results in this test procedure, provided the rate stays roughly constant until the end of the programme. The numbers of FN and FP are thus performance measures by nature, showing that the test’s performance is intrinsically limited by the low prevalence (Eq. (2)).
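
The base-rate effect on the PPV follows from Bayes' rule. This Python sketch holds the kit at 99% sensitivity and specificity and shows how the PPV would change if the prevalence were higher; apart from the rounded 0.0036%, the prevalence values are illustrative, not Hong Kong data.

```python
def ppv(prevalence, sensitivity=0.99, specificity=0.99):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# 0.000036 is the rounded prevalence from Eq. (2); the rest are illustrative.
for prevalence in (0.000036, 0.001, 0.01, 0.1):
    print(f"prevalence {prevalence:.4%}: PPV {ppv(prevalence):.2%}")
```

At this kit specification the PPV only reaches 50% when the prevalence is around 1%, about 280 times the rounded prevalence in Eq. (2).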

The PPV is significant because it can be used to describe the performance of this screening test. The F1 score (Eq. (4)) is the harmonic mean of precision (the PPV, which falls as false positives accumulate) and recall (the sensitivity, which falls as false negatives accumulate), so a low F1 score shows that at least one of the two is poor, and comparing the two components shows which kind of error the test tolerates. In this study, and contrary to some suggestions (e.g. Chicco and Jurman, 2020), the F1 score was a more meaningful and informative measure of the test’s performance than the MCC. The low prevalence (Eq. (2)) resulted in a PPV close to zero. The F1 score (Eq. (4)) for the predictive model (Fig. 2) was therefore very low; since the sensitivity was 99%, this low score is driven by the precision, meaning that the balance had pivoted towards tolerating false positives, which in turn characterises this medical test. The implication is that substantial extra effort and resources would need to be spent retesting preliminary positive results, both to reassure the public that positive results are true and to tackle the problems foreshadowed by the base rate fallacy (Hamlett, 2020).

According to the authorities, the aim of the mass screening programme was to discover the ‘silent carriers’ within the community accurately and sever transmission chains. The authorities estimated that there were 1,500 silent carriers of COVID-19 in the community (林鄭月娥：4成個案源頭未明社區隱形患者傳播力高, 2020). Based on the predictive outcome of the second model (Fig. 2), as of the end of 6th September, the testing scheme would identify up to 269.99 silent cases (Eq. (6)) in a population of 7.5 million if true whole-of-population screening were achieved. This is an extrapolation from the predicted test results for the 1.132 million people who signed up for the scheme. The models and their predictions could be refined by taking into account the demographics of the programme's participants versus those who chose not to participate, and by establishing the causes of the mismatch between the yield of clinically confirmed positive cases expected by the retrospective model (Fig. 1) and the actual yield. For now, 269.99 may be taken as a mathematically generated reference for the number of silent carriers among us at the moment.

Conclusion

This study has evaluated the performance of the Universal Community Testing Programme for COVID-19 in Hong Kong using simple predictive analytics, and has established that the low prevalence of the disease is the main challenge to the efficacy of the test from a mathematical perspective.

It is not yet known whether the mass screening programme will help or harm Hong Kong’s COVID-19 situation now that the city is already emerging from its third wave, with an Rt below 0.5 signifying that the outbreak has come under control (普及社區檢測計劃展開許樹昌梁卓偉接受檢測, 2020). The direction of the programme's effect is difficult to predict, as it is unknown whether close contact between medical staff and participants will increase the number of infections, and whether either group would carry pathogens out into the community. It is also unknown what impact there would be if carriers awaiting their results, and medical personnel involved in operating the scheme, continued to move freely within the community.

The programme targets asymptomatic and presumably healthy individuals, while symptomatic patients are urged to seek medical help immediately. This means that the incidence rate of COVID-19 may or may not be monitored or calculated separately from the rolling outputs of the community testing programme. It is not yet clear how the system categorises an individual who was healthy or asymptomatic at sample collection but became infected or symptomatic before result notification. Such ambiguity in the data could complicate well-intentioned mathematical work seeking to understand and contextualise the COVID-19 situation in Hong Kong.

This study has already picked up puzzling patterns in the data that await explanation. When the ‘test probability’ (Eq. (1)) is compared with the population prevalence (Eq. (2)), the calculated ‘test probability’, which represents the current number of hits yielded by the BGI’s test after further retests by the Department of Health, turns out to be lower than the already very low population prevalence. Further research is required to compare statistically, and determine the significance of, the difference between the model outputs when the ‘test probability’ is used as the input and when the prevalence is used. Such studies would be a real test of the screening programme's expected performance against reality, and would give the public extra information for judging whether large-scale programmes such as this one, as part of our long battle with COVID-19, are worthwhile.

A downloadable PDF of this report can be accessed at: https://drive.google.com/file/d/1g0kcZL5lnzehH4YUWPf76K-FlGtgqw7h/view

Bibliography

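Apple Daily 蘋果日報. 2020. 【疫情焦點】港增21確診7冇源頭外傭疑傳染同住九旬夫婦另添3死(附個案搜尋器) [Epidemic focus: Hong Kong adds 21 confirmed cases, 7 of unknown source; domestic helper suspected of infecting the couple in their nineties she lives with; 3 more deaths (with case search tool)]. [online] Available at: <https://hk.appledaily.com/local/20200906/CBVNTVMFMRFRFILSPPQFNBDXAQ/> [Accessed 6 September 2020].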

Boughorbel, S., Jarray, F. and El-Anbari, M., 2017. Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLOS ONE, 12(6), p.e0177678.

Chicco, D. and Jurman, G., 2020. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1).

Hamlett, T., 2020. To Test Or Not To Test? That Is Not A Political Question. [online] Hong Kong Free Press HKFP. Available at: <https://hongkongfp.com/2020/08/29/to-test-or-not-to-test-that-is-not-a-political-question/> [Accessed 6 September 2020].

Hong Kong’s Information Services Department. 2020. 普及檢測驗出六宗確診個案 [Universal testing detects six confirmed cases]. [online] Available at: <https://www.news.gov.hk/chi/2020/09/20200903/20200903_172026_512.html> [Accessed 6 September 2020].

Hong Kong’s Information Services Department. 2020. 預約普及檢測人數增至逾113萬 [Number of people booked for universal testing rises to over 1.13 million]. [online] Available at: <https://www.news.gov.hk/chi/2020/09/20200906/20200906_213141_470.html#:~:text=%E6%99%AE%E5%8F%8A%E7%A4%BE%E5%8D%80%E6%AA%A2%E6%B8%AC%E8%A8%88%E5%8A%83%E9%80%B2%E8%A1%8C,2019%E5%86%A0%E7%8B%80%E7%97%85%E6%AF%92%E6%A0%B8%E9%85%B8%E6%AA%A2%E6%B8%AC%E3%80%82> [Accessed 6 September 2020].

Kwan, R., 2020. Covid-19: Hong Kong Cases Dip To Single Digits For First Time In 7 Weeks, As Mass Testing Registration Set To Begin. [online] Hong Kong Free Press HKFP. Available at: <https://hongkongfp.com/2020/08/24/covid-19-hong-kong-cases-dip-to-single-digits-for-first-time-in-7-weeks-as-mass-testing-registration-set-to-begin/> [Accessed 6 September 2020].

Kyodo News+. 2020. Hong Kong Leader Chides Critics Of Universal Coronavirus Test. [online] Available at: <https://english.kyodonews.net/news/2020/08/0b0e7d7bc899-hong-kong-leader-chides-critics-of-universal-coronavirus-test.html> [Accessed 6 September 2020].

Liu, N. and Woodhouse, A., 2020. Hong Kong Covid-19 Mass Testing Sows Distrust Among Activists. [online] Financial Times. Available at: <https://www.ft.com/content/d9c6219c-4022-4f75-bc0c-73153ba6f4b5> [Accessed 6 September 2020].

Matthews, B., 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2), pp.442–451.

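News.rthk.hk. 2020. 林鄭月娥：4成個案源頭未明社區隱形患者傳播力高 [Carrie Lam: 40% of cases have unknown sources; undetected patients in the community are highly infectious]. [online] Available at: <https://news.rthk.hk/rthk/ch/component/k2/1542386-20200807.htm> [Accessed 6 September 2020].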

Now 新聞. 2020. 普及社區檢測計劃展開許樹昌梁卓偉接受檢測 [Universal Community Testing Programme begins; David Hui and Gabriel Leung get tested]. [online] Available at: <https://news.now.com/home/local/player?newsId=403872> [Accessed 6 September 2020].

University of Hong Kong, 2020. Ho Pak-Leung: Universal Tests Are Like Wasting Bullets. [online] Available at: <https://fightcovid19.hku.hk/ho-pak-leung-universal-tests-are-like-wasting-bullets/> [Accessed 6 September 2020].

Zhou, J., Pang, J., Zaharia, M. and Fernandez, C., 2020. Hong Kong Health Workers, Activists Urge Boycott Of Mass Testing. [online] Reuters. Available at: <https://www.reuters.com/article/us-health-coronavirus-joshua-wong/hong-kong-health-workers-activists-urge-boycott-of-mass-testing-idUSKBN25Q0E0> [Accessed 6 September 2020].
