University of South Australia INFS 2042 Data Structures Advanced Assignment 2 – Contact Tracing


1. Introduction

To track and reduce the spread of a disease during an epidemic or pandemic, it is critical that authorities and health experts can trace who has been in contact with whom, and when and where the contact occurred. This is known as contact tracing. Searching through the records of potentially millions of people and the places they have been requires an efficient way to store and navigate the data.
In this assignment, you are tasked with building a basic contact tracing system. You must use your knowledge of data structures and search algorithms to efficiently store and process large quantities of contact tracing data. You are not restricted to the data structures and algorithms explored in this course. You may also make use of structures and algorithms from the Data Structures Essentials course.


2. Requirements
Your client has provided you with a strict set of system requirements that the program must meet. How you achieve those requirements, and which algorithms or data structures you use, is up to you. You must implement the program in Java using OpenJDK 11 or newer. You should also aim to make the program as efficient as possible. For example, exhaustively searching lists in nested loops is, in many cases, not the most efficient implementation.
Generally, it is easier to design with optimisation in mind. If you use any of the following data structures, you must implement them yourself: Binary Search Tree, Self-Balancing Search Tree, Graph, Skip List, Blockchain, Hash Map, Hash Set, etc. It is expected that a selection of these structures will be required to meet the client requirements as efficiently as possible.
You may use data structures provided by the Java libraries (such as Linked List, Queue, Stack, etc.) only if they are not part of the content covered in this course, and only to support the implementation of other structures and to store data where necessary. Be wary of functions built into the provided data structures: if you do use them, make sure you consider their performance impact.
You are also required to provide supporting documentation in which you explain each data structure you used, what it was used for, and why. This includes cases where you have used Java's built-in data structures. Consider your implementation in the context of a real contact tracing application. The data provided for this assignment, as described below, covers 40 people with 80 visits to 6 locations. A real application would likely have millions of people, with tens or hundreds of millions of visits to hundreds of thousands of locations. Your implementation should be efficient for storing and processing large amounts of data.
Remember, it is not enough that your system implements the requirements; it must implement them efficiently.
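As an illustration of what implementing a structure yourself might look like, here is a minimal sketch of a hash map with separate chaining, which could support name-based lookups in roughly constant time on average. The class name SimpleHashMap, the initial capacity of 16, and the 3/4 load factor are assumptions chosen for this example; none of them come from the provided base code.

```java
import java.util.Objects;

/** Hypothetical separate-chaining hash map; a sketch, not part of the provided code. */
final class SimpleHashMap<K, V> {
    /** One entry in a bucket's singly linked chain. */
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    SimpleHashMap() {
        buckets = (Node<K, V>[]) new Node[16]; // assumed initial capacity
    }

    /** Mixes the hash bits and clears the sign bit so the index is never negative. */
    private static int indexFor(Object key, int length) {
        int h = Objects.hashCode(key);
        return ((h ^ (h >>> 16)) & 0x7fffffff) % length;
    }

    /** Returns the value stored for key, or null if the key is absent. */
    V get(K key) {
        for (Node<K, V> n = buckets[indexFor(key, buckets.length)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    /** Stores value under key and returns the previous value, or null if there was none. */
    V put(K key, V value) {
        int i = indexFor(key, buckets.length);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]);
        if (++size > buckets.length * 3 / 4) { // assumed load factor of 0.75
            resize();
        }
        return null;
    }

    int size() {
        return size;
    }

    /** Doubles the bucket array and relinks every node into its new bucket. */
    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = buckets;
        buckets = (Node<K, V>[]) new Node[old.length * 2];
        for (Node<K, V> head : old) {
            while (head != null) {
                Node<K, V> next = head.next;
                int i = indexFor(head.key, buckets.length);
                head.next = buckets[i];
                buckets[i] = head;
                head = next;
            }
        }
    }
}
```

A map like this, keyed on a person's or location's name, avoids scanning a list on every search. Whether hashing or a balanced tree fits better depends on whether ordered traversal (for example, by date) is also needed.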


2.1 System Requirements
Below is a set of requirements for the operation of the program, as provided by your client.
• The system administrator would like the ability to load existing data from the provided .csv files. The code to read the files is already provided by the client; however, they have not implemented a method to store the data.
• In addition, public health officials need the ability to add a new Person, Location or Visit to the data. The client has provided the input command parsing code to support this; however, they have not implemented the functionality.
• Public health officials need the ability to search for a Person by name. This should show them all details about the person, including a list of all visits the Person has made.
o Hint: This would require an efficient means of searching for the Person and for all Visits that the Person has made to any Location.
o If a startDate and endDate are provided, this should also filter the list of Visits to only include those between these times.
• Public health officials need the ability to search for a Location by name. This should show them all details about the location, including a list of all people who have visited the location.
o If a startDate and endDate are provided, this should also filter the list of Visits to only include those between these times.
• The public health officials would like the ability to produce a list of potential contacts up to (n) levels away from a given person (including known contacts).
o If n = 1, the list will contain only direct contacts of the given person.
o If n = 2, the list will contain all direct contacts (n = 1) of the given person and all contacts of those contacts (n = 2).
o If n = r, the list will contain all n = 1 to n = r-1 contacts of the given person and all contacts of those contacts (n = r).
o Hint: This would require an efficient method of identifying contacts of a given person based on their visits.
• Public health officials also need the ability to specify if the person is a new Active Case (i.e., they have become infected with the virus).
o When an Active Case is added, they also need to see an estimate of where, when and from whom the person likely contracted the virus. Your program should output the most likely contact source, including the location and time of contact. Note: The most likely contact source is the pair of people with the highest Chance of Spread (C), as defined later in this document.
o If a new Active Case has no immediate contacts that are also an Active Case, the program should instead find the nearest or most likely Active Case. That is, the existing Active Case for which the contacts between them and the new Active Case have the highest total Chance of Spread (C).
o Hint: This would require a method for identifying the person from whom, and the visit during which, the person most likely contracted the virus.
• The public health officials would like to output a trace of the transmission of the virus from its original source to a target person. This trace should ensure that the date on which each person along the path was infected is correct, by verifying that the start date of their infection is the day after the contact with the highest Chance of Spread (C). In a 'real world' data set this would be useful for identifying different branches of the virus as it spreads and for tracing the virus back to its original source.
o Hint: This would require a method for tracing the path backwards through each person from the given person until no previous source case can be found (in the provided data).
• The public health officials would like to be able to produce a list of all active cases.
• The program must be robust and user friendly: it should not crash, and should instead print appropriate messages.
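The n-level contact requirement above maps naturally onto a breadth-first search over a contact graph, stopping once level n is reached. The sketch below assumes a contact graph has already been built from overlapping visits and is passed in as an adjacency map from a person's name to the names of their direct contacts. The class name ContactLevels and this representation are hypothetical. For brevity it uses java.util collections; under the rules of this assignment, the graph and map would need to be your own implementations.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical helper: breadth-first search limited to n levels of contact. */
final class ContactLevels {
    /**
     * Returns every person reachable from start within n contact levels,
     * mapped to the level (1..n) at which they were first reached.
     * Assumes contacts maps each person to their direct contacts.
     */
    static Map<String, Integer> contactsWithin(Map<String, List<String>> contacts,
                                               String start, int n) {
        Map<String, Integer> level = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        level.put(start, 0); // level 0 marks the starting person as already seen
        queue.add(start);
        while (!queue.isEmpty()) {
            String person = queue.remove();
            int depth = level.get(person);
            if (depth == n) {
                continue; // do not expand beyond the requested level
            }
            for (String other : contacts.getOrDefault(person, Collections.emptyList())) {
                if (!level.containsKey(other)) {
                    level.put(other, depth + 1);
                    queue.add(other);
                }
            }
        }
        level.remove(start); // the given person is not their own contact
        return level;
    }
}
```

Because each person is enqueued at most once, the search touches each person and each contact edge within n levels at most once, rather than re-scanning all visits for every level.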
2.2 Supporting Documentation
You must provide a document to support your program design and demonstrate your program meets the requirements. This must include:
• One-page summary of your program design and the reasoning behind your design decisions.
Explain all data structures and algorithms you used, what they were used for, and your reasoning for selecting them (e.g., an estimate of overall performance, and space and time efficiency).
• Sample outputs from your program (no page limit).
This is to demonstrate that your program meets the requirements. Provide headings to clarify which requirement each sample output demonstrates.

3.2 Provided Code
The client has provided the basic interface commands they wish to use to handle the data. You are free to add commands for your own testing purposes; however, you must keep the commands listed here the same. The provided base code handles the parsing of these commands and provides some supporting types and functions. It is recommended that you retain the command functionality and build upon it, but you are free to modify the base code in whatever way you need to meet the requirements. See testfiles/test.txt in the provided code for a set of example commands.
The program is configured with an artificial “CURRENT_DATE” variable that relates to the provided data files. You should use it whenever referring to the current date. It is configured by an initialization command in the test files.
For simplicity, a limited data set is provided. A person is only considered infectious if they are currently an active case, and only the dates between which they are infectious are recorded. All of the data is artificial and has been procedurally generated. A person is only considered an active case if they have an activeStartDate and either have no activeEndDate or have an activeEndDate after the “current date”.
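The active-case rule can be written as a small predicate. This is a sketch only: the class and method names are hypothetical, the dates are assumed to be held as java.time.LocalDate values (the provided base code may represent them differently), and a missing date is modelled as null.

```java
import java.time.LocalDate;

/** Hypothetical helper expressing the active-case rule described above. */
final class ActiveCaseCheck {
    /**
     * A person is active if they have an activeStartDate and either have no
     * activeEndDate or have an activeEndDate after the current date.
     * A null date is assumed to mean "not recorded".
     */
    static boolean isActive(LocalDate activeStartDate, LocalDate activeEndDate,
                            LocalDate currentDate) {
        if (activeStartDate == null) {
            return false;
        }
        return activeEndDate == null || activeEndDate.isAfter(currentDate);
    }
}
```

The currentDate argument would be the program's CURRENT_DATE value rather than the system clock, so that results stay consistent with the provided data files.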

3.3 Calculating the Chance of Spread
For this assignment, we have an imaginary virus that has a high chance of spreading and becomes detectable and contagious the following day. That is, if John is detected as an active case on 5/1/2021, he must have caught the virus on some day before 5/1/2021.
For this virus, the chance that contact between an active case and another person results in a spread to that person is based on the overlap in time spent by the two people at a given location, the time since the active case contracted the virus, and the incubation time. The chance is the percentage of one hour spent in contact (in the same location).
Let D be the time spent by two people in the same location (in minutes). The Chance of Spread (C) is:
$$C = \frac{D \times 100}{60}$$
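As a worked example, two people who overlap for 30 minutes give C = 30 × 100 / 60 = 50%. The sketch below computes D as the overlap of two visits and then applies the formula, clamped to the 0 to 100% range the assignment requires. The class and method names are hypothetical, and visit times are assumed to be java.time.LocalDateTime values. Only the formula above is used; no separate incubation-time adjustment is modelled.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical helper for the Chance of Spread (C) calculation. */
final class ChanceOfSpread {
    /** Minutes during which both visits were in progress, or 0 if they do not overlap. */
    static long overlapMinutes(LocalDateTime startA, LocalDateTime endA,
                               LocalDateTime startB, LocalDateTime endB) {
        LocalDateTime start = startA.isAfter(startB) ? startA : startB; // later start
        LocalDateTime end = endA.isBefore(endB) ? endA : endB;          // earlier end
        return end.isAfter(start) ? Duration.between(start, end).toMinutes() : 0;
    }

    /** C = D * 100 / 60, clamped to the range 0..100 (percent). */
    static double chance(long minutesInContact) {
        double c = minutesInContact * 100.0 / 60.0;
        return Math.max(0.0, Math.min(100.0, c));
    }
}
```

The two visits passed to overlapMinutes are assumed to be at the same location; checking that is left to the caller.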
Note that C cannot be less than 0% or greater than 100%.
3.4 Running the Provided Code
To run the provided code, you will need to pass it the path to the test file as a program argument through the “Run Configuration” in Eclipse. The default included test file is at the relative location “testfiles/test.txt”. Throughout development it may help to create your own test files and data sets to support the implementation of specific functions. If you are using your own test file, make sure you update the “Arguments” under the “Run Configuration” in Eclipse. Note that a different test file may be used for marking.
