USYD (University of Sydney) DATA1002: Detailed Assignment Walkthrough, Module 4

Latest recommended article published on 2024-09-15 22:31:42

不二程序猿


Category: University of Sydney DATA1002. Tags: python, big data, earning a living, experience sharing

Article link: https://blog.csdn.net/weixin_43773228/article/details/108749769


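This article shows how to perform data aggregation in Python: filtering out hot temperature readings, counting the data points in a rainfall file, summing the rainfall on wet days, counting cold and hot days, checking whether any day reached a given amount of rainfall, and finding the highest of the daily minimum temperatures. Worked code examples show different ways to solve these problems with loops, conditionals, and aggregation, as a guide for beginners in data analysis.

(Summary generated automatically by CSDN.)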

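# Preface

I have tried to be as detailed as possible and to explain every step. There is more than one correct answer, so if you have a better solution, feel free to leave a comment. For introductions to the other course materials, see my profile page.
Module 4 introduces a new concept: aggregation. By summarising the data point by point, we can derive statistics such as the mean, the maximum, or the minimum. We will look at how to use aggregation in a variety of situations.
Blank lines and # comment lines are not counted in the line numbers used in the explanations, so skip them when counting lines of code.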
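As a minimal sketch of the idea (not one of the course exercises), a mean can be built from two aggregations running in the same loop: a sum and a count. The function name `mean_of_lines` and the sample values are made up for illustration, and a list of strings stands in for the lines of a data file.

```python
def mean_of_lines(lines):
    """Return the mean of the numbers in an iterable of text lines."""
    total = 0.0  # running sum, initialised before the loop
    count = 0    # running count, initialised before the loop
    for line in lines:
        total += float(line)  # update rule for the sum
        count += 1            # update rule for the count
    return total / count      # assumes there is at least one line


sample = ["20.0\n", "25.0\n", "30.0\n"]  # hypothetical data
print(mean_of_lines(sample))
```

Every aggregation in this module follows the same pattern: initialise a variable before the loop, then update it once per data point.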

Question 1: Hot Data

The graph below shows the averaged maximum daily temperature in degrees Celsius in the month January from 1910 to 2016, recorded by the Sydney Observatory Hill weather station. The mean temperature over all years is indicated by the dotted line, and is about 26 degrees Celsius.

Your task is to find out how this average temperature compares to the temperatures from January 2017. The temperatures for each day are stored in a file called temperatures_Jan2017.txt. Filter these data such that you only keep temperatures larger than the mean temperature (26 degrees Celsius) by 40% or more.

To do this, you first have to calculate the relative temperature, that is t/26, where t is the temperature on each row of the file. If this ratio is larger than 1.4, print out the temperature.

The output of your program should look like this:

All these numbers are larger than 26 by 40% or more.

Requirement: open the text file temperatures_Jan2017.txt, divide each temperature by 26, and print the temperature whenever that ratio is greater than 1.4.

for val in open('temperatures_Jan2017.txt'):
  val_float = float(val)
  val_relative = val_float / 26
  if val_relative > 1.4:
    print(val_float)

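line1: Loop over the file, assigning each line of the data file to the loop variable val.
line2: Create a new variable that holds the line converted from a string to a float.
line3: Create another variable, val_relative, that holds the float divided by 26.
line4: Condition: if this ratio is greater than 1.4, run the next line.
line5: Print the value. Mind the indentation; the value printed is the float from line2, not the ratio.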
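The filter above can also be sketched as a reusable function, which makes it easy to try on a few values without the data file. The name `filter_hot`, its parameters, and the sample readings are assumptions for illustration only.

```python
def filter_hot(lines, mean=26.0, ratio=1.4):
    """Return the temperatures whose ratio to `mean` is larger than `ratio`."""
    hot = []
    for line in lines:
        t = float(line)        # each line arrives as text
        if t / mean > ratio:   # relative temperature, as in the task
            hot.append(t)
    return hot


sample = ["25.1\n", "37.0\n", "38.9\n", "30.0\n"]  # hypothetical readings
for t in filter_hot(sample):
    print(t)
```

With the real data you could pass an open file instead of the list, for example inside `with open('temperatures_Jan2017.txt') as f:`, which also closes the file when the block ends.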

Question 2: Count data points

As a simple example for an aggregation, your task is to count the number of data points in the provided rainfall_March2017.txt file. This file contains data for each day in March, so your program should count 31 days.

After the aggregation, your program should print out the number of data points in the file, such that the output looks like this:

Make sure your program prints exactly these words - with correct capitalisation and spacing.

If you want to test your program with fewer data points, you can edit the rainfall_March2017.txt file and remove or add some values. When we test your program we have our own file so it won’t affect the marking when you make changes to this file.

Hint
In order to count the points, you first need to initialise a variable with 0 (outside of the loop!). Then, for each value in the file, you have to increment this variable by 1. Take a look at the previous slides if you’re not sure how to get started.

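Requirement: open the text file and count the number of data points it contains. There is one value per day, so the result for March should be 31.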

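# Teacher's method:
count = 0
for val in open('rainfall_March2017.txt'):
  count += 1
print('Number of points:', count)
# My method:
s = 0
for rf in open("rainfall_March2017.txt"):
  rfc = float(rf)
  if rfc >= 0:
    s += 1
print("Number of points: " + str(s))

I will explain the teacher's method, which is short and clear. Mine takes a more roundabout route: it converts every line to a float and counts it only if it is at least 0, which gives the same result here because rainfall is never negative.
line1: Create a variable count with the value 0; it will count the loop iterations.
As the hint in the question says: to count the points, you first need to initialise a variable with 0 (outside the loop!), and then increment it by 1 for each value in the file.

line2: Loop over the file, assigning each line of the data file to the loop variable val.
line3: On every iteration, add 1 to count.
line4: Print the required string, followed by count, the number of iterations the loop ran.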
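A small sketch of the counting aggregation on in-memory data, so the result can be checked by hand. The helper name `count_points` and the sample list are hypothetical.

```python
def count_points(lines):
    """Count the data points (lines) in an iterable."""
    count = 0           # initialise outside the loop
    for _ in lines:     # the value itself is not needed for counting
        count += 1      # increment once per data point
    return count


sample = ["0.0\n", "12.4\n", "3.2\n"]  # hypothetical rainfall values
print('Number of points:', count_points(sample))

# The same aggregation written with the built-in sum() and a generator
print('Number of points:', sum(1 for _ in sample))
```

Because the loop never looks at the value, no conversion to float is needed for a pure count.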

Question 3: More rain

The bar chart below shows the daily amount of rainfall in mm for the entire month of March 2017, which was the wettest month of 2017. We have stored the data in a file called rainfall_March2017.txt.

Your task is to calculate the total amount of rainfall on wet days during this month. Using the supplied textfile, every rain value larger than 2.0 should be counted towards the total amount. Filter out values which are smaller than that.

The output of your program should look like this:

Where 320.0 is the total amount of rainfall.

To make sure your program works for the general case, you can also test it with the file rainfall_March2017_1.txt, which contains the data from the first week only. For this file, your program should print out the following:

When you submit your solution, make sure your program calls the rainfall_March2017.txt file.

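Requirement: calculate the total rainfall on wet days during this month. Using the supplied text file, every rainfall value larger than 2.0 counts towards the total; smaller values are filtered out.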

total = 0
for val in open('rainfall_March2017.txt'):
  val_float = float(val)
  if val_float > 2.0:
    total += val_float
print('Rainfall on wet days (mm):', total)

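line1: Initialise a variable before the loop to hold the running total.
line2: Loop over the file, assigning each line of the data file to the loop variable val.
line3: Convert the loop variable to a float so that it can be used in arithmetic.
line4: Condition: if the value is greater than 2.0, run the next line.
line5: Add the current value to total. This is a sum, not a count, so the amount added is val_float rather than 1.
line6: Print the result.
Note: total += val_float is shorthand for total = total + val_float.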
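The filter-then-sum pattern can be sketched as follows. The function name `wet_day_total`, the `threshold` parameter, and the sample values are assumptions chosen so the arithmetic is easy to verify.

```python
def wet_day_total(lines, threshold=2.0):
    """Sum the rainfall values that are strictly larger than `threshold`."""
    total = 0.0
    for line in lines:
        value = float(line)
        if value > threshold:  # filter: keep wet days only
            total += value     # aggregate: add the value, not 1
    return total


sample = ["0.0\n", "1.8\n", "5.0\n", "12.5\n", "2.0\n"]  # hypothetical data
print('Rainfall on wet days (mm):', wet_day_total(sample))
```

Here only 5.0 and 12.5 pass the filter; the value 2.0 is not strictly larger than the threshold and is left out.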

Question 4: Cold and hot days

Below we have a record of the maximum temperature recorded on each day for the year 2016. We have stored these temperatures in the max_temperatures_2016.txt file.

Your task is to calculate the number of cold days and the number of hot days during that year using the following (arbitrary) thresholds:

Cold days have a temperature smaller than 15.0 degrees
Hot days have a temperature larger than 30.0 degrees

You will have to use two simultaneous aggregations to solve this problem. After the aggregations, your program should print out the two count values.

For the supplied max_temperatures_2016.txt file, the output of your program should look like this:

Where 27 is the number of temperatures above 30 degrees, and 7 is the number of temperatures below 15 degrees.

To make sure your program works for the general case, you can also test it with the file max_temperatures_2015.txt, which contains the maximum daily temperature for every day of 2015. For this file, your program should print out the following:

When you submit your solution, make sure your program calls the max_temperatures_2016.txt file.

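Requirement: count the hot days and the cold days in the file. A day is hot if its temperature is above 30 degrees and cold if it is below 15 degrees. Print both counts.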

hot = 0
cold = 0

for val in open('max_temperatures_2016.txt'):
  val_float = float(val)
  if val_float < 15:
    cold += 1
  elif val_float > 30:
    hot +=1

print('Hot days:', hot)
print('Cold days:', cold)

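line1-line2: Two variables must be initialised before the loop: one counts the hot days, the other counts the cold days.
line3: Loop over the file, assigning each line of the data file to the loop variable val.
line4: Convert the loop variable to a float so that it can be used in arithmetic.
line5-line6: Condition: if the temperature is below 15 degrees, add one to cold.
line7-line8: Otherwise, if the temperature is above 30 degrees, add one to hot.
line9 & line10: Print the results in the required format, passing hot and cold as the second argument of each print.
Note: watch the indentation, and take care not to confuse the digit 0 with the letter o when typing.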
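Two simultaneous aggregations can be sketched as one loop that updates two counters. The function name `count_cold_hot`, its threshold parameters, and the sample temperatures are hypothetical; the boundary values 15.0 and 30.0 are included to show that they count as neither cold nor hot.

```python
def count_cold_hot(lines, cold_below=15.0, hot_above=30.0):
    """Return (cold, hot) day counts from an iterable of temperature lines."""
    cold = 0
    hot = 0
    for line in lines:
        t = float(line)
        if t < cold_below:
            cold += 1
        elif t > hot_above:  # a value cannot be both cold and hot
            hot += 1
    return cold, hot


sample = ["14.9\n", "15.0\n", "22.3\n", "30.0\n", "31.5\n", "35.2\n"]
cold, hot = count_cold_hot(sample)
print('Hot days:', hot)
print('Cold days:', cold)
```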

Question 5: Rainy days

Using the rainfall record during the month of January 2017, as stored in the rainfall_Jan2017.txt, your task is to determine whether or not there was a day with 20mm or more of rainfall.

If there is at least one day with 20mm or more, your program should print:

There was at least one day with 20mm or more rainfall.

If there isn’t, meaning the daily rainfall never reached 20mm, your program should print:

There was no day with 20mm or more rainfall.

For the provided 2017 data, your program should find that there is no day with 20mm or more rainfall. To make sure your program works for the general case, we also provided the rainfall data from January 2016, 2015 and 2014. For 2015 and 2016, you should find that there were days with more than 20mm, and for the 2014 data, you should find that there weren’t any.

When you submit your program, make sure it reads from the file named rainfall_Jan2017.txt

Initialisation and update rule
If you’re not sure how to start this problem, go back a few slides and take another look at how we determined the initialisation and update rule. Here you have to aggregate boolean data - take another look at the example we had as well.

Requirement: determine whether any day had 20 mm or more of rainfall, and print one of the two messages accordingly.

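# Method 1:
num_above20 = 0

for value in open("rainfall_Jan2017.txt"):
  value_float = float(value)
  # "20mm or more" includes exactly 20, so compare with >=
  if value_float >= 20:
    num_above20 += 1

if num_above20 > 0:
  print("There was at least one day with 20mm or more rainfall.")
else:
  print("There was no day with 20mm or more rainfall.")

# Method 2:
some_above20 = False
for value in open("rainfall_Jan2017.txt"):
  value_float = float(value)
  some_above20 = some_above20 or (value_float >= 20)

if some_above20:
  print("There was at least one day with 20mm or more rainfall.")
else:
  print("There was no day with 20mm or more rainfall.")

Method 1 counts the qualifying days and then checks whether the count is positive. Method 2 aggregates Boolean values: it starts from False and combines each comparison with or (the same question can also be answered by starting from True and using and). I will mainly explain Method 1, which is easier to follow.

line1: Initialise a variable num_above20 before the loop; it counts the qualifying days so that the count can be checked afterwards.
line2: Loop over the file, assigning each line of the data file to the loop variable value.
line3: Convert the loop variable to a float so that it can be used in arithmetic.
line4-line5: Condition: if the float is 20 or more, add one to the counter.
line6-line7: After the loop has finished, start a new condition. If the count is greater than 0, at least one day had 20 mm or more of rainfall, so print the first message.
line8-line9: Otherwise, print the second message.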
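The True/and variant mentioned above can be sketched like this: track whether every day stayed below the threshold, then negate the result. The function names and the sample data are assumptions for illustration; the second function shows the same check with the built-in `any()`.

```python
def any_day_at_least(lines, threshold=20.0):
    """Boolean aggregation with True/and: were all days below the threshold?"""
    all_below = True  # initial value for an 'and' aggregation
    for line in lines:
        all_below = all_below and (float(line) < threshold)
    return not all_below  # some day reached the threshold


def any_day_at_least_builtin(lines, threshold=20.0):
    """The same check written with the built-in any()."""
    return any(float(line) >= threshold for line in lines)


wet = ["0.0\n", "20.0\n", "3.4\n"]  # hypothetical: one day at exactly 20 mm
dry = ["0.0\n", "19.9\n"]           # hypothetical: never reaches 20 mm
for data in (wet, dry):
    if any_day_at_least(data):
        print("There was at least one day with 20mm or more rainfall.")
    else:
        print("There was no day with 20mm or more rainfall.")
```

An or aggregation starts from False and an and aggregation starts from True, because those starting values leave the first comparison unchanged.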

Question 6: Maximum minimum temperature

Below we have a record of the minimum temperature recorded on each day of the year 2016. We have stored these temperatures in a file called min_temperatures_2016.txt.

Your task is to apply an aggregation method to find the maximum temperature from this dataset and print it to the console.

Here’s what the output of your program should look like:

27.1

If you want to test your program with different datasets, you can also use the file called min_temperatures_2015.txt, which contains the daily temperature for all of 2015. For this file, your output should look like this:

23.6

When you submit your solution for marking, make sure it calls the min_temperatures_2016.txt file.

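Requirement: apply an aggregation to find the highest temperature in this dataset and print it to the console. There are two approaches: Method 1 uses the max() function to compare values on each iteration, and Method 2 uses an explicit comparison on each iteration. Both are explained.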

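# Method 1
temp = float('-inf')
for value in open('min_temperatures_2016.txt'):
  valuef = float(value)
  temp = max(temp, valuef)
print(temp)

line1: Initialise temp to negative infinity so that any temperature in the file is larger. Starting from 0 also gives the right answer for this Sydney data, because the highest value is positive, but it would fail for a file that contains only negative temperatures.
line2: Loop over the file, assigning each line of the data file to the loop variable value.
line3: Convert the loop variable to a float so that it can be used in comparisons.
line4: Use max() to compare the running maximum temp with the current float and keep the larger of the two.
line5: Print temp.
Note: max(a, b) returns the larger of its two arguments.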

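# Method 2
max_temp = float('-inf')

for val in open('min_temperatures_2016.txt'):
  val_float = float(val)
  if val_float > max_temp:
    max_temp = val_float

print(max_temp)

line1: Initialise a variable max_temp before the loop; it holds the largest value seen so far. Negative infinity is used so that the first value always replaces it; starting from 0 would only work when the largest value in the file is positive.
line2: Loop over the file, assigning each line of the data file to the loop variable val.
line3: Convert the loop variable to a float so that it can be used in comparisons.
line4-line5: On each iteration, check whether the new value is greater than max_temp, and if so store it in max_temp. When the loop ends, max_temp holds the largest value in the file.
line6: Print the result.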
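To see why the initial value matters for a maximum aggregation, the sketch below runs it on a made-up list of sub-zero temperatures. The function name `max_of_lines` and the sample data are hypothetical; a running maximum that starts at 0 would report 0 for this data, a value that does not appear in it.

```python
def max_of_lines(lines):
    """Return the largest number in an iterable of text lines."""
    best = float('-inf')  # smaller than any real reading
    for line in lines:
        value = float(line)
        if value > best:  # update rule: keep the larger value
            best = value
    return best


cold_week = ["-3.5\n", "-1.2\n", "-7.0\n"]  # hypothetical sub-zero minimums
print(max_of_lines(cold_week))

# The built-in max() performs the same aggregation over a generator
print(max(float(line) for line in cold_week))
```

An alternative that avoids the sentinel is to initialise the running maximum with the first value of the data.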