USYD (University of Sydney) DATA1002: Detailed Assignment Walkthrough, Module 7 (new-style explanation)

不二程序猿

Article link: https://blog.csdn.net/weixin_43773228/article/details/118340138


Module 7: Bucketing and pivoting numeric data 1

Preface
Downloads
7.0 This week's topics
7.1 Pattern 1: Filtering and aggregating csv data
7.2 Pattern 2: Field-dependent aggregation
7.3 Pattern 3: Simple slice-and-dice
7.4 Pattern 4: Data binning
Summary

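Preface

From Module 7 onward, the material is explained in a new way that is easier to follow, so that it is clear what each module covers, what you need to learn, and how to write the code. Modules 3 to 6 will be updated and improved when time allows.
I try to be as detailed as possible and explain every step. There is more than one correct answer, so experts are welcome to leave a comment. The other modules are available on my profile page. Blank lines and # comments do not count toward the number of code lines being explained, so skip them when counting lines. Do not copy the Question code directly: think it through, write it yourself, reflect on it, and you will do great.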

Downloads

Data file links:
climate_data_Dec2017.csv climate_data_2017.csv

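7.0 This week's topics

This week's module covers processing numeric data. Tabular data is commonly stored in comma-separated values (CSV) format.
Example of tabular data: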

A	B	C
12	34	56
78	90	01

Example of the same data in CSV format:

A , B , C
12 , 34 , 56
78 , 90 , 01

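To process data efficiently, this module introduces a new data container type, the dictionary. A dictionary is similar to the list introduced in the previous module, but its items are not accessed by position: each value is mapped to a specified key instead of a positional index. This module shows how (and how not) to use dictionaries effectively to store and process data of arbitrary types.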
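
As a minimal sketch of the difference between the two containers (the city names and temperatures below are made-up values, not taken from the course data), a list is looked up by position while a dictionary is looked up by key:

```python
# A list is indexed by position; a dictionary is indexed by key.
# All values here are made up for illustration.
temps_list = [26.6, 31.2, 24.8]                                   # positions 0, 1, 2
temps_by_city = {"Sydney": 26.6, "Perth": 31.2, "Hobart": 24.8}   # keys are city names

second_by_position = temps_list[1]      # look up by index
perth_by_key = temps_by_city["Perth"]   # look up by key

print(second_by_position)
print(perth_by_key)
```

Both lookups return the same number here, but the dictionary version does not require you to remember which position belongs to which city.
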
What you will learn:

Filtering and aggregating CSV data
Field-dependent aggregation
Simple slice-and-dice
Data binning

7.1 Pattern 1: Filtering and aggregating csv data

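This section has three parts:

An example program that uses aggregation to find the average maximum temperature on windy days.
Using split and the index operator, and skipping the first line of the file.
Question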

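7.1.1 Small example: aggregation

Here we have data from Australian weather stations, and we want to find the average maximum temperature on windy days, that is, days on which the wind speed exceeded 60 km/h. The temperature is recorded in column 6 and the wind speed in column 11. Here is the program: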

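# First initialise some variables for the calculation below
temp = 0 # running total of the temperatures that pass the filter
count = 0 # number of temperatures added to the total
is_first_line = True # used to skip the header line

for row in open('climate_data_Dec2017.csv'):
    if is_first_line:
        is_first_line = False # the header has now been seen; do not process it
    else:
        values = row.split(',') # split the row into a list of column values, using the comma as separator
        wind_speed = float(values[10]) # wind speed is in column 11, i.e. index 10 of the list
        if wind_speed > 60: # only rows with wind speed above 60 are aggregated
            count += 1 # one more matching day, needed for the mean
            temp += float(values[5]) # add the matching temperature to the total

mean = temp / count # total temperature divided by the number of matching days gives the mean
print("Average max. temperature on windy days:", mean)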

-----Output-----
Average max. temperature on windy days: 26.1
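
One thing the program above does not handle is the case where no row passes the filter: count stays at 0 and the division fails. The following is a minimal sketch of one way to guard against that. It assumes a made-up three-column layout (date, maximum temperature, wind speed) and a hypothetical helper name mean_temp_on_windy_days; neither comes from the course files.

```python
import io

# Made-up sample data standing in for a CSV file.
# Hypothetical layout: date, max temperature, wind speed.
sample = io.StringIO(
    "date,max_temp,wind_speed\n"
    "2017-12-01,25.0,70.0\n"
    "2017-12-02,30.0,40.0\n"
    "2017-12-03,27.0,65.0\n"
)

def mean_temp_on_windy_days(lines, threshold):
    """Return the mean max temperature on days windier than threshold, or None."""
    total = 0.0
    count = 0
    is_first_line = True
    for row in lines:
        if is_first_line:
            is_first_line = False          # do not process the header
        else:
            values = row.split(",")
            if float(values[2]) > threshold:
                total += float(values[1])
                count += 1
    if count == 0:
        return None                        # no matching rows, so there is no mean
    return total / count

windy_mean = mean_temp_on_windy_days(sample, 60)
sample.seek(0)                             # rewind the sample to read it again
no_match = mean_temp_on_windy_days(sample, 100)
print(windy_mean, no_match)
```

Returning None is only one possible design choice; printing a message instead would work just as well for a short script.
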

7.1.2 The split method, the index operator, and skipping the first line

The split method reviewed

Link: Runoob tutorial: the split method.

The split method takes a separator argument and splits the string at each occurrence of that separator.

row = "2017-01-01,22.7,26.6"
values = row.split(',')
print(values)

-----------Output----------
['2017-01-01', '22.7', '26.6']

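Each time a comma is encountered in row, split starts a new list element. So for tabular data in which every row holds comma-separated values, this technique extracts the individual values.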
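
A detail worth knowing when the row comes from a file rather than a string literal: the line still ends with a newline character, and split(",") leaves it attached to the last value. The sketch below uses a made-up row to show this, and uses the standard string method strip to remove it. This does not matter when the last column is converted with float, but it does when the last column is text.

```python
row = "2017-01-01,22.7,Sydney\n"          # made-up row, as it would be read from a file

raw_values = row.split(",")               # the newline stays on the last element
clean_values = row.strip().split(",")     # strip() removes surrounding whitespace first

print(raw_values)
print(clean_values)
```
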

The index operator reviewed

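Data containers such as lists and strings support indexing: they are ordered, and each element has a fixed position. The position of an element in a list or string is called its index. Indices in Python, as in most programming languages, start at zero, so index 1 actually refers to the second element.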

row = "2017-01-02,21.2,26.3,0.4,3.4"
values = row.split(',')
print(row[1]) # the character 0 from "2017"
print(values[1]) # 21.2

----------Output----------
0
21.2

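Negative indexing: the indices -1, -2, -3 refer to the last element, the second-to-last element, the third-to-last element, and so on. This notation is very convenient for accessing elements near the end of a container, but take care not to request an index beyond the range of the container.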

row = "2017-01-02,21.2,26.3,0.4,3.4"
values = row.split(',')
print(values[-1]) # last element: 3.4
print(values[-2]) # second-to-last element: 0.4

----------Output----------
3.4
0.4
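
To make the warning about out-of-range indices concrete, here is a small sketch using the same row. A list of five elements accepts indices 0 to 4 and -1 to -5; anything else raises an IndexError. The variable name out_of_range is only for illustration.

```python
values = "2017-01-02,21.2,26.3,0.4,3.4".split(",")

print(len(values))     # 5 elements: valid indices are 0..4 and -1..-5
print(values[-5])      # the most negative valid index is the first element

try:
    values[5]          # one past the end
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)
```
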

Skipping lines in files

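The file used in the examples has a header line. That line describes the data rather than containing data, so we do not want to filter or convert it. To skip it, we create a Boolean variable that tells us whether we are processing the first line of the file: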

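# is_first_line starts as True, so the first iteration of the for loop enters the if branch, which sets it to False and does nothing else with the header
# Technically the first line is not skipped: it is still read, just not processed. All other lines are handled in the else block.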
is_first_line = True 
for row in open("climate_data_Dec2017.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(',')
        wind_speed = float(values[10])
        if wind_speed > 65:
            print(values[0], values[2],wind_speed)

----------Output----------
2017-12-02 Wollongong 87.0
2017-12-03 Perth 70.0
2017-12-05 Wollongong 69.0

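A second way to skip the first line: here line_count is initially set to zero and incremented at the end of each iteration. The if condition is true only once the counter has been incremented at least once, which means the if block is not entered in the first iteration of the loop. With a counter variable you can in principle skip any number of lines at the beginning of a file, and also at the end (for example footer lines) if you know how many lines the file has.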

line_count = 0 
for row in open("climate_data_Dec2017.csv"):
    if line_count > 0:
        values = row.split(",")
        wind_speed = float(values[10])
        if wind_speed > 65:
            print(values[0] , values[2] , wind_speed)
    line_count +=1


----------Output----------
2017-12-02 Wollongong 87.0
2017-12-03 Perth 70.0
2017-12-05 Wollongong 69.0
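
A third option, not used in the course material, is to consume the header before the loop with the built-in next function, which works on any iterator, including an open file. The sketch below uses io.StringIO as a stand-in for a file, with an assumed three-column layout; the Sydney row is made up.

```python
import io

# Stand-in for an open file. Hypothetical layout: date, city, wind speed.
sample = io.StringIO(
    "date,city,wind_speed\n"
    "2017-12-02,Wollongong,87.0\n"
    "2017-12-04,Sydney,30.0\n"
)

header = next(sample)            # reads and discards the header line
windy = []
for row in sample:               # the loop starts at the first data row
    values = row.strip().split(",")
    if float(values[2]) > 65:
        windy.append(values[1])
print(windy)
```

This keeps the loop body free of the if/else needed by the two approaches above, at the cost of one extra line before the loop.
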

7.1.3 Question: Summer rain (hands-on practice)

Task:

There’s nothing like a refreshing gush of rain after a sweltering hot summer day. Unfortunately, this relief from the heat isn’t a very common occurrence. For the full climate record of 2017, your task is to find the maximum amount of rainfall on days where the maximum temperature was above 35 degrees Celsius.

We have downloaded the full climate record of 2017 from Sydney weather stations and stored the data in a file called climate_data_2017.csv. The maximum temperature record is in the third column (index 2) and the rainfall is in the fourth column (index 3).

Let your program print out the result within the following sentence:

Maximum amount of rainfall on hot days: 3.8 mm
Make sure your program output matches this sample output exactly in capitalisation, punctuation and whitespaces.

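Key points: find the maximum rainfall on days where the maximum temperature was above 35 degrees. The maximum temperature is in the third column (index 2) and the rainfall in the fourth column (index 3). Make sure the output matches the sample in capitalisation, punctuation and whitespace.
Sample output: Maximum amount of rainfall on hot days: 3.8 mm

Approach: the task asks for a maximum, so use the max() function. Start with a variable set to 0 to compare against. Then skip the first line, split on commas, keep the rows whose temperature meets the condition, and use max() to keep the larger of the current maximum and each new rainfall value.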

max_rain = 0
is_first_line = True

for row in open("climate_data_2017.csv"):
  if is_first_line:
    is_first_line = False
  else:
    values = row.split(",")
    temp = float(values[2])
    if temp > 35:
        rain = float(values[3])
        max_rain = max(max_rain, rain)

print("Maximum amount of rainfall on hot days: "+str(max_rain)+" mm")

7.2 Pattern 2: Field-dependent aggregation

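Key points of this section:

An example and step-by-step explanation of aggregating data with a Python dictionary
What the dictionary type is
Updating and creating entries, testing for keys, and formatting dictionary output
Question: 9am humidity (hands-on practice)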

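7.2.1 Small example: aggregation with a dictionary

When you want to aggregate one field grouped by the values of another field, you can use a Python dictionary, a data container similar to lists and strings.

For example, suppose we want to find the total number of wind speed records for each wind direction, that is, north (N), north-northeast (NNE), northeast (NE), and so on.

Example code: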

wind_directions = {} # start with an empty dictionary
is_first_line = True # Boolean used to skip the first line

for row in open("climate_data_Dec2017.csv"): # loop over the lines of the file
  if is_first_line: # True only for the header line, which is not processed
    is_first_line = False
  else:
    values = row.split(",") # split the row into a list, using the comma as separator
    wdir = values[9] # the wind direction
    if wdir in wind_directions: # use in to check whether the dictionary already has an entry for this wind direction
      wind_directions[wdir] += 1 # if so, increment its count by 1
    else: # if not, create a new entry with the value 1
      wind_directions[wdir] = 1
    
print("Wind directions by number of days:")
print(wind_directions)

----------OUTPUT----------
Wind directions by number of days:
{'NE': 8, 'E': 2, 'N': 2, 'W': 5, 'NNE': 4, 'NNW': 5, 'NW': 6, 'SSW': 13, 'S': 9, 'WNW': 10, 'SW': 4, 'SSE': 5, 'ENE': 4, 'WSW': 6, 'SE': 7, 'ESE': 2}

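7.2.2 What is the dictionary type

A Python dictionary is built from two basic elements: a key and a corresponding value. A key-value pair is called an item. Each key is separated from its value by a colon (:), and items are separated by commas. All the items of a dictionary are enclosed in curly braces {}.

The basic syntax for creating a dictionary is: dictionary = {key1: value1, key2: value2}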
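
As a short illustration of that syntax, the sketch below builds a two-item dictionary and inspects it with the standard dictionary methods keys, values and items. The two entries reuse counts from the output in 7.2.1.

```python
wind_directions = {"N": 2, "NE": 8}      # two items: "N" maps to 2, "NE" maps to 8

all_keys = list(wind_directions.keys())       # the keys only
all_values = list(wind_directions.values())   # the values only
all_items = list(wind_directions.items())     # (key, value) pairs

print(all_keys)
print(all_values)
print(all_items)
```
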

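For a more detailed introduction to dictionaries, see:
Link: Runoob tutorial: Python3 dictionaries
Link: [Python study notes] Chapter 6, Container data types, 6.5 Dictionaries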

7.2.3 Updating and creating entries & Testing for existing keys & Formatting output from dictionaries

Updating and creating entries

Updating an entry: reassign a value to an existing key to update the entry in the dictionary:

wind_directions = {'E' : 6 , 'C' : 8 , 'S' : 9}
print('Before:' , wind_directions['E'])
wind_directions['E'] *= 3 # the value for key E is 6, so after the update it is 3 * 6
print('After:' , wind_directions['E'])

---------OUTPUT----------
Before: 6
After: 18

Creating an entry: use a new key in the square-bracket operator and assign it a value to create a new item in the dictionary:

wind_directions = {"E": 6, "SE": 2, "SSE": 6}
wind_directions["SW"] = 3 # adds a new entry; the existing entries are kept
print(wind_directions)

----------OUTPUT----------
{'E': 6, 'SE': 2, 'SSE': 6, 'SW': 3}

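Testing for existing keys

Whenever a dictionary is used in an iterative aggregation, each step usually does one of two things:
Create a new entry
Update an existing entry

To check whether a given key already exists in a dictionary, use the in keyword like this: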

wind_directions = {"S": 2, "ESE": 1, "WSW": 1}
if "S" in wind_directions:
  print("S:", wind_directions["S"])

----------OUTPUT----------
S: 2

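Checking whether a given key already exists in the dictionary ensures that its value is not overwritten, which is essential in the aggregation task from the start of this section. The general approach is to create a new entry if the key does not exist, and to update the entry if it does.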
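
The create-or-update approach can be sketched on a small made-up list of wind directions. The second loop shows an equivalent alternative using the standard dictionary method get, which returns a default value when the key is missing; this alternative is not part of the course material.

```python
readings = ["S", "ESE", "S", "WSW", "S"]   # made-up wind directions

# Create-or-update with an explicit membership test
counts = {}
for wdir in readings:
    if wdir in counts:
        counts[wdir] += 1      # key exists: update the entry
    else:
        counts[wdir] = 1       # key missing: create the entry

# Equivalent alternative: get() supplies 0 when the key is missing
counts_with_get = {}
for wdir in readings:
    counts_with_get[wdir] = counts_with_get.get(wdir, 0) + 1

print(counts)
print(counts_with_get)
```
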

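Formatting output from dictionaries

To format the contents of a dictionary more neatly, we want to do two things:

Print the dictionary items in a fixed order (determined by their keys)
Print each item on its own line

For the first point, we need to sort the dictionary keys. Strings should be sorted alphabetically and numbers by value. Fortunately, Python has a built-in function called sorted, which works as follows: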

letters = ["NE", "ENE", "NNE", "W", "E"]
numbers = [5, 2, 1, 4, 3]
print(sorted(letters))
print(sorted(numbers))

----------OUTPUT----------
['E', 'ENE', 'NE', 'NNE', 'W']
[1, 2, 3, 4, 5]
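
For the second point, passing a dictionary to sorted returns its keys in sorted order, so a for loop over that result can print one item per line. The sketch below uses four entries from the output in 7.2.1; the lines list is only there to make the result easy to inspect.

```python
wind_directions = {"NE": 8, "E": 2, "N": 2, "W": 5}

lines = []
for key in sorted(wind_directions):          # iterates over the keys in sorted order
    lines.append(key + " : " + str(wind_directions[key]))
    print(key, ":", wind_directions[key])    # one item per line
```
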

7.2.4 Question: 9am humidity (hands-on practice)

For the climate data of the first week of December 2017, your task is to find the highest value of the 9am humidity (stored in column 12, starting from zero) for each state.

Using the provided climate_data_Dec2017.csv file, create a dictionary and aggregate the humidity records as values with the states as keys.

Let your program print out all dictionary items sorted by their key and with one item per line. The output should look like this:
NSW : 100.0
NT : 70.0
QLD : 99.0
SA : 84.0
VIC : 98.0
WA : 89.0
If you’re unsure how to start this problem, take another close look at the example on the two previous slides.

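Task: find the highest 9am humidity (stored in column 12, counting from zero) for each state. Create a dictionary and aggregate the humidity records as values, with the states as keys.

Approach: create a new dictionary to store the key-value pairs, skip the first line, separate the data with split, index the humidity and state values, use the testing-for-existing-keys approach described above, find the maximum with max, and sort with sorted.

Example code: work through it, try it yourself, and do not copy.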

states = {}
is_first_line = True

for row in open("climate_data_Dec2017.csv"):
  if is_first_line:
    is_first_line = False
  else:
    values = row.split(",")
    humidity = float(values[12])
    state = values[1]
    if state in states:
      states[state] = max(humidity, states[state])
    else:
      states[state] = humidity
    
for key in sorted(states):
  print(key, ":", states[key])

7.3 Pattern 3: Simple slice-and-dice

Key points of this section:

An example that explains the slice-and-dice approach
Missing or empty entries
Question: Where does the wind come from? (hands-on practice)

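7.3.1 Small example: slice-and-dice

This pattern extends the earlier field-dependent dictionary aggregation by combining aggregation with filtering.

Suppose we want to find, for each state, the highest daily rainfall on days where the relative humidity at 9am (column 9am relative humidity (%)) was above 60%. (The state names are used as the dictionary keys.)

Example code: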

# This example extends the pattern from 7.2.1: first create an empty dictionary and a Boolean for skipping the file header
max_rain_per_state = {}
is_first_line = True

for row in open("climate_data_Dec2017.csv"): # loop over the lines of the file
  if is_first_line:
    is_first_line = False # skip the header line
  else:
    values = row.split(",") # split the row with the split method
    state = values[1] # the state
    rain = float(values[6]) # the rainfall, converted to float
    humidity = float(values[12]) # the humidity, converted to float
    if humidity > 60: # filter: only rows with humidity above 60 reach the block below
      if state in max_rain_per_state: # test for an existing key (see 7.2.3)
        max_rain_per_state[state] = max(rain, max_rain_per_state[state])
      else:
        max_rain_per_state[state] = rain
    
print("Maximum rainfall per state on humid days:")
print(max_rain_per_state)

----------OUTPUT----------
Maximum rainfall per state on humid days:
{'NSW': 19.6, 'VIC': 4.6, 'QLD': 54.8, 'NT': 20.0, 'SA': 2.4}

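7.3.2 Missing or empty aggregates

If the filter is applied before the dictionary assignment, it is easy to end up with missing dictionary keys. In some situations you may not care that a key is missing entirely because no data for that key met the filter criterion.

A fairly simple modification of the existing example is to use a special value (sometimes called a sentinel) that can never occur when data meeting the filter criterion exists. In this problem, -1 is a suitable sentinel, since rainfall is never negative. To do this, swap the order of the dictionary assignment and the filter statement in the code. For example:

Example code: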

max_rain_per_state = {}
is_first_line = True

for row in open("climate_data_Dec2017.csv"):
  if is_first_line:
    is_first_line = False
  else:
    values = row.split(",")
    state = values[1]
    rain = float(values[6])
    humidity = float(values[12])
    if state not in max_rain_per_state:
      if humidity > 60:
        max_rain_per_state[state] = rain
      else:
        max_rain_per_state[state] = -1
    else:
      if humidity > 60:     
         max_rain_per_state[state] = max(rain, max_rain_per_state[state])
      
    
print("Maximum rainfall per state on humid days:")
print(max_rain_per_state)
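
Once a sentinel is stored, it usually makes sense to handle it when printing rather than show the raw -1. The following is a sketch under assumptions: the dictionary below is a hypothetical result (NSW and VIC reuse values from the output in 7.3.1, and WA is assumed to have had no humid days), and the wording "no humid days" is invented for illustration.

```python
# Hypothetical aggregation result: -1 marks a state with no humid days.
max_rain_per_state = {"NSW": 19.6, "WA": -1, "VIC": 4.6}

report = []
for state in sorted(max_rain_per_state):
    value = max_rain_per_state[state]
    if value == -1:                                  # sentinel: nothing passed the filter
        report.append(state + " : no humid days")
    else:
        report.append(state + " : " + str(value))

for line in report:
    print(line)
```
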

7.3.3 Question: Where does the wind come from? (hands-on practice)

Using the provided climate_data_Dec2017.csv file, your task is to find out how many readings there were across our recorded weather stations for each wind direction on the 26th of December.

The file contains data for more than this particular day, so you will have to use a filtering statement to make sure you only include readings from the 26th.

Aggregate the number of readings in a dictionary. If there aren’t any records for a particular wind direction for 26th of December, it should still appear in the dictionary and have the value zero, as long as there is at least one record for that particular wind direction for other days.

Let your program print out all dictionary items sorted by their key and with one item per line. The output should look like this:
E : 4
ENE : 2
ESE : 1
N : 0
NE : 1
NNE : 1
NNW : 1
NW : 0
S : 0
SE : 3
SSE : 3
SSW : 0
SW : 1
W : 0
WNW : 0
WSW : 0

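Key points: find out how many readings there were across the recorded weather stations for each wind direction on the 26th of December, and aggregate the number of readings in a dictionary. If there are no records for a particular wind direction on the 26th of December, it should still appear in the dictionary with the value zero, as long as that wind direction has at least one record on other days.

Approach: create an empty dictionary and a Boolean to skip the first line, and split each row. Then use two separate checks: the first tests whether the wind direction is already a key and, if not, sets it to 0; the second tests the date and, if it matches, adds one. Finally sort with sorted.

Example code: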

wind_directions = {}
is_first_line = True

for row in open("climate_data_Dec2017.csv"):
  if is_first_line:
    is_first_line = False
  else:
    values = row.split(",")
    wdir = values[9]
    date = values[0]
    if wdir not in wind_directions:
      wind_directions[wdir] = 0
    if date == "2017-12-26":
      wind_directions[wdir] += 1
    
for key in sorted(wind_directions):
  print(key, ":", wind_directions[key])

7.4 Pattern 4: Data binning

Key points of this section:

Small example: data binning
Aggregating in lists and alternative aggregation
Question: Temperature ranges (hands-on practice)

7.4.1 Small example: data binning

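Data binning is a common technique in which data is divided among bins, where each bin holds the information corresponding to a range of values in some data field. For example, one bin might hold information from the rows where the temperature is between 10 and 15 degrees, another bin the rows between 15 and 20 degrees, and so on.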
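
For fixed-width numeric bins like these, the bin a value belongs to can be computed with floor division. The sketch below is one possible way to do it; the helper name bin_label and the label format are assumptions for illustration. Note that a value exactly on a boundary, such as 15.0, lands in the upper bin.

```python
def bin_label(temp, width=5):
    """Return a label such as '10-15' for the bin that temp falls into."""
    lower = int(temp // width) * width     # lower edge of the bin
    return str(lower) + "-" + str(lower + width)

label_a = bin_label(12.3)
label_b = bin_label(15.0)   # a boundary value goes to the upper bin
label_c = bin_label(19.9)

print(label_a, label_b, label_c)
```
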

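Task for the example: for each month of data, find the average maximum temperature recorded at the Sydney weather station in that month. Here each bin corresponds to a month, that is, it reflects the information for all dates from the first to the last day of that month.

Example code: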

monthly_max_temps = {} # new dictionary; each value will be the list of temperatures for one month
is_first_line = True # used to skip the first line

for row in open("climate_data_2017.csv"):
  if is_first_line: # skip the header
    is_first_line = False
  else:
    values = row.split(",") # split each row into a list of values
    city = values[2] # the city field
    if city == "Sydney": # all remaining code is inside this if block, so rows for any other city go straight to the next iteration
      # for each Sydney row, extract the date and temperature from the list of values
      date = values[0]
      # the temperature is used numerically (to compute a mean), so convert it to a float straight away
      temp = float(values[5])
      # dates have the format "yyyy-mm-dd", so split on "-" to get year, month and day, and take the month
      month = date.split("-")[1]
      # check whether the dictionary already has an entry for this month
      if month not in monthly_max_temps:
        # no entry yet: create one with the month as key; the square brackets make a single-element list holding the temperature
        monthly_max_temps[month] = [temp]
      else:
        # the entry already holds a list, so append this temperature to it
        monthly_max_temps[month].append(temp)

print("Average maximum temperatures per month:")
for key in sorted(monthly_max_temps):
  temps = monthly_max_temps[key] # the list of temperatures for this month
  print(key, ":", sum(temps)/len(temps)) # sum adds up the elements and len gives their number, so this is the mean

-----------OUTPUT----------
Average maximum temperatures per month:
06 : 18.163333333333338
07 : 19.132258064516133
08 : 19.532258064516128
09 : 23.263333333333343
10 : 23.967741935483872
11 : 23.819999999999997

7.4.2 Aggregating in lists & Alternative aggregation

Aggregating in lists：

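The example has several levels of aggregation: a dictionary stores the temperatures for each month, and a list stores the individual daily temperatures belonging to that month.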
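
This two-level structure can be sketched without a file, using a handful of made-up (month, temperature) readings. The variable names readings, temps_by_month and averages are chosen for this illustration only.

```python
# Made-up (month, temperature) readings
readings = [("06", 18.0), ("07", 19.5), ("06", 17.0), ("07", 20.5)]

temps_by_month = {}                       # outer level: one entry per month
for month, temp in readings:
    if month not in temps_by_month:
        temps_by_month[month] = [temp]    # inner level: start a list for this month
    else:
        temps_by_month[month].append(temp)

averages = {}
for month in sorted(temps_by_month):
    temps = temps_by_month[month]
    averages[month] = sum(temps) / len(temps)

print(temps_by_month)
print(averages)
```
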

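To aggregate values in a list, first create a list from the first temperature encountered. Putting a pair of square brackets around a variable makes Python create a list that holds that variable's value as its single element. Take a look: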

temp = 27.5
temps = [temp]
print(type(temp), type(temps))

----------OUTPUT----------
<class 'float'> <class 'list'>

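Alternative aggregation:

To compute the mean of a series of values, we usually need to know two things: the sum of the values and how many values there are. In the example, we stored all the values in a list and then used the built-in functions sum and len to obtain those two quantities.

The same result can be achieved in two alternative ways: either keep a running sum and a count together in a combined dictionary value, or use two separate dictionaries for the sums and the counts.
Storing only the sum and the count of the values
For the first alternative, we can use a two-element list to store the total temperature and the number of values: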

monthly_max_temps = {}
is_first_line = True

for row in open("climate_data_2017.csv"):
  if is_first_line:
    is_first_line = False
  else:
    values = row.split(",")
    city = values[2]
    if city == "Sydney":
      date = values[0]
      temp = float(values[5])
      month = date.split("-")[1]

      if month in monthly_max_temps:
        monthly_max_temps[month][0] += temp
        monthly_max_temps[month][1] += 1
      else:
        monthly_max_temps[month] = [temp, 1]
      
print("Average maximum temperatures per month:")
for key in sorted(monthly_max_temps):
  temps = monthly_max_temps[key]
  print(key, ":", temps[0]/temps[1])

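Using two dictionaries
The second alternative suggested above is to use two separate dictionaries to track the sum and the count of the values respectively. Take a look at the code below: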

summed_temps = {}
number_of_values = {}
is_first_line = True

for row in open("climate_data_2017.csv"):
  if is_first_line:
    is_first_line = False
  else:
    values = row.split(",")
    city = values[2]
    if city == "Sydney":
      date = values[0]
      temp = float(values[5])
      month = date.split("-")[1]

      if month in summed_temps:
        summed_temps[month] += temp
        number_of_values[month] += 1
      else:
        summed_temps[month] = temp
        number_of_values[month] = 1
      
print("Average maximum temperatures per month:")
for key in sorted(summed_temps):
  print(key, ":", summed_temps[key]/number_of_values[key])

7.4.3 Question: Temperature ranges (hands-on practice)

Using the provided climate_data_Dec2017.csv file, your task is to bin the temperature data into 5-degree ranges and find out which weather stations, that is, which city, fell in which 5-degree range on the 25th of December 2017.

The file contains data for more than this particular day, so you will have to use a filtering statement to make sure you only include readings from the 25th.

For each city, use the maximum temperature recording from column 5 (counting from zero) and find out in which bin it belongs.

We want to use integer numbers from 0 to 7 to specify the bins. Bin 0 will aggregate cities where the temperatures were between 0 and 5, bin 1 between 5 and 10 and so on. You can calculate in which bin a given temperature belongs as follows:
temp = 23.5
print(temp // 5)
----------OUTPUT----------
4.0
To get the bin number, you need to divide the temperature by 5 in a way that ensures that a whole number is the result. This can be done using what is called “floor division” indicated with double-slashes. In this example, we get 4.0, which indicates the 20-25 degrees range.

Use a dictionary with the bin numbers as keys and aggregate the cities for each bin in individual lists. After the aggregation, print out the dictionary items sorted by their keys. The individual dictionary items (the lists) should also be sorted. Overall, the output of your program should look like this:
4.0 : ['Ballarat', 'Canberra', 'Geelong', 'Melbourne', 'Newcastle', 'Wollongong']
5.0 : ['Adelaide', 'Albury', 'Bendigo', 'Darwin']
6.0 : ['Cairns', 'Perth', 'Sunshine C', 'Toowoomba', 'Townsville']
7.0 : ['Brisbane', 'Gold Coast']
There aren’t any temperatures below 20 degrees recorded on that day, so the lowest bin index your program should find is 4.0.

In order to help with the output formatting, we have provided some skeleton code for you in the editor on the right. You can change the name of the dictionary if you want, but if you do remember to change it for the printing as well.

Example code (think it through yourself first):

temp_bins = {}
is_first_line = True

for row in open("climate_data_Dec2017.csv"):
  if is_first_line:
    is_first_line = False
  else:
    values = row.split(",")
    date = values[0]
    if date == "2017-12-25":
      temp = float(values[5])
      bin_num = temp // 5
      city = values[2]
      if bin_num in temp_bins:
        temp_bins[bin_num].append(city)
      else:
        temp_bins[bin_num] = [city]
      
for key in sorted(temp_bins):
  print(key, ":", sorted(temp_bins[key]))

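Summary

Think more, practise more, and apply what you learn to new problems.
If any detail in this post is wrong or poorly explained, or if you have a better approach, contact me in the comments or by private message so that we can improve together.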
