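First, a link to the data: the Head First Python dataset
The problem in Chapter 5 is to help Coach Kelly find the three fastest times for each athlete
Dataset:
- The raw data is messy, so start with some preliminary cleanup
- split on commas;
- replace colons and '-' with '.' so that every time uses the same format;
- sort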
# Time-formatting function
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs
# Read the time data from the files
try:
    with open('james.txt') as jaf:
        data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt') as juf:
        data = juf.readline()
    julie = data.strip().split(',')
    with open('mikey.txt') as mif:
        data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
        data = saf.readline()
    sarah = data.strip().split(',')
except IOError as err:
    print("File Error: " + str(err))
c_james = []
c_julie = []
c_mikey = []
c_sarah = []
# Convert the lists
for each_item in james:
    c_james.append(sanitize(each_item))
for each_item in julie:
    c_julie.append(sanitize(each_item))
for each_item in mikey:
    c_mikey.append(sanitize(each_item))
for each_item in sarah:
    c_sarah.append(sanitize(each_item))
# Sort and print
print(sorted(c_james))
print(sorted(c_julie))
print(sorted(c_mikey))
print(sorted(c_sarah))
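To see what the cleanup step does without needing the data files, here is a minimal sketch that runs sanitize() on a few made-up values. The sample strings are assumptions for illustration only and are not taken from the actual dataset.

```python
def sanitize(time_string):
    """Turn 'm-ss' or 'm:ss' into 'm.ss'; leave other strings unchanged."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs

# Hypothetical sample values in the three formats the notes mention.
raw_times = ['2-34', '3:21', '2.34', '2:22']

clean_times = []
for each_item in raw_times:
    clean_times.append(sanitize(each_item))

print(clean_times)          # ['2.34', '3.21', '2.34', '2.22']
print(sorted(clean_times))  # ['2.22', '2.34', '2.34', '3.21']
```

Once every value uses '.', the strings compare consistently, which is why the sort comes after the cleanup.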
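- List comprehensions
- The list conversion above is still somewhat verbose, with a lot of repeated code. Below, the conversion is done with list comprehensions instead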
# Time-formatting function
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs
# Read the time data from the files
try:
    with open('james.txt') as jaf:
        data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt') as juf:
        data = juf.readline()
    julie = data.strip().split(',')
    with open('mikey.txt') as mif:
        data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
        data = saf.readline()
    sarah = data.strip().split(',')
except IOError as err:
    print("File Error: " + str(err))
# Convert the lists
c_james = [sanitize(each_item) for each_item in james]
c_julie = [sanitize(each_item) for each_item in julie]
c_mikey = [sanitize(each_item) for each_item in mikey]
c_sarah = [sanitize(each_item) for each_item in sarah]
# Sort and print
print(sorted(c_james))
print(sorted(c_julie))
print(sorted(c_mikey))
print(sorted(c_sarah))
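The four comprehensions are still near-identical lines. One possible further step, which goes beyond what the chapter does here, is to keep the lists in a dict keyed by athlete name and convert them all with a single dict comprehension. The in-memory data below is made up and only stands in for the file contents.

```python
def sanitize(time_string):
    """Turn 'm-ss' or 'm:ss' into 'm.ss'; leave other strings unchanged."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs

# Hypothetical data standing in for what would be read from the files.
raw = {
    'james': ['2-34', '3:21', '2.34'],
    'julie': ['2.59', '2.11', '2:11'],
}

# One list comprehension per athlete, driven by a dict comprehension.
cleaned = {name: [sanitize(t) for t in times] for name, times in raw.items()}

for name, times in cleaned.items():
    print(name, sorted(times))
```

Adding another athlete then means adding one dict entry rather than another pair of lines.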
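- Back to the original problem: the three fastest times (which obviously must not repeat)
- Getting the three fastest times is simple: slice the sorted list, for example sorted(c_james)[0:3]. However, the data contains duplicates. How can they be removed?
- One approach is to iterate: before appending a value to a new list, check whether it is already in that list, and discard it if it is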
# Time-formatting function
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs
# Read the time data from the files
try:
    with open('james.txt') as jaf:
        data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt') as juf:
        data = juf.readline()
    julie = data.strip().split(',')
    with open('mikey.txt') as mif:
        data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
        data = saf.readline()
    sarah = data.strip().split(',')
except IOError as err:
    print("File Error: " + str(err))
# Convert and sort the lists
james = sorted([sanitize(each_item) for each_item in james])
julie = sorted([sanitize(each_item) for each_item in julie])
mikey = sorted([sanitize(each_item) for each_item in mikey])
sarah = sorted([sanitize(each_item) for each_item in sarah])
# Remove duplicates and print the three fastest times
u_james = []
for t in james:
    if t not in u_james:
        u_james.append(t)
print(u_james[0:3])
u_julie = []
for t in julie:
    if t not in u_julie:
        u_julie.append(t)
print(u_julie[0:3])
u_mikey = []
for t in mikey:
    if t not in u_mikey:
        u_mikey.append(t)
print(u_mikey[0:3])
u_sarah = []
for t in sarah:
    if t not in u_sarah:
        u_sarah.append(t)
print(u_sarah[0:3])
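The same de-duplication loop appears four times above. A small sketch of how it could be pulled into a helper is shown below; the name unique_in_order is hypothetical and the sample list is made up.

```python
def unique_in_order(items):
    """Return the items with duplicates dropped, keeping first occurrences in order."""
    unique = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return unique

# Hypothetical, already sorted times containing duplicates.
sorted_times = ['2.22', '2.34', '2.34', '2.45', '3.01', '3.01']

top_three = unique_in_order(sorted_times)[0:3]
print(top_three)  # ['2.22', '2.34', '2.45']
```

Because the input is sorted before the duplicates are dropped, the first three survivors are the three fastest distinct times.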
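- Python also provides sets, which are unordered and contain no duplicates
- set() creates an empty set; passing it a list, as in set(some_list), gives a set of that list's distinct items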
# Time-formatting function
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs
# Read the time data from the files
try:
    with open('james.txt') as jaf:
        data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt') as juf:
        data = juf.readline()
    julie = data.strip().split(',')
    with open('mikey.txt') as mif:
        data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
        data = saf.readline()
    sarah = data.strip().split(',')
except IOError as err:
    print("File Error: " + str(err))
# Convert the lists, remove duplicates, sort, and print
james = sorted(set([sanitize(each_item) for each_item in james]))[0:3]
print(james)
julie = sorted(set([sanitize(each_item) for each_item in julie]))[0:3]
print(julie)
mikey = sorted(set([sanitize(each_item) for each_item in mikey]))[0:3]
print(mikey)
sarah = sorted(set([sanitize(each_item) for each_item in sarah]))[0:3]
print(sarah)
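A quick sketch of why set() followed by sorted() works here: the set drops the duplicates but has no order, and sorted() turns it back into an ordered list. The values are made up for illustration.

```python
# Hypothetical cleaned times containing duplicates.
times = ['2.34', '2.22', '2.34', '3.01', '2.22']

unique_times = set(times)             # duplicates removed, order not defined
print(len(times), len(unique_times))  # 5 3

fastest = sorted(unique_times)[0:3]   # sorted() returns an ordered list
print(fastest)                        # ['2.22', '2.34', '3.01']

# A set comprehension builds the set directly, without the intermediate list.
also_fastest = sorted({t for t in times})[0:3]
print(also_fastest == fastest)        # True
```

Slicing has to happen after sorted(), because a set itself cannot be sliced.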
- The file-reading code still has a lot of repetition, so wrap it in a function
# Time-formatting function
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs
# Wrap the file reading in a function
def reader(file):
    try:
        with open(file) as datafile:
            return datafile.readline().strip().split(',')
    except IOError as err:
        print("File Error: " + str(err))
        # None signals a failed read; the code below assumes every read succeeded
        return None
# Read the time data from the files
james = reader('james.txt')
julie = reader('julie.txt')
mikey = reader('mikey.txt')
sarah = reader('sarah.txt')
# Convert the lists, remove duplicates, sort, and print
james = sorted(set([sanitize(each_item) for each_item in james]))[0:3]
print(james)
julie = sorted(set([sanitize(each_item) for each_item in julie]))[0:3]
print(julie)
mikey = sorted(set([sanitize(each_item) for each_item in mikey]))[0:3]
print(mikey)
sarah = sorted(set([sanitize(each_item) for each_item in sarah]))[0:3]
print(sarah)
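One caveat worth keeping in mind: the times are sorted as strings, which compares them character by character. That is fine as long as every time has a single-digit minutes part, which is an assumption about the data. If a time of ten minutes or more ever appeared, a numeric sort key would be needed. The sketch below uses a hypothetical helper named time_key and made-up values to show the difference.

```python
def time_key(time_string):
    """Split 'm.ss' into (minutes, seconds) as integers for numeric comparison."""
    mins, secs = time_string.split('.')
    return (int(mins), int(secs))

# Hypothetical cleaned times, including one of ten minutes or more.
times = ['10.05', '2.34', '2.08', '9.59']

print(sorted(times))                # ['10.05', '2.08', '2.34', '9.59']
print(sorted(times, key=time_key))  # ['2.08', '2.34', '9.59', '10.05']
```

The key function only affects how items are compared; the sorted list still contains the original strings.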
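- BULLET POINTS
- There are two ways to sort: list.sort() sorts "in place", while sorted(list) returns a sorted "copy". sorted() works on almost any iterable data structure, not only lists.
- Pass reverse=True to sort() or sorted() to sort in descending order.
- Given the following code:
new_l = []
for t in old_l:
    new_l.append(len(t))
it can be rewritten with a list comprehension as:
new_l = [len(t) for t in old_l]
- To access several items in a list, use a slice. For example, my_list[0:3] accesses the items at positions 0, 1, and 2.
- The set() factory function creates a set.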
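The points above can be tried together in a short sketch. The list of times is made up for illustration.

```python
# Hypothetical cleaned times.
times = ['2.34', '2.22', '3.01', '2.22']

# sorted() returns a new list and leaves the original untouched.
ordered_copy = sorted(times)
print(times)         # ['2.34', '2.22', '3.01', '2.22']
print(ordered_copy)  # ['2.22', '2.22', '2.34', '3.01']

# list.sort() reorders the list itself; reverse=True gives descending order.
times.sort(reverse=True)
print(times)         # ['3.01', '2.34', '2.22', '2.22']

# A slice [0:3] picks the items at positions 0, 1 and 2.
print(ordered_copy[0:3])  # ['2.22', '2.22', '2.34']

# set() creates an empty set; empty braces {} create a dict instead.
empty = set()
print(type(empty).__name__, type({}).__name__)  # set dict

# Passing a list to set() removes duplicates.
print(sorted(set(times)))  # ['2.22', '2.34', '3.01']
```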