Head First Python, Chapter 5: Comprehending Data

First, a link to the data: Head First Python dataset

The problem in Chapter 5 is to help Coach Kelly find each athlete's three fastest times.

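Dataset: one text file per athlete (james.txt, julie.txt, mikey.txt, sarah.txt), each holding a single line of comma-separated times.
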
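  • First, the data is messy, so it needs some preliminary processing:
  1. Split each line on commas;
  2. Replace every ':' and '-' with '.', so that all times share one format;
  3. Sort.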
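
As a quick check of the three steps before the full listing, here is a minimal sketch run on a single made-up line of data. The sample string is hypothetical and is not taken from the dataset.

```python
# Hypothetical sample line; the real files are read in the listing below.
raw = "2-34,3:21,2.34\n"

# Step 1: strip the trailing newline, then split on commas.
times = raw.strip().split(',')

# Step 2: replace '-' and ':' with '.' so every time uses one format.
cleaned = []
for t in times:
    cleaned.append(t.replace('-', '.').replace(':', '.'))

# Step 3: sort the cleaned strings.
result = sorted(cleaned)
print(result)  # ['2.34', '2.34', '3.21']
```

Note that the duplicate '2.34' survives this step; removing duplicates is handled later.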

# Time-formatting function
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return(time_string)
    (mins, secs) = time_string.split(splitter)
    return(mins + '.' + secs)

# Read the time data from the files
try:
    with open('james.txt') as jaf:
        data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt') as juf:
        data = juf.readline()
    julie = data.strip().split(',')
    with open('mikey.txt') as mif:
        data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
        data = saf.readline()
    sarah = data.strip().split(',')
except IOError as err:
    print("File Error: " + str(err))

c_james = []
c_julie = []
c_mikey = []
c_sarah = []

# Convert each list
for each_item in james:
    c_james.append(sanitize(each_item))
for each_item in julie:
    c_julie.append(sanitize(each_item))
for each_item in mikey:
    c_mikey.append(sanitize(each_item))
for each_item in sarah:
    c_sarah.append(sanitize(each_item))
# Sort and print
print(sorted(c_james))
print(sorted(c_julie))
print(sorted(c_mikey))
print(sorted(c_sarah))
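  • List comprehensions
  • The list conversion above is still clumsy and repeats a lot of code. Below, the same conversion is done with list comprehensions.
# Time-formatting function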
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return(time_string)
    (mins, secs) = time_string.split(splitter)
    return(mins + '.' + secs)

# Read the time data from the files
try:
    with open('james.txt') as jaf:
        data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt') as juf:
        data = juf.readline()
    julie = data.strip().split(',')
    with open('mikey.txt') as mif:
        data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
        data = saf.readline()
    sarah = data.strip().split(',')
except IOError as err:
    print("File Error: " + str(err))

# Convert each list
c_james = [sanitize(each_item) for each_item in james]
c_julie = [sanitize(each_item) for each_item in julie]
c_mikey = [sanitize(each_item) for each_item in mikey]
c_sarah = [sanitize(each_item) for each_item in sarah]

# Sort and print
print(sorted(c_james))
print(sorted(c_julie))
print(sorted(c_mikey))
print(sorted(c_sarah))
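
One caveat about the sort above: the cleaned times are strings, so sorted() compares them character by character. That gives the right order as long as every time has a one-digit minute part, but it would not for a time of ten minutes or more. The sketch below illustrates this with made-up values; as_seconds is a hypothetical helper and is not part of the original code.

```python
def as_seconds(time_string):
    """Hypothetical helper: convert an 'M.SS' string to total seconds."""
    mins, secs = time_string.split('.')
    return int(mins) * 60 + int(secs)

times = ['10.05', '2.45', '2.01']  # made-up values, not from the dataset

by_string = sorted(times)                  # compares characters, so '1...' sorts first
by_value = sorted(times, key=as_seconds)   # compares the actual durations

print(by_string)  # ['10.05', '2.01', '2.45']
print(by_value)   # ['2.01', '2.45', '10.05']
```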
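  • Back to the original problem: the three fastest times (which obviously must not repeat).
  • Getting the three fastest times is simple: slice the sorted list, for example sorted(c_james)[0:3]. But the data contains duplicates, so how do we remove them?
  • One approach is to iterate: before appending each item, check whether it is already in the new list, and discard it if it is.
# Time-formatting function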
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return(time_string)
    (mins, secs) = time_string.split(splitter)
    return(mins + '.' + secs)

# Read the time data from the files
try:
    with open('james.txt') as jaf:
        data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt') as juf:
        data = juf.readline()
    julie = data.strip().split(',')
    with open('mikey.txt') as mif:
        data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
        data = saf.readline()
    sarah = data.strip().split(',')
except IOError as err:
    print("File Error: " + str(err))

# Convert and sort each list
james = sorted([sanitize(each_item) for each_item in james])
julie = sorted([sanitize(each_item) for each_item in julie])
mikey = sorted([sanitize(each_item) for each_item in mikey])
sarah = sorted([sanitize(each_item) for each_item in sarah])

# Remove duplicates and print the three fastest times
u_james = []
for t in james:
    if t not in u_james:
        u_james.append(t)
print(u_james[0:3])
u_julie = []
for t in julie:
    if t not in u_julie:
        u_julie.append(t)
print(u_julie[0:3])
u_mikey = []
for t in mikey:
    if t not in u_mikey:
        u_mikey.append(t)
print(u_mikey[0:3])
u_sarah = []
for t in sarah:
    if t not in u_sarah:
        u_sarah.append(t)
print(u_sarah[0:3])
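
The four loops above differ only in the list they read, so they could be folded into one function. This is a minimal sketch; unique is a hypothetical helper name and the sample data is made up.

```python
def unique(items):
    """Hypothetical helper: drop repeats, keeping first occurrences in order."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

# Made-up, already sorted data standing in for one athlete's list.
james = ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45']
print(unique(james)[0:3])  # ['2.01', '2.22', '2.34']
```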
  • Python also provides sets. A set is unordered and contains no duplicates.
  • set() creates an empty set.
# Time-formatting function
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return(time_string)
    (mins, secs) = time_string.split(splitter)
    return(mins + '.' + secs)

# Read the time data from the files
try:
    with open('james.txt') as jaf:
        data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt') as juf:
        data = juf.readline()
    julie = data.strip().split(',')
    with open('mikey.txt') as mif:
        data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt') as saf:
        data = saf.readline()
    sarah = data.strip().split(',')
except IOError as err:
    print("File Error: " + str(err))

# Convert, deduplicate, sort, and print
james = sorted(set([sanitize(each_item) for each_item in james]))[0:3]
print(james)
julie = sorted(set([sanitize(each_item) for each_item in julie]))[0:3]
print(julie)
mikey = sorted(set([sanitize(each_item) for each_item in mikey]))[0:3]
print(mikey)
sarah = sorted(set([sanitize(each_item) for each_item in sarah]))[0:3]
print(sarah)
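
The order of operations in the listing above matters: because a set has no guaranteed order, set() is applied first and sorted() afterwards, and only then is the result sliced. A small sketch with made-up values:

```python
times = ['2.34', '2.01', '2.34', '2.22', '2.01']  # made-up values

unique_times = set(times)         # duplicates collapse; order is not guaranteed
fastest = sorted(unique_times)    # sorted() turns the set back into an ordered list

print(len(unique_times))  # 3
print(fastest[0:3])       # ['2.01', '2.22', '2.34']
```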
  • The file-reading code still repeats itself a lot, so wrap it in a function.
# Time-formatting function
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return(time_string)
    (mins, secs) = time_string.split(splitter)
    return(mins + '.' + secs)

# Wrap the file reading in a function
def reader(file):
    try:
        with open(file) as datafile:
            return (datafile.readline().strip().split(','))
    except IOError as err:
        print("File Error: " + str(err))
        return (None)
# Read the time data from the files
james = reader('james.txt')
julie = reader('julie.txt')
mikey = reader('mikey.txt')
sarah = reader('sarah.txt')

# Convert, deduplicate, sort, and print
james = sorted(set([sanitize(each_item) for each_item in james]))[0:3]
print(james)
julie = sorted(set([sanitize(each_item) for each_item in julie]))[0:3]
print(julie)
mikey = sorted(set([sanitize(each_item) for each_item in mikey]))[0:3]
print(mikey)
sarah = sorted(set([sanitize(each_item) for each_item in sarah]))[0:3]
print(sarah)
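
Two things remain in the listing above: the convert/sort/slice line is still written four times, and reader() returns None when a file cannot be opened, which the list comprehension would then fail on. The sketch below is one possible way to handle both. top3 is a hypothetical helper, sanitize is restated so the block runs on its own, and the input is made up.

```python
def sanitize(time_string):
    # Same behavior as sanitize() in the listing above, restated compactly.
    for splitter in ('-', ':'):
        if splitter in time_string:
            mins, secs = time_string.split(splitter)
            return mins + '.' + secs
    return time_string

def top3(raw_times):
    """Hypothetical helper: three fastest unique times, or [] if the read failed."""
    if raw_times is None:
        return []
    return sorted(set(sanitize(t) for t in raw_times))[0:3]

# Made-up input standing in for the list returned by reader().
print(top3(['2-34', '3:21', '2.34', '2:01', '2:01']))  # ['2.01', '2.34', '3.21']
print(top3(None))                                      # []
```

With reader() from the listing, each athlete's output would then reduce to a call such as print(top3(reader('james.txt'))).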
  • BULLET POINTS
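  • There are two ways to sort: list.sort() sorts "in place", while sorted(list) sorts a "copy". sorted() works on almost any data structure.
  • Pass reverse=True to sort() or sorted() to sort in descending order.
  • Given the following code: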
new_l = []
for t in old_l:
    new_l.append(len(t))

rewritten as a list comprehension, it becomes:

new_l = [len(t) for t in old_l]

  • To access several items of a list at once, use a slice. For example, my_list[0:3] accesses the items at positions 0, 1, and 2.

  • The set() factory function creates a set.
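
The points above about sorting and slicing can be checked with a short sketch; the values are made up.

```python
data = [3, 1, 2, 5, 4]  # made-up values

ordered_copy = sorted(data)   # returns a new sorted list; data is unchanged
print(ordered_copy)           # [1, 2, 3, 4, 5]
print(data)                   # [3, 1, 2, 5, 4]

data.sort(reverse=True)       # sorts in place, descending, and returns None
print(data)                   # [5, 4, 3, 2, 1]

first_three = data[0:3]       # items at positions 0, 1, and 2
print(first_three)            # [5, 4, 3]

letters = sorted('cab')       # sorted() accepts other iterables, such as a string
print(letters)                # ['a', 'b', 'c']
```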
