Python Data Representation Study Notes [Week 4]: Project: Analyzing Baseball Data

XiaohanClover

Category: Python Data Analysis

Article link: https://blog.csdn.net/m0_37615469/article/details/90162800

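Summary (auto-generated by CSDN): This is a hands-on Python data analysis project; the author finished the Baseball Data analysis in the early hours of the morning. The main difficulty was messy data storage: data is fetched mainly through the info dictionary, and the large dictionary is processed with nested loops. The author notes that the function aggregate_by_player_id, which uses a triple for loop, still needs optimizing, and says they miss SQL.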

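Finally finished debugging at 1:43 a.m.! This assignment is a bit complicated, and the data storage is somewhat messy, so it took me ages (QAQ).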

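The key point is that everything has to be looked up through the info dictionary: first get the field name from info, then use that field name to read the other contents of the big dictionary.

The big dictionary itself is also built this way: first get the CSV file name from info, then read that file into a nested dictionary.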
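
To make the lookup order concrete, here is a minimal sketch of reading data through info. Only the "hits" and "atbats" keys appear in the code further down; the "separator", "quote", and "playerid" keys, the field names, and the sample rows are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical info dictionary. Only "hits" and "atbats" are confirmed by the
# project code; the other keys and all field names are assumed for this sketch.
info = {
    "separator": ",",
    "quote": '"',
    "playerid": "playerID",
    "hits": "H",
    "atbats": "AB",
}

# Made-up sample data standing in for the real batting CSV file.
sample_csv = io.StringIO(
    "playerID,AB,H\n"
    "player1,600,180\n"
    "player2,520,130\n"
)

reader = csv.DictReader(sample_csv,
                        delimiter=info["separator"],
                        quotechar=info["quote"])

# Build the "big dictionary": player id -> row (all values are strings).
table = {row[info["playerid"]]: row for row in reader}

# Step 1: get the field name from info. Step 2: use it to index the row.
hits_field = info["hits"]
hits = int(table["player1"][hits_field])
```

The code never hard-codes a column name such as "H"; it always goes through info, so the same functions should keep working if the CSV files use different headers.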

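There is one place where I used a triple for loop and couldn't think of a better approach for now; I'll come back tomorrow to see whether it can be optimized (at times like this I really, really miss SQL).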
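
One possible way to get closer to a SQL-style GROUP BY with SUM is to accumulate totals in a dictionary in a single pass over the rows. This is only a sketch: the function name aggregate_sketch, its parameters, and the assumption that every aggregated field holds an integer string are hypothetical, and it is not necessarily how aggregate_by_player_id is meant to be written.

```python
def aggregate_sketch(statistics, playerid, fields):
    """
    Hypothetical helper, roughly: SELECT playerid, SUM(field)... GROUP BY playerid.

    Inputs:
      statistics - list of row dictionaries (values are strings)
      playerid   - name of the player id field
      fields     - list of field names to sum (assumed to hold integers)
    Output:
      Returns a dictionary mapping each player id to a dictionary
      containing the player id and the summed value of each field.
    """
    totals = {}
    for row in statistics:
        player = row[playerid]
        # Create the accumulator the first time a player is seen.
        if player not in totals:
            totals[player] = {playerid: player}
            for field in fields:
                totals[player][field] = 0
        # Add this row's values to the running totals.
        for field in fields:
            totals[player][field] += int(row[field])
    return totals
```

Each row is visited once and each field once per row, so the nesting is two levels deep (rows, then fields) instead of three.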

"""
Project for Week 4 of "Python Data Analysis".
Processing CSV files with baseball statistics.

Be sure to read the project description page for further information
about the expected behavior of the program.
"""

import csv


##
## Provided code from Week 3 Project
##

def read_csv_as_list_dict(filename, separator, quote):
    """
    Inputs:
      filename  - name of CSV file
      separator - character that separates fields
      quote     - character used to optionally quote fields
    Output:
      Returns a list of dictionaries where each item in the list
      corresponds to a row in the CSV file.  The dictionaries in the
      list map the field names to the field values for that row.
    """
    table = []
    with open(filename, newline='') as csvfile:
        csvreader = csv.DictReader(csvfile, delimiter=separator, quotechar=quote)
        for row in csvreader:
            table.append(row)
    return table


def read_csv_as_nested_dict(filename, keyfield, separator, quote):
    """
    Inputs:
      filename  - name of CSV file
      keyfield  - field to use as key for rows
      separator - character that separates fields
      quote     - character used to optionally quote fields
    Output:
      Returns a dictionary of dictionaries where the outer dictionary
      maps the value in the keyfield to the corresponding row in the
      CSV file.  The inner dictionaries map the field names to the
      field values for that row.
    """
    table = {}
    with open(filename, newline='') as csvfile:
        csvreader = csv.DictReader(csvfile, delimiter=separator, quotechar=quote)
        for row in csvreader:
            rowid = row[keyfield]
            table[rowid] = row
    return table


##
## Provided formulas for common batting statistics
##

# Typical cutoff used for official statistics
MINIMUM_AB = 500


def batting_average(info, batting_stats):
    """
    Inputs:
      info          - dictionary mapping statistic names to CSV field names
      batting_stats - dictionary of batting statistics (values are strings)
    Output:
      Returns the batting average as a float
    """
    hits = float(batting_stats[info["hits"]])
    at_bats = float(batting_stats[info["atbats"]])
    if at_bats >= MINIMUM_AB:
        return hits / at_bats
    else:
        return 0


def onbase_percentage(info, batting_stats):
    """
    Inputs:
      info          - dictionary mapping statistic names to CSV field names
      batting_stats - dictionary of batting statistics (values are strings)
    Output:
      Returns the on-base percentage as a float
    """
    hit