Python Data Analysis Basics: Reading CSV, JSON, and XML Files

ShutingJoy

Last modified: 2022-08-05 11:22:02


Tags: python, programming languages

First published: 2022-08-05 11:18:53

Article link: https://blog.csdn.net/weixin_48255829/article/details/126172690


Python Data Analysis: CSV, JSON, XML

1. CSV Data

How to import CSV data

import csv

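# Open the file (the csv module documentation recommends newline='' for files passed to csv.reader)
csv_file = open(r'F:\pycharm\py_workspace\3 Python数据分析实例数据\data\who\data-text.csv', newline='')

# Have the csv module read the opened file as CSV
# csv.reader() returns a reader object that yields each row of the file as a list of strings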
reader = csv.reader(csv_file)

for row in reader:
    print(row)

Changing the output from lists to dictionaries

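# Use csv.DictReader in place of csv.reader; the first row of the file supplies the keys
reader_dict = csv.DictReader(csv_file)
# Each row is an OrderedDict in Python 3.6 and 3.7, and a plain dict from Python 3.8 onward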
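
To compare the two readers without depending on the WHO file, here is a minimal sketch that feeds both from an in-memory string. The column names and values in `sample` are made up for illustration and are not taken from data-text.csv.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
sample = "Country,Year,Display Value\nAndorra,1990,77\nAndorra,2000,80\n"

# csv.reader yields every row, header included, as a list of strings.
list_rows = list(csv.reader(io.StringIO(sample)))

# csv.DictReader takes the first row as field names and yields one mapping per data row.
dict_rows = [dict(row) for row in csv.DictReader(io.StringIO(sample))]

print(list_rows[1])  # ['Andorra', '1990', '77']
print(dict_rows[0])  # {'Country': 'Andorra', 'Year': '1990', 'Display Value': '77'}
```

Each reader gets its own `io.StringIO` here because a reader consumes the underlying file as it iterates. Reusing one file object for a second reader returns no rows unless you first call `seek(0)` on it.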

2. JSON Data

JSON data format

[
  {
    "Indicator":"Life expectancy at birth (years)",
    "PUBLISH STATES":"Published",
    "Year":1990,
    "WHO region":"Europe",
    "World Bank income group":"High-income",
    "Country":"Andorra",
    "Sex":"Both sexes",
    "Display Value":77,
    "Numeric":77.00000,
    "Low":"",
    "High":"",
    "Comments":""
  }
]

How to import JSON data

import json

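# Open the file and read its contents into a string
with open(r"F:\pycharm\py_workspace\3 Python数据分析实例数据\data\chp3\data-text.json") as json_file:
    json_data = json_file.read()
# Parse the JSON string into Python objects and store the result in data
data = json.loads(json_data)

# data is a list; each item in it is a dict
for item in data:
    print(item)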

Summary

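In the CSV example, open() returns a file object, which csv.reader() then reads.
In the JSON example, open() opens the file and .read() returns its contents as a str.

csv.reader takes a file object.
json.loads takes a string.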
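
The distinction above can be sketched with in-memory data. This is only an illustration: `json_text` and the CSV lines are invented, and `json.load` (without the trailing "s") is shown as the standard-library counterpart that accepts a file object directly.

```python
import csv
import io
import json

# Hypothetical JSON text, not taken from data-text.json.
json_text = '[{"Country": "Andorra", "Year": 1990, "Display Value": 77}]'

# json.loads parses a string that is already in memory.
from_string = json.loads(json_text)

# json.load reads from a file-like object, so no explicit .read() is needed.
from_file = json.load(io.StringIO(json_text))

# csv.reader accepts any iterable of lines: an open file or, as here, a list of strings.
csv_rows = list(csv.reader(["Country,Year", "Andorra,1990"]))

print(from_string == from_file)  # True
print(csv_rows)                  # [['Country', 'Year'], ['Andorra', '1990']]
```

Note that JSON preserves types (1990 stays an int), while every CSV field comes back as a string.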

3. XML Data

Working through the core structure of the XML file:

import pprint
from xml.etree import ElementTree as ET

# ET.parse() parses the data in the file passed to it
# tree represents the whole XML document
tree = ET.parse(r"F:\pycharm\py_workspace\3 Python数据分析实例数据\data\chp3\data-text.xml")
# Get the root element of the tree
root = tree.getroot()
# print(root)
# <Element 'GHO' at 0x0000029B2ECA2AE8>

# List the child elements of the root
# pprint.pprint(list(root))
'''
[<Element 'QueryParameter' at 0x000001EC85C42C78>,
 <Element 'QueryParameter' at 0x000001EC85C42CC8>,
 <Element 'QueryParameter' at 0x000001EC85C42D18>,
 <Element 'QueryParameter' at 0x000001EC85C42DB8>,
 <Element 'QueryParameter' at 0x000001EC85C42E08>,
 <Element 'QueryParameter' at 0x000001EC85C42E58>,
 <Element 'Copyright' at 0x000001EC85C42EA8>,
 <Element 'Disclaimer' at 0x000001EC85C42F98>,
 <Element 'Metadata' at 0x000001EC85C44098>,
 <Element 'Data' at 0x000001EC85F21C78>]
'''

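# Get the Data element with the find method
# find returns the first matching element; findall returns all matching elements
data = root.find('Data')
# print(list(data))
# This prints a list of Observation elements

# Iterate over the Observation elements and each of their child elements
# for observation in data:
#     for item in observation:
#         print(item)

# Build the data structure
# Start with an empty list to hold the records
all_data = []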
for observation in data:
    record = {}
    for item in observation:
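        # Use the value of Category as the key of the new dict.
        # item.attrib.keys() gives the attribute names; in Python 3 it is a view,
        # so convert it to a list before indexing
        lookup_key = list(item.attrib.keys())[0]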
        if lookup_key == 'Numeric':
            rec_key = 'NUMERIC'
            rec_value = item.attrib['Numeric']
        else:
            rec_key = item.attrib[lookup_key]
            rec_value = item.attrib['Code']

        record[rec_key] = rec_value

    all_data.append(record)
print(all_data)
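
For readers without the WHO file, the same steps can be tried on a small inline document. This is a sketch under assumptions: the XML below is a hypothetical, heavily trimmed stand-in, and the `Dim`, `Value`, `Category`, and `Code` names are guesses at the real layout based on the code above. It also checks attribute names directly instead of relying on which attribute comes first, which is a variation on the original approach.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample; element and attribute names are assumptions.
xml_text = """
<GHO>
  <Copyright>example</Copyright>
  <Data>
    <Observation>
      <Dim Category="COUNTRY" Code="AND"/>
      <Dim Category="YEAR" Code="1990"/>
      <Value Numeric="77.00000"/>
    </Observation>
    <Observation>
      <Dim Category="COUNTRY" Code="AND"/>
      <Dim Category="YEAR" Code="2000"/>
      <Value Numeric="80.00000"/>
    </Observation>
  </Data>
</GHO>
""".strip()

# fromstring parses XML text and returns the root element directly.
root = ET.fromstring(xml_text)

# find returns the first matching child (or None); findall returns a list of all matches.
data = root.find("Data")
observations = data.findall("Observation")

all_data = []
for observation in observations:
    record = {}
    for item in observation:
        if "Numeric" in item.attrib:
            # Value elements carry the number in their Numeric attribute.
            record["NUMERIC"] = item.attrib["Numeric"]
        else:
            # Dim elements map a Category name to a Code value.
            record[item.attrib["Category"]] = item.attrib["Code"]
    all_data.append(record)

print(all_data[0])  # {'COUNTRY': 'AND', 'YEAR': '1990', 'NUMERIC': '77.00000'}
```

All values stay strings after parsing, so convert them with `int()` or `float()` before doing arithmetic.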