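1. Characteristics of CSV:
Each record is stored as one line of text
Fields are separated by a delimiter (usually a comma)
Only the data is stored (no formatting, formulas, or column types)
It can be read without any special software (a plain text editor is enough)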
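Here is a minimal sketch of these properties. The sample is made up and held in a string; its titles and years are illustrative and are not read from beatles-diskography.csv. The code splits the text into lines and then splits each line on commas. Every value comes back as a string, because CSV stores only raw text.

```python
# Hypothetical two-record sample written inline (not read from beatles-diskography.csv)
sample = "Title,Year\nPlease Please Me,1963\nWith the Beatles,1963\n"

# Each record sits on its own line of plain text
lines = sample.splitlines()

# Fields within a line are separated by the delimiter, here a comma
rows = [line.split(",") for line in lines]

# CSV stores only raw text: the year comes back as a string, not a number
print(rows)
print(type(rows[1][1]).__name__)  # str
```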
2. Parsing a CSV file manually
# Your task is to read the input DATAFILE line by line, and for the first 10 lines (not including the header)
# split each line on "," and then for each line, create a dictionary
# where the key is the header title of the field, and the value is the value of that field in the row.
# The function parse_file should return a list of dictionaries,
# each data line in the file being a single list entry.
# Field names and values should not contain extra whitespace, like spaces or newline characters.
# You can use the Python string method strip() to remove the extra whitespace.
# You have to parse only the first 10 data lines in this exercise,
# so the returned list should have 10 entries!
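import os

DATADIR = ""
DATAFILE = "beatles-diskography.csv"


def parse_file(datafile):
    data = []
    with open(datafile, "r") as f:
        # Read the header row; strip() removes spaces and the trailing newline from each title
        header = [name.strip() for name in f.readline().split(",")]
        counter = 0
        for line in f:
            # Only the first 10 data lines are required
            if counter == 10:
                break
            fields = line.split(",")
            entry = {}
            for i, value in enumerate(fields):
                # Map each header title to the value in the same column, without extra whitespace
                entry[header[i]] = value.strip()
            data.append(entry)
            counter += 1
    return data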
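The counter/break pattern above can also be written with `itertools.islice`, which stops after a fixed number of lines. The sketch below uses a hypothetical helper, `parse_first_rows`, and a made-up in-memory file (via `io.StringIO`) with 12 data rows. It stands in for the real CSV file so that the example runs without it.

```python
import io
from itertools import islice


def parse_first_rows(f, limit=10):
    """Hypothetical helper: parse at most `limit` data rows by splitting on commas."""
    # First line is the header; strip() removes spaces and the trailing newline
    header = [name.strip() for name in f.readline().split(",")]
    data = []
    # islice yields at most `limit` lines, replacing the manual counter/break
    for line in islice(f, limit):
        fields = [value.strip() for value in line.split(",")]
        # zip pairs each header title with the value in the same column
        data.append(dict(zip(header, fields)))
    return data


# Made-up in-memory file with 12 data rows, standing in for a real CSV file
text = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(12))
rows = parse_first_rows(io.StringIO(text))
print(len(rows))  # 10
print(rows[0])    # {'id': '0', 'name': 'row0'}
```

One behavioral difference: `zip` silently drops extra fields when a row has more values than the header. The index-based loop above would raise an `IndexError` in that case.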
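3. Parsing a CSV file with the csv module
Values in the data may themselves contain the delimiter (a comma), which breaks the naive split used above. The csv module handles these cases automatically, for example by honoring fields wrapped in double quotes.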
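The following small comparison shows the problem. The input is a made-up line whose second field contains a comma and is therefore quoted. A plain `split(",")` cuts that field in two, while `csv.reader` keeps it whole and removes the quotes.

```python
import csv
import io

# Made-up line whose second field contains a comma, so it is wrapped in double quotes
line = 'Help!,"Parlophone, Capitol"\n'

# Naive approach from section 2: split on every comma
naive = line.strip().split(",")

# csv.reader understands quoting; io.StringIO makes the string look like a file
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['Help!', '"Parlophone', ' Capitol"']
print(parsed)  # ['Help!', 'Parlophone, Capitol']
```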
import os
import pprint
import csv

DATADIR = ""
DATAFILE = "beatles-diskography.csv"


def parse_csv(datafile):
    data = []
    # Python 3: open in text mode with newline="" as the csv module documentation recommends
    with open(datafile, "r", newline="") as sd:
        # DictReader creates one dict per row, keyed by the field names from the header row
        r = csv.DictReader(sd)
        for line in r:
            data.append(line)
    return data


if __name__ == '__main__':
    datafile = os.path.join(DATADIR, DATAFILE)
    d = parse_csv(datafile)
    pprint.pprint(d)
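To see what `DictReader` produces without the real file, here is a sketch on a made-up in-memory CSV. The rows are illustrative and are not taken from beatles-diskography.csv. The code wraps the reader in `itertools.islice` to cap the number of rows, mirroring the 10-line limit from section 2. It also calls `dict(row)` so the output is a plain dict on every Python 3 version, because some older versions return `OrderedDict` rows.

```python
import csv
import io
from itertools import islice

# Made-up in-memory CSV standing in for beatles-diskography.csv
text = (
    "Title,Label\n"
    "Please Please Me,Parlophone\n"
    '"Help!","Parlophone, Capitol"\n'
)

reader = csv.DictReader(io.StringIO(text))
# islice caps the number of rows read, mirroring the 10-line limit from section 2
rows = [dict(row) for row in islice(reader, 10)]

print(rows)
# [{'Title': 'Please Please Me', 'Label': 'Parlophone'},
#  {'Title': 'Help!', 'Label': 'Parlophone, Capitol'}]
```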