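Introduction
- Comma-Separated Values (CSV) is sometimes also called character-separated values, because the delimiter does not have to be a comma
- A CSV file stores tabular data as plain text
- A CSV file consists of any number of records, separated by some kind of line break
- Each record consists of fields, separated by some other character or string, most commonly a comma or a tab
- Usually, all records have exactly the same sequence of fields
- No universal standard for the CSV file format exists, though RFC 4180 gives a baseline description
- The character encoding is likewise unspecified, but 7-bit ASCII is the most basic common encoding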
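The points above about delimiters and fields can be illustrated with a minimal sketch. The sample strings and variable names here are made up for illustration, and io.StringIO stands in for a file opened in text mode.

```python
import csv
import io

# Tab-separated text: the delimiter does not have to be a comma.
tsv_text = "name\tcity\nAda\tLondon\nLinus\tHelsinki\n"
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
print(rows)

# A field that contains the delimiter is wrapped in double quotes,
# so the comma inside "hello, world" does not split the field.
csv_text = 'id,comment\n1,"hello, world"\n'
quoted = list(csv.reader(io.StringIO(csv_text)))
print(quoted)
```

Every field comes back as a string; the csv module does not convert types by default.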
Reading and writing files
The csv module is used mainly to read and write files in CSV format.
reader
def reader(iterable, dialect='excel', *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__
    """
    csv_reader = reader(iterable [, dialect='excel']
                        [optional keyword args])
    for row in csv_reader:
        process(row)
    The "iterable" argument can be any object that returns a line
    of input for each iteration, such as a file object or a list. The
    optional "dialect" parameter is discussed below. The function
    also accepts optional keyword arguments which override settings
    provided by the dialect.
    The returned object is an iterator. Each iteration returns a row
    of the CSV file (which can span multiple input lines).
    """
    pass
Sample data (csv_data.txt); the first line, "Sample data", is part of the file:
Sample data
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
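import csv
# newline='' lets the csv module handle line endings itself
with open('csv_data.txt', 'r', newline='') as fp:
    data = csv.reader(fp)
    title = next(data)  # first row of the file
    print(type(title))
    print(title)
    for i in data:
        print(i)
# no fp.close() needed: the with block closes the file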
Output:
<class 'list'>
['Sample data']
['aaa', 'bbb', 'ccc', 'ddd']
['111', '222', '333', '444']
['+++', '---', '***', '///']
As the output shows, each row that reader yields is a list of strings.
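The docstring above notes that the iterable may be a list and that one row can span several input lines. A minimal sketch of both points, using made-up sample lines:

```python
import csv

# Any iterable of lines works, not only a file object.
# The quoted field in record 1 contains a line break, so that
# record is spread over two input lines.
lines = [
    'id,note\n',
    '1,"first line\n',
    'second line"\n',
    '2,plain\n',
]

reader = csv.reader(lines)
header = next(reader)   # ['id', 'note']
records = list(reader)  # two records, built from three input lines

print(header)
print(records)
print(reader.line_num)  # input lines consumed, not records returned
```

The line_num attribute counts lines read from the source, so it can be larger than the number of rows returned.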
DictReader
DictReader is a class:
class DictReader:
    def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                 dialect="excel", *args, **kwds):
        self._fieldnames = fieldnames   # list of keys for the dict
        self.restkey = restkey          # key to catch long rows
        self.restval = restval          # default value for short rows
        self.reader = reader(f, dialect, *args, **kwds)
        self.dialect = dialect
        self.line_num = 0
Sample data:
first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
import csv
with open('csv_data.txt', 'r', newline='') as fp:
    data = csv.DictReader(fp)  # the first row supplies the keys
    for i in data:
        print(i['first'], i['second'], i['third'], i['forth'])
# no fp.close() needed: the with block closes the file
Output:
aaa bbb ccc ddd
111 222 333 444
+++ --- *** ///
As the output shows, DictReader lets each row be read like a dictionary, keyed by the field names taken from the first row.
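The restkey and restval parameters in the constructor above cover rows that are longer or shorter than the header. A minimal sketch, where the sample text and the values "extra" and "n/a" are arbitrary choices for illustration:

```python
import csv
import io

# Header has three fields; row 1 is short, row 2 is long.
text = "first,second,third\n1,2\n4,5,6,7\n"

reader = csv.DictReader(io.StringIO(text), restkey="extra", restval="n/a")
rows = [dict(row) for row in reader]

for row in rows:
    print(row)
```

The missing field in the short row is filled with restval, and the surplus values in the long row are collected in a list under restkey.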
writer
def writer(fileobj, dialect='excel', *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__
    """
    csv_writer = csv.writer(fileobj [, dialect='excel']
                            [optional keyword args])
    for row in sequence:
        csv_writer.writerow(row)
    [or]
    csv_writer = csv.writer(fileobj [, dialect='excel']
                            [optional keyword args])
    csv_writer.writerows(rows)
    The "fileobj" argument can be any object that supports the file API.
    """
    pass
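import csv
title = ['first', 'second', 'third', 'forth']
value = [
    ['aaa', 'bbb', 'ccc', 'ddd'],
    ['111', '222', '333', '444'],
    ['+++', '---', '***', '///'],
]
# newline='' prevents extra blank lines between rows on Windows
with open('csc_saved.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(title)   # header row
    writer.writerows(value)  # all data rows at once
# no fp.close() needed: the with block closes the file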
The resulting file contains:
first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
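The sample above contains no special characters. As a minimal sketch of what writer does when a field holds a comma or a double quote (the sample row is made up, and io.StringIO stands in for a file opened with newline=''):

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)  # default 'excel' dialect
writer.writerow(["id", "comment"])
# The second field contains both the delimiter and the quote character.
writer.writerow([1, 'says "hi", then leaves'])

output = buffer.getvalue()
print(repr(output))

# Reading the text back recovers the original field values as strings.
round_trip = list(csv.reader(io.StringIO(output, newline="")))
print(round_trip)
```

With the default dialect, only fields that need it are quoted, embedded quotes are doubled, and each row ends with "\r\n".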
DictWriter
DictWriter is also a class:
class DictWriter:
    def __init__(self, f, fieldnames, restval="", extrasaction="raise",
                 dialect="excel", *args, **kwds):
        self.fieldnames = fieldnames    # list of keys for the dict
        self.restval = restval          # for writing short dicts
        if extrasaction.lower() not in ("raise", "ignore"):
            raise ValueError("extrasaction (%s) must be 'raise' or 'ignore'"
                             % extrasaction)
        self.extrasaction = extrasaction
        self.writer = writer(f, dialect, *args, **kwds)
Likewise, DictWriter can write data to a CSV file from dictionaries.
import csv
title = ['first', 'second', 'third', 'forth']
value = [
    ['aaa', 'bbb', 'ccc', 'ddd'],
    ['111', '222', '333', '444'],
    ['+++', '---', '***', '///'],
]
with open('csc_saved.csv', 'w', newline='') as fp:
    writer = csv.DictWriter(fp, title)
    writer.writeheader()  # writes the field names as the first row
    for row in value:
        # pair each field name with its value to build the row dict
        writer.writerow(dict(zip(title, row)))
# no fp.close() needed: the with block closes the file
The resulting file contains:
first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
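The restval and extrasaction parameters in the constructor above decide what happens when a dictionary has missing or unexpected keys. A minimal sketch, with made-up rows and "-" as an arbitrary filler value:

```python
import csv
import io

fieldnames = ["first", "second", "third"]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames, restval="-", extrasaction="ignore")
writer.writeheader()
# "third" is missing, so restval is written in its place.
writer.writerow({"first": "aaa", "second": "bbb"})
# "unused" is not in fieldnames and is silently dropped.
writer.writerow({"first": "111", "second": "222", "third": "333", "unused": "x"})
output = buffer.getvalue()
print(output)

# With the default extrasaction="raise", an unknown key is an error.
strict = csv.DictWriter(io.StringIO(newline=""), fieldnames)
try:
    strict.writerow({"first": "a", "oops": "b"})
except ValueError as exc:
    error = str(exc)
    print(error)
```

Keeping the default "raise" is the safer choice when a misspelled key should be caught rather than lost.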