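Python Web Scraping (11): csv

Introduction

  • Comma-separated values (CSV) are sometimes also called character-separated values, because the delimiter does not have to be a comma
  • A CSV file stores tabular data as plain text
  • A CSV file consists of any number of records, separated by some kind of line break
  • Each record consists of fields, separated by some other character or string, most commonly a comma or a tab
  • Usually, all records have exactly the same sequence of fields
  • There is no universal standard for the CSV format, but RFC 4180 gives a basic description
  • The character encoding is not specified either, but 7-bit ASCII is the most basic common encoding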
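The point about delimiters other than the comma can be sketched with the standard csv module. This is a minimal illustration, assuming an in-memory string stands in for a real file; the sample text and the names tsv_text and rows are invented for the example.

```python
import csv
import io

# Hypothetical tab-separated text standing in for a file on disk.
tsv_text = "name\tcity\nAlice\tBeijing\nBob\tShanghai\n"

# The delimiter keyword overrides the comma used by the default 'excel' dialect.
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

for row in rows:
    print(row)
```

The same keyword works for any single-character delimiter, such as ';' or '|'.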

Reading and Writing Files

The csv module is used mainly to read and write files in CSV format.

reader

def reader(iterable, dialect='excel', *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__ 
    """
    csv_reader = reader(iterable [, dialect='excel']
                            [optional keyword args])
        for row in csv_reader:
            process(row)
    
    The "iterable" argument can be any object that returns a line
    of input for each iteration, such as a file object or a list.  The
    optional "dialect" parameter is discussed below.  The function
    also accepts optional keyword arguments which override settings
    provided by the dialect.
    
    The returned object is an iterator.  Each iteration returns a row
    of the CSV file (which can span multiple input lines).
    """
    pass

Sample data (contents of csv_data.txt):

Sample data
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
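Reading it with csv.reader:

import csv

# newline='' lets the csv module handle line endings itself
with open('csv_data.txt', 'r', newline='') as fp:
    data = csv.reader(fp)
    title = next(data)      # first row of the file
    print(type(title))
    print(title)
    for row in data:        # remaining rows
        print(row)
# the with block closes the file, so no fp.close() is needed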

Output:

<class 'list'>
['Sample data']
['aaa', 'bbb', 'ccc', 'ddd']
['111', '222', '333', '444']
['+++', '---', '***', '///']

As the output shows, reader yields each row as a list of strings.
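The docstring above notes that the iterable can be a list and that one row can span several input lines. A small sketch of both points, assuming made-up sample lines (the names lines and parsed are illustrative only):

```python
import csv

# Any iterable of lines works; here a list of strings replaces the file object.
lines = [
    'id,comment\n',
    '1,"hello, world"\n',   # quoted field containing the delimiter
    '2,"first line\n',      # quoted field that continues on the next line
    'second line"\n',
]

parsed = list(csv.reader(lines))

for row in parsed:
    print(row)
```

Four input lines produce three rows, because the quoted field keeps its embedded line break.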

DictReader

DictReader is a class:

class DictReader:
    def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                 dialect="excel", *args, **kwds):
        self._fieldnames = fieldnames   # list of keys for the dict
        self.restkey = restkey          # key to catch long rows
        self.restval = restval          # default value for short rows
        self.reader = reader(f, dialect, *args, **kwds)
        self.dialect = dialect
        self.line_num = 0

Sample data (contents of csv_data.txt):

first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
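Reading it with csv.DictReader:

import csv

# newline='' lets the csv module handle line endings itself
with open('csv_data.txt', 'r', newline='') as fp:
    data = csv.DictReader(fp)   # the first row supplies the field names
    for row in data:
        print(row['first'], row['second'], row['third'], row['forth'])
# the with block closes the file, so no fp.close() is needed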

Output:

aaa bbb ccc ddd
111 222 333 444
+++ --- *** ///

As the output shows, DictReader returns each row as a dictionary keyed by the field names in the first row.
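The restkey and restval parameters in the constructor above control what happens when a row is longer or shorter than the header. A minimal sketch, assuming invented sample lines and the arbitrary placeholder values "extra" and "N/A":

```python
import csv

lines = [
    "first,second,third\n",
    "a,b\n",          # short row: one value missing
    "1,2,3,4,5\n",    # long row: two extra values
]

# restval fills missing fields; restkey collects surplus values in a list.
reader = csv.DictReader(lines, restkey="extra", restval="N/A")
records = [dict(row) for row in reader]

for record in records:
    print(record)
```

Without these arguments, missing fields come back as None and surplus values are stored under the key None.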

writer

def writer(fileobj, dialect='excel', *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__ 
    """
    csv_writer = csv.writer(fileobj [, dialect='excel']
                                [optional keyword args])
        for row in sequence:
            csv_writer.writerow(row)
    
        [or]
    
        csv_writer = csv.writer(fileobj [, dialect='excel']
                                [optional keyword args])
        csv_writer.writerows(rows)
    
    The "fileobj" argument can be any object that supports the file API.
    """
    pass
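Writing a header row and several data rows with csv.writer:

import csv

title = ['first', 'second', 'third', 'forth']
value = [
    ['aaa', 'bbb', 'ccc', 'ddd'],
    ['111', '222', '333', '444'],
    ['+++', '---', '***', '///']
]

# newline='' prevents extra blank lines between rows on Windows
with open('csc_saved.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(title)      # one row
    writer.writerows(value)     # many rows at once
# the with block closes the file, so no fp.close() is needed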

Output (contents of csc_saved.csv):

first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
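Fields that contain the delimiter or a quote character are quoted automatically by writer under the default 'excel' dialect. A hedged sketch, assuming an in-memory io.StringIO buffer in place of a real file and invented sample rows:

```python
import csv
import io

# In-memory stand-in for a file opened with newline=''.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

writer.writerow(["id", "comment"])
writer.writerow([1, "hello, world"])   # contains the delimiter
writer.writerow([2, 'say "hi"'])       # contains the quote character

output = buffer.getvalue()
print(output)
```

Non-string values such as the integers here are converted with str() before being written.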

DictWriter

DictWriter is also a class:

class DictWriter:
    def __init__(self, f, fieldnames, restval="", extrasaction="raise",
                 dialect="excel", *args, **kwds):
        self.fieldnames = fieldnames    # list of keys for the dict
        self.restval = restval          # for writing short dicts
        if extrasaction.lower() not in ("raise", "ignore"):
            raise ValueError("extrasaction (%s) must be 'raise' or 'ignore'"
                             % extrasaction)
        self.extrasaction = extrasaction
        self.writer = writer(f, dialect, *args, **kwds)

Likewise, DictWriter writes data to a CSV file from dictionaries.

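import csv

title = ['first', 'second', 'third', 'forth']
value = [
    ['aaa', 'bbb', 'ccc', 'ddd'],
    ['111', '222', '333', '444'],
    ['+++', '---', '***', '///']
]

# newline='' prevents extra blank lines between rows on Windows
with open('csc_saved.csv', 'w', newline='') as fp:
    writer = csv.DictWriter(fp, fieldnames=title)
    writer.writeheader()                    # writes the field names as the first row
    for row in value:
        item = dict(zip(title, row))        # pair each field name with its value
        writer.writerow(item)
# the with block closes the file, so no fp.close() is needed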

Output (contents of csc_saved.csv):

first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
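The restval and extrasaction parameters in the constructor above decide how DictWriter treats dictionaries that do not match fieldnames exactly. A minimal sketch, assuming an in-memory buffer, invented rows, and "-" as an arbitrary filler value:

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.DictWriter(
    buffer,
    fieldnames=["first", "second"],
    restval="-",              # used when a key is missing
    extrasaction="ignore",    # silently drop keys not in fieldnames
)
writer.writeheader()
writer.writerow({"first": "aaa"})                                    # 'second' missing
writer.writerow({"first": "111", "second": "222", "third": "333"})   # 'third' is extra
output = buffer.getvalue()
print(output)

# With the default extrasaction="raise", an unknown key is an error.
strict = csv.DictWriter(io.StringIO(newline=""), fieldnames=["first"])
try:
    strict.writerow({"first": "a", "other": "b"})
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

Choosing "ignore" is convenient when the source dictionaries carry more keys than the file needs; "raise" is safer when an unexpected key probably signals a bug.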