Python Web Scraping: Storing, Reading, and Processing CSV Data

Harley_lee

Category: python, web scraping. Tags: python, web scraping, csv

Article link: https://blog.csdn.net/Simon_LHM/article/details/119363513

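This article covers Python's built-in csv module and the pandas library for storing, reading, and processing data. Topics include merging multiple csv files; filtering out empty values, removing duplicates, and filling in missing data; and handling object-typed data, such as converting string and date types.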

1. Python's built-in csv module

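Common mode	Meaning
r	read only
r+	read and write (file must exist)
rb	binary read
rb+	binary read and write
w	write only
w+	read and write (truncates the file)
wb	binary write
wb+	binary read and write (truncates the file)
a	append
a+	append and read
ab	binary append
ab+	binary append and read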

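Notes: 1. In "w" mode, an existing file is overwritten; if the file does not exist, a new one is created.

2. In "a" mode, written data is appended to the end of the file; if the file does not exist, it is created automatically.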
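The difference between the two notes can be checked with a minimal sketch. The helper name `demo_write_modes` and the file name `modes_demo.txt` are made up for illustration, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile


def demo_write_modes(path):
    """Write with 'w' twice, then 'a' once, and return the file's final lines."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")
    # Opening with "w" again truncates the file, so "first" is lost.
    with open(path, "w", encoding="utf-8") as f:
        f.write("second\n")
    # Opening with "a" keeps the existing content and writes at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("third\n")
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name, used only for this demonstration
        print(demo_write_modes(os.path.join(tmp, "modes_demo.txt")))
```

Only the lines written by the second "w" and the "a" open survive, which is why "w" is a poor choice when a scraper has to resume writing to an existing file.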

eg1. Using with open

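import csv

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('test.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    # Write the column names first
    writer.writerow(["index", "a_name", "b_name"])
    # Use writerows to write several rows at once
    writer.writerows([[1, 2, 3], [0, 1, 2], [4, 5, 6]])

# Read the csv file back with csv.reader (open in 'r' mode; 'w' would erase the file)
with open('test.csv', 'r', newline='') as csvFile:
    reader = csv.reader(csvFile)
    for line in reader:    # iterate over the file row by row
        print(line)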
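One thing worth noting about the read step: csv.reader returns every field as a string, so the numbers written above come back as '1', '2', '3'. The following is a rough sketch of separating the header row and converting the values back to integers. The helper name `read_rows_as_int` is an assumption made for illustration, and it presumes every data field is a valid integer.

```python
import csv
import os
import tempfile


def read_rows_as_int(path):
    """Return (header, rows), converting each data field from str to int."""
    with open(path, "r", newline="") as csv_file:
        reader = csv.reader(csv_file)
        header = next(reader)  # the first row holds the column names
        rows = [[int(value) for value in row] for row in reader]
    return header, rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.csv")
        # Recreate the same file as in the example above
        with open(path, "w", newline="") as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(["index", "a_name", "b_name"])
            writer.writerows([[1, 2, 3], [0, 1, 2], [4, 5, 6]])
        print(read_rows_as_int(path))
```

For real scraped data, fields are often mixed text and numbers, so the conversion would need to be applied per column rather than to every field.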

eg2. Using open directly

import csv

# Create/open a csv file
f = open('小说.csv', 'w', encoding='utf-8-sig', newline='')
# Build a csv writer object on top of the file object
csv_write = csv.writer(f)
# Write the csv header row
csv_write.writerow(["title", "score", "evaluator", "href"])
for i  in csv_write: