Python处理CSV文件

Python处理CSV文件

CSV(comma-separated value,都好分隔值)文件格式是一种非常简单的数据存储与分享方式。CSV文件中的单元格中存放的只是原始文件。

Excel打开CSV文件,其格式和普通表格没有区别;

用记事本打开CSV文件,其格式如下:

data1,data2,data3
data4,data5,data6
data7,data8,data9

即,每行表示一行数据,同行数据用逗号隔开。

下面需要用到的input_file.csv文件内容如下:

Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
Supplier X,001-1001,2341,$500.00 ,1/20/14
Supplier X,001-1001,2341,$500.00 ,1/20/14
Supplier X,001-1001,5467,$750.00 ,1/20/14
Supplier X,001-1001,5467,$750.00 ,1/20/14
Supplier Y,50-9501,7009,$250.00 ,1/30/14
Supplier Y,50-9501,7009,$250.00 ,1/30/14
Supplier Y,50-9505,6650,$125.00 ,2002/3/14
Supplier Name,Invoice Number,Part Number,Cost,Purchase Date,
Supplier Y,50-9505,6650,$125.00 ,2002/3/14,
Supplier Z,920-4803,3321,$615.00 ,2002/3/14,
Supplier Z,920-4804,3321,$615.00 ,2002/10/14,
Supplier Z,920-4805,3321,$615.00 ,2/17/14,
Supplier Z,920-4806,3321,$345 ,615,2/24/14


使用csv模块读写CSV文件

csv模块用途:用于正确处理数据值中嵌入逗号和其他复杂模式。

注:数据中嵌入逗号会导致数据处理出现错误。例如数据 “63,456.333” 会被错误切分成 “63” 和 “456.333”。

示例:

import csv

in_csvfile = "E:\\CVSwork\\PythonDataAnalysisFoundation\\input_file.csv"
out_csvfile = "E:\\CVSwork\\PythonDataAnalysisFoundation\\output_file.csv"
with open(in_csvfile, 'r', newline = '') as csv_in_file:
    with open(out_csvfile, 'w', newline = "") as csv_out_file:
        #创建文件读取对象和写入对象
        filereader = csv.reader(csv_in_file, delimiter = ',')
        filewriter = csv.writer(csv_out_file, delimiter = ',')
        for row_list in filereader:
            print(row_list)
            filewriter.writerow(row_list)

参数 delimiter = ',' 是默认分隔符,所以如果输入文件和输出文件都是用逗号分隔的,就不需要这个参数;

如果,读取对象中 delimiter = '+' ,意为将输入文件按 “+” 进行分隔,写入对象中 delimiter = '-' ,意为将数据用 “-” 拼接后写入输出文件。


筛选特定的行

1.行中的值满足某个条件

示例:

import csv
input_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\input_file.csv"
output_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\output_file.csv"
with open(input_file, 'r', newline = '') as csv_in_file:
    with open(output_file, 'w', newline = '') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewritter = csv.writter(csv_out_file)
        header = next(filereader)
        filewritter.writerow(header)    #首先将列名写入输出文档
        for row_list in filereader:
        	supplier = str(row_list[0]).strip()
        	cost = str(row_list[3]).strip('$').replace(',', '')
            if suppplier == 'Supplier Z' or float(cost) > 600.0:
                filewritter.writerow(row_list)

2.行中的值属于某个集合

示例:

将时间在列表 important_dates 中的行写入输出文档。

import csv
input_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\input_file.csv"
output_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\output_file.csv"
important_dates = ['1/20/14', '1/30/14']
with open(input_file, 'r', newline = '') as csv_in_file:
    with open(output_file, 'w', newline = '') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewritter = csv.writter(csv_out_file)
        header = next(filereader)
        filewritter.writerow(header)    #首先将列名写入输出文档
        for row_list in filereader:
            if row_list[4] in important_dates:  #第四列的内容是日期
                filewritter.writerow(row_list)

3.行中的值匹配于某个模式/正则表达式

示例:

匹配 “001-” 开头的行,写入到输出文档。

import csv
import re
input_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\input_file.csv"
output_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\output_file.csv"
#匹配“001-”开头的
pattern = re.compile(r'(?P<my_pattern_group>^001-.*)', re.I)
with open(input_file, 'r', newline = '') as csv_in_file:
    with open(output_file, 'w', newline = '') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        header = next(filereader)
        filewriter.writerow(header)
        for row_list in filereader:
            invoice_number = row_list[1]
            if pattern.search(invoice_number):  #只是这了条件改变了
                filewriter.writerow(row_list)

选取特定的列

1.通过选取特定的列索引值

示例:

将第一列、第四列写入输出文档。

import csv
input_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\input_file.csv"
output_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\output_file.csv"
my_columns = [0, 3] #将第一列、第四列写入输出文档
with open(input_file, 'r', newline = '') as csv_in_file:
    with open(output_file, 'w', newline = '') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        for row_list in filereader:
            row_list_output = []
            for index in my_columns:
                row_list_output.append(row_list[index])
            filewriter.writerow(row_list_output)

2.用过选取特定的列标题

示例:

将列标题为 “Cost” , “Supplier Name” 的列写入输出文档。

原则是先找出列标题对应的索引值,在按照索引值的方式进行后续操作。

import csv
input_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\input_file.csv"
output_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\output_file.csv"
my_columns_name = ['Cost', 'Supplier Name']
with open(input_file, 'r', newline = '') as csv_in_file:
    with open(output_file, 'w', newline = '') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        header = next(filereader)
        my_columns = []
        #找出对应索引值
        for index in range(len(header)):
            if header[index] in my_columns_name:
                my_columns.append(index)
        #将列标题写入文档
        row_list_output = []
        for index in my_columns:
            row_list_output.append(header[index])
        filewriter.writerow(row_list_output)
        #将数据内容写入文档
        for row_list in filereader:
            row_list_output = []
            for index in my_columns:
                row_list_output.append(row_list[index])
            filewriter.writerow(row_list_output)

选取连续的行

示例:

将第4至16行写入输出文档。

import csv
input_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\input_file.csv"
output_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\output_file.csv"
row_counter = 0
with open(input_file, 'r', newline = '') as csv_in_file:
    with open(output_file, 'w', newline = '') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        for row in filereader:
            if row_counter >= 3 and row_counter <= 15:
                filewriter.writerow([value.strip() for value in row])
            row_counter += 1

读取多个CSV文件

示例:

将 input_file1.csv 和 input_file2.csv 文件中的内容写入到输出文件。

import csv
import glob
import os

input_file_path = "E:\\CVSwork\\PythonDataAnalysisFoundation"
output_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\output_file.csv"
file_counter = 1
with open(output_file, 'w', newline = '') as csv_out_file:
    filewriter = csv.writer(csv_out_file)
    for input_file in glob.glob(os.path.join(input_file_path, 'input_*')):
        with open(input_file, 'r', newline = '') as csv_in_file:
            print("Operate file {}......".format(os.path.basename(input_file)))
            filereader = csv.reader(csv_in_file)
            if file_counter == 1:
                header = next(filereader, None)
                filewriter.writerow(header)
                file_counter += 1
            for row_list in filereader:
                filewriter.writerow(row_list)

其中 os.path。basename() 函数的作用是:加入 input_file = "E:\\CVSwork\\PythonDataAnalysisFoundation\\input_file.csv"os.path.basename(input_file) 的结果便是 input_file.csv ,即舍去前面的路径,返回文件的基本文件名。


参考文献:
《Python数据分析基础》

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值