Batch-processing CSV files with Python

m0_46483236

Category: python   Tags: big data, python

Article link: https://blog.csdn.net/m0_46483236/article/details/125625074

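The problem is as follows:

The first 12 rows of every CSV file must be deleted and the file saved again.
The CSV files are spread across several directories, so they need to be processed in batch.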
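
Independent of the file system, the core operation is just "read the rows, skip the first N, write the rest back out". Here is a minimal in-memory sketch. It assumes the 12 rows to drop are CSV rows, as in the scripts below, rather than raw text lines; the two differ when a quoted field spans several lines. `drop_first_rows` is a hypothetical helper name, not part of the original post:

```python
import csv
import io
from itertools import islice


def drop_first_rows(csv_text, n=12):
    """Return csv_text with its first n CSV rows removed (hypothetical helper)."""
    # newline='' lets the csv module handle line endings itself
    reader = csv.reader(io.StringIO(csv_text, newline=''))
    out = io.StringIO(newline='')
    writer = csv.writer(out)
    # islice lazily skips the first n rows and passes the rest through unchanged
    writer.writerows(islice(reader, n, None))
    return out.getvalue()


if __name__ == '__main__':
    sample = 'meta,1\nmeta,2\na,b\n1,2\n'
    print(drop_first_rows(sample, n=2))
```

Note that `csv.writer` ends rows with `\r\n` by default, so the output's line endings may differ from the input's.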

The code is as follows:

1. Processing a single CSV file

import csv

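def preprocess(path):
    '''
    Drop the first 12 rows of a raw CSV file that is not yet in the expected format;
    a file that has already been processed is left as it is.
    '''
    with open(path, 'r', encoding='utf-8', newline='') as f:
        results = list(csv.reader(f))

    # Check whether the file has already been processed, so it is never trimmed twice.
    # Adapt this condition to your data: here an unprocessed file starts with a 'Model' cell.
    # The extra checks avoid an IndexError on an empty file or an empty first row.
    if results and results[0] and results[0][0] == 'Model':
        results = results[12:]
        # Rewrite the file only when rows were actually removed
        with open(path, 'w', encoding='utf-8', newline='') as p:
            writer = csv.writer(p)
            writer.writerows(results)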


if __name__ == '__main__':
    # Path of the CSV file to process
    csv_path = 'xxxx.csv'
    preprocess(csv_path)
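
`preprocess` overwrites the file in place, so a crash or a full disk during the write can leave a truncated CSV behind. One common safeguard is to write to a temporary file in the same directory and move it over the original with `os.replace`, which is a single atomic rename on POSIX systems. It is sketched here as a hypothetical `preprocess_atomic`, using the same `'Model'` check as above; the function name and the `n`/`marker` parameters are illustrative assumptions:

```python
import csv
import os
import shutil
import tempfile


def preprocess_atomic(path, n=12, marker='Model'):
    """Drop the first n rows of an unprocessed CSV file, replacing it in one step.

    Hypothetical variant of preprocess(): `marker` is assumed to be the value in
    the first cell of a file that has not been processed yet.
    Returns True if the file was rewritten, False if it was left untouched.
    """
    with open(path, 'r', encoding='utf-8', newline='') as f:
        rows = list(csv.reader(f))

    # Empty file, empty first row, or no marker: treat as already processed
    if not rows or not rows[0] or rows[0][0] != marker:
        return False

    # Temporary file in the same directory, so os.replace stays on one filesystem;
    # the .tmp suffix keeps it from being mistaken for a CSV by a batch scan
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix='.tmp', dir=dir_name)
    try:
        with os.fdopen(fd, 'w', encoding='utf-8', newline='') as tmp:
            csv.writer(tmp).writerows(rows[n:])
        # mkstemp creates owner-only permissions; copy the original file's mode
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file and re-raise; the original file is untouched
        os.remove(tmp_path)
        raise
    return True
```

Because it returns whether the file changed, a batch loop can count how many files were actually trimmed.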

2. Batch processing (every CSV file at any depth under a given directory)

import os
import csv

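def preprocess(path):
    '''
    Drop the first 12 rows of a raw CSV file that is not yet in the expected format;
    a file that has already been processed is left as it is.
    '''
    with open(path, 'r', encoding='utf-8', newline='') as f:
        results = list(csv.reader(f))

    # Check whether the file has already been processed, so it is never trimmed twice.
    # Adapt this condition to your data: here an unprocessed file starts with a 'Model' cell.
    # The extra checks avoid an IndexError on an empty file or an empty first row.
    if results and results[0] and results[0][0] == 'Model':
        results = results[12:]
        # Rewrite the file only when rows were actually removed
        with open(path, 'w', encoding='utf-8', newline='') as p:
            writer = csv.writer(p)
            writer.writerows(results)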


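def findfile(path):
    '''
    Find every file of a given type (here .csv) under a directory, at any depth,
    and apply an operation to it (here: drop the first 12 rows).
    '''
    entries = os.listdir(path)
    files = [i for i in entries if os.path.isfile(os.path.join(path, i))]
    dirs = [i for i in entries if os.path.isdir(os.path.join(path, i))]
    for i in files:
        # Compare the real extension case-insensitively; i.split('.')[-1] would also
        # match a file literally named "csv" and would miss "DATA.CSV"
        if os.path.splitext(i)[1].lower() == '.csv':
            preprocess(os.path.join(path, i))

    # Recurse into each subdirectory
    for i in dirs:
        findfile(os.path.join(path, i))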


if __name__ == '__main__':
    # Root directory of the folders to process
    file_dir = '0-79-1'
    findfile(file_dir)
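
The same batch job can rely on `os.walk` for the directory traversal that `findfile` does by hand. The sketch below also adds a dry-run mode that only reports which files would be trimmed, which is useful before modifying a whole tree in place. `needs_trim`, `trim_file`, `batch_trim`, the `dry_run` flag and the `'Model'` marker default are hypothetical names and assumptions that mirror the check in `preprocess`:

```python
import csv
import os


def needs_trim(path, marker='Model'):
    """Return True if the CSV's first cell equals marker (i.e. not processed yet)."""
    with open(path, 'r', encoding='utf-8', newline='') as f:
        first_row = next(csv.reader(f), None)
    # None (empty file) and [] (empty first row) both count as "no marker"
    return bool(first_row) and first_row[0] == marker


def trim_file(path, n=12):
    """Drop the first n rows of the CSV at path, rewriting it in place."""
    with open(path, 'r', encoding='utf-8', newline='') as f:
        rows = list(csv.reader(f))
    with open(path, 'w', encoding='utf-8', newline='') as f:
        csv.writer(f).writerows(rows[n:])


def batch_trim(root, n=12, marker='Model', dry_run=False):
    """Trim every unprocessed .csv file under root, at any depth.

    Returns the sorted list of paths that were (or, with dry_run=True,
    would be) trimmed.
    """
    touched = []
    # os.walk yields (directory, subdirectories, files) for every level of the tree
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() != '.csv':
                continue
            path = os.path.join(dirpath, name)
            if needs_trim(path, marker):
                touched.append(path)
                if not dry_run:
                    trim_file(path, n)
    return sorted(touched)
```

Calling `batch_trim('0-79-1', dry_run=True)` first (the directory name from the script above) and checking the returned list is a cheap way to confirm that the `'Model'` test matches your files before anything is rewritten. Unlike the `os.path.isdir` check in `findfile`, `os.walk` does not descend into symlinked directories by default, so a link pointing back up the tree cannot cause endless recursion.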
