Batch-deleting a row from Excel files with Python (multithreaded)

zhangxiangnan0906

Article link: https://blog.csdn.net/weixin_49328057/article/details/113884554

Contents

Preface
Overview
Code
Summary

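Preface

Excel files downloaded in bulk from the web sometimes carry an advertisement in their first row. With Python we can delete that row from every file and save the result. Because the job consists largely of file I/O, the work can be spread across multiple threads.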
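As a rough illustration of the core step, the sketch below drops the first row of an in-memory DataFrame only when its first cell contains a marker string. The marker text "AD", the sample rows, and the helper name drop_ad_row are all hypothetical; a real file would be loaded with pd.read_excel(path, header=None) rather than built by hand.

```python
import pandas as pd


def drop_ad_row(data, marker, row=0, col=0):
    """Return data without `row` if the cell at (row, col) contains marker.

    Hypothetical helper for illustration; row and col are 0-based positions.
    """
    cell = data.iat[row, col]
    # str() guards against non-string cells such as NaN or numbers
    if marker in str(cell):
        return data.drop(index=data.index[row]).reset_index(drop=True)
    return data


# Stand-in for pd.read_excel(path, header=None): row 0 is the ad line
data = pd.DataFrame([["AD: visit example.com", None],
                     ["name", "score"],
                     ["alice", 90]])
cleaned = drop_ad_row(data, "AD")
print(cleaned)
```

Calling the helper on a frame whose first cell does not contain the marker returns the frame unchanged, so it is safe to run over files that have already been cleaned.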

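Overview

get_all_excel(path): walks path recursively and returns a list of all .xlsx files found under it.
split_list(all_list, count): splits the list into about count sublists and returns them as lists; each sublist is handled by one thread, so count controls the number of threads.
mutil_thread(lists): takes lists and starts one thread per sublist.
change_excel(file_list, i): iterates over the files in file_list and modifies each Excel file; i is the index of the thread.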
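To make the splitting step concrete, here is a standalone sketch of one way to divide a list into at most count contiguous chunks of near-equal size. The name split_evenly and the sample file names are hypothetical and are not part of the article's code.

```python
def split_evenly(items, count):
    """Split items into at most `count` contiguous chunks of near-equal size."""
    if count < 1:
        raise ValueError("count must be at least 1")
    count = min(count, len(items))  # never create empty chunks
    if count == 0:
        return []
    base, extra = divmod(len(items), count)
    chunks = []
    start = 0
    for i in range(count):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


files = [f"file{i}.xlsx" for i in range(7)]
chunks = split_evenly(files, 3)
print([len(c) for c in chunks])  # [3, 2, 2]
```

Spreading the remainder over the first chunks keeps the per-thread workload within one file of each other, and capping count at the list length avoids starting threads that have nothing to do.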

Code

import os
import pandas as pd
import threading
import time

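def get_all_excel(path):
    """Recursively collect the paths of all .xlsx files under path."""
    suffix = '.xlsx'
    filelist = []

    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            fname = os.path.join(dirpath, name)
            if fname.endswith(suffix):
                filelist.append(fname)

    return filelist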

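def split_list(all_list, count):
    """Split all_list into at most count sublists, one per thread."""
    end_list = []
    # Ceiling division; max(1, ...) keeps the step valid for an empty list
    n = max(1, -(-len(all_list) // count))
    for i in range(0, len(all_list), n):
        end_list.append(all_list[i:i + n])

    return end_list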

def mutil_thread(lists):
    thread_list = []

    for i, file_list in enumerate(lists):
        # Give each thread its own sublist and its index
        t = threading.Thread(target=change_excel, args=(file_list, i))
        thread_list.append(t)

    for t in thread_list:
        t.start()

    for t in thread_list:  # Wait for every worker before reporting completion
        t.join()

    print("Done")

def change_excel(file_list, i):
    count = 1       # ordinal of the file currently being processed
    fall_count = 1  # ordinal of the next skipped or failed file
    total = len(file_list)
    for excel in file_list:
        try:
            data = pd.read_excel(excel, header=None)
            # Replace each ？ with the row and column to inspect, and XXXXX with the text to look for
            if "XXXXX" in str(data.loc[？][？]):  # str() avoids a TypeError on empty or numeric cells
                new_data = data.drop(index=？)  # Drop that row; note that index starts at 0
                new_data.to_excel(excel, float_format='%.5f', index=False, header=False)
                print("Thread %d   %s   succeeded (file %d), %.2f%% done" % (i, excel, count, count / total * 100))
            else:
                print("Thread %d   %s   did not match (skip %d), %.2f%% done" % (i, excel, fall_count, count / total * 100))
                fall_count += 1
        except Exception as e:
            print(e)
            print(excel)
            fall_count += 1
        count += 1

if __name__ == '__main__':
    time1 = time.time()

    path = r"F:\XXXX\XXXXX"
    all_files = get_all_excel(path)
    lists = split_list(all_files, 5)  # 5 threads
    mutil_thread(lists)

    time2 = time.time()
    print(time2 - time1)
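An alternative to splitting the list by hand is to let the standard library's concurrent.futures.ThreadPoolExecutor hand files to a fixed number of worker threads. This is a different technique from the one used above, sketched under assumptions: process_one is a hypothetical stand-in for the per-file work (reading the workbook, dropping the row, saving), and the file names are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_one(path):
    """Hypothetical stand-in for the per-file work; returns (path, ok)."""
    # A real worker would read the workbook, drop the row, and save it here.
    if not path.endswith(".xlsx"):
        raise ValueError("not an .xlsx file: " + path)
    return path, True


def run_pool(paths, workers=5):
    """Process paths on a thread pool and collect a per-file success flag."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()[1]
            except Exception as exc:  # keep going if one file fails
                print(path, "failed:", exc)
                results[path] = False
    return results


results = run_pool(["a.xlsx", "b.xlsx", "notes.txt"], workers=2)
print(results)
```

With a pool, each thread picks up the next file as soon as it finishes the previous one, so one slow or oversized file does not leave the other threads idle the way a fixed pre-split can.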