Python分析excel数据，比VBA有优势

最新推荐文章于 2024-06-12 22:31:27 发布

Pipi0714

最新推荐文章于 2024-06-12 22:31:27 发布

阅读量476

点赞数

分类专栏： vc/c++/sdk/delphi/C# 文章标签： python 开发语言

本文链接：https://blog.csdn.net/pipi0714/article/details/124904860

版权

vc/c++/sdk/delphi/C# 专栏收录该内容

141 篇文章 0 订阅

订阅专栏

最近，计算程序导出统计结果。为了验证结果，需要在800多个xlsx文件中抽取满足条件的数据进行计算得出结果，验证计算程序的结果。

程序用C#写的，已经使用了openxml读取xlsx文件。工具就不打算使用C#了，最主要程序量大麻烦。使用两种方式：VBA和Python。

一、VBA

这个非常简单，创建一个xlsm文件然后写代码执行即可。

主要代码如下：

    Dim myFile As String
    Dim myPath As String
    Dim wb As Workbook
    Dim ws As Worksheet
    Dim rng As range
    Dim rcng As range
    Dim firstAddress As String
    Dim rtn As Double
    Dim rtc As Double
    Dim nno As Integer
    Dim nrow As Integer
    Dim bexit As Boolean

    rtn = 0
    rtc = 0
    nno = 1
    nrow = 1
    '为了能看到进度，注释掉了下行的代码
    'Application.ScreenUpdating = False
    myPath = "D:\"
    myFile = Dir(myPath & "*.xlsx")
    Do While myFile <> ""
        Sheet1.Cells(1, 1).Value = nno
        Sheet1.Cells(nrow, 2).Value = myFile
        Set wb = Workbooks.Open(myPath & myFile, True)
        Set ws = wb.Sheets(1)
        Set rcng = ws.range("B3:B" & ws.UsedRange.Rows.Count)
        With rcng
            Set rng = .Find(11, SearchOrder:=xlByColumns)
            bexit = False
            If Not rng Is Nothing Then
                firstAddress = rng.Address
                Do
                    If rng.Next.Value2 = 22 And rng.Next.Next.Value2 = "33" Then
                        rtn = rtn + rng.Next.Next.Next.Value
                        rtc = rtc + 1
                        bexit = True
                        Exit Do
                    End If
                    If bexit Then Exit Do
                    Set rng = .FindNext(rng)
                Loop While Not rng Is Nothing And rng.Address <> firstAddress
            End If
            If bexit Then
                Sheet1.Cells(nrow, 3).Value = 1
            Else
                Sheet1.Cells(nrow, 3).Value = 0
            End If
        End With
        wb.Close 0
        '置空,否则内存占用越来越大
        Set wb = Nothing
        Set ws = Nothing
        Set rng = Nothing
        Set rcng = Nothing
        nno = nno + 1
        nrow = nrow + 1
        myFile = Dir
    Loop
    
    If rtc > 0 Then
        Sheet1.Cells(2, 1).Value = rtn / rtc
    Else
        Sheet1.Cells(2, 1).Value = "未找到"
    End If
    
    'Application.ScreenUpdating = True

二、Python

以前用过IronPython开发过插件，精髓没有学到。写法，还是c/c++语言。

我用的Python 3.8，不用安装默认就有这个库。没有可以安装

conda install openpyxl

import gc
import os
import sys

from openpyxl import load_workbook


cycle_num: float = 0.0
crane_count: float = 0.0


def process_bar(now, total):
    print('\r正处理[{}/{}]'.format(now, total), end='')
    sys.stdout.flush()
    print('\n')


def read_excel(file_name):
    global cycle_num
    global crane_count
    wb = load_workbook(file_name, read_only=True, data_only=True)
    ws = wb.worksheets[0]
    for row in ws.iter_rows(min_row=3, min_col=2, max_col=5, values_only=True):
        if row[0] == 11 and row[1] == 22 and row[2] == '33':
            cycle_num = cycle_num + float(row[3])
            crane_count = crane_count + 1.0

    wb.close()

    del row
    del ws
    del wb
    gc.collect()


if __name__ == '__main__':
    task_curr = 1
    pathname = 'D:/'
    if os.path.exists(pathname):
        filelist = os.listdir(pathname)
        task_count = len(filelist)
        for f in filelist:
            process_bar(task_curr, task_count)
            f = os.path.join(pathname, f)
            if os.path.isfile(f):
                tfs = f[-5:].lower()
                if tfs == '.xlsx':
                    read_excel(f)

            task_curr = task_curr + 1

        if crane_count > 0:
            avg = cycle_num / crane_count
            print('avg:%.2f' % avg)

VBA问题依旧，数据量一大假死就不大好用了。不过已经很好了800多个文件执行完毕7，8分钟。每个文件最大4，50K。

Python运行1分钟以内执行完毕，可以显示进度而不影响性能。

看小红书，发现好多access等等所谓的程序员培训视频。发现和奥数类似，都是以经济利益为目标。也挺佩服这些人的营销手段，坑了很多家长。想想，程序员就那么容易干？谁都能干？想想自身还不如这些人！呜呼哀哉！！！技术码农的悲哀！！！