最近,计算程序导出统计结果。为了验证结果,需要在800多个xlsx文件中抽取满足条件的数据进行计算得出结果,验证计算程序的结果。
程序用C#写的,已经使用了openxml读取xlsx文件。工具就不打算使用C#了,最主要程序量大麻烦。使用两种方式:VBA和Python。
一、VBA
这个非常简单,创建一个xlsm文件然后写代码执行即可。
主要代码如下:
Dim myFile As String
Dim myPath As String
Dim wb As Workbook
Dim ws As Worksheet
Dim rng As range
Dim rcng As range
Dim firstAddress As String
Dim rtn As Double
Dim rtc As Double
Dim nno As Integer
Dim nrow As Integer
Dim bexit As Boolean
rtn = 0
rtc = 0
nno = 1
nrow = 1
'为了能看到进度,注释掉了下行的代码
'Application.ScreenUpdating = False
myPath = "D:\"
myFile = Dir(myPath & "*.xlsx")
Do While myFile <> ""
Sheet1.Cells(1, 1).Value = nno
Sheet1.Cells(nrow, 2).Value = myFile
Set wb = Workbooks.Open(myPath & myFile, True)
Set ws = wb.Sheets(1)
Set rcng = ws.range("B3:B" & ws.UsedRange.Rows.Count)
With rcng
Set rng = .Find(11, SearchOrder:=xlByColumns)
bexit = False
If Not rng Is Nothing Then
firstAddress = rng.Address
Do
If rng.Next.Value2 = 22 And rng.Next.Next.Value2 = "33" Then
rtn = rtn + rng.Next.Next.Next.Value
rtc = rtc + 1
bexit = True
Exit Do
End If
If bexit Then Exit Do
Set rng = .FindNext(rng)
Loop While Not rng Is Nothing And rng.Address <> firstAddress
End If
If bexit Then
Sheet1.Cells(nrow, 3).Value = 1
Else
Sheet1.Cells(nrow, 3).Value = 0
End If
End With
wb.Close 0
'置空,否则内存占用越来越大
Set wb = Nothing
Set ws = Nothing
Set rng = Nothing
Set rcng = Nothing
nno = nno + 1
nrow = nrow + 1
myFile = Dir
Loop
If rtc > 0 Then
Sheet1.Cells(2, 1).Value = rtn / rtc
Else
Sheet1.Cells(2, 1).Value = "未找到"
End If
'Application.ScreenUpdating = True
二、Python
以前用过IronPython开发过插件,精髓没有学到。写法,还是c/c++语言。
我用的Python 3.8,不用安装默认就有这个库。没有可以安装
conda install openpyxl
import gc
import os
import sys
from openpyxl import load_workbook
cycle_num: float = 0.0
crane_count: float = 0.0
def process_bar(now, total):
print('\r正处理[{}/{}]'.format(now, total), end='')
sys.stdout.flush()
print('\n')
def read_excel(file_name):
global cycle_num
global crane_count
wb = load_workbook(file_name, read_only=True, data_only=True)
ws = wb.worksheets[0]
for row in ws.iter_rows(min_row=3, min_col=2, max_col=5, values_only=True):
if row[0] == 11 and row[1] == 22 and row[2] == '33':
cycle_num = cycle_num + float(row[3])
crane_count = crane_count + 1.0
wb.close()
del row
del ws
del wb
gc.collect()
if __name__ == '__main__':
task_curr = 1
pathname = 'D:/'
if os.path.exists(pathname):
filelist = os.listdir(pathname)
task_count = len(filelist)
for f in filelist:
process_bar(task_curr, task_count)
f = os.path.join(pathname, f)
if os.path.isfile(f):
tfs = f[-5:].lower()
if tfs == '.xlsx':
read_excel(f)
task_curr = task_curr + 1
if crane_count > 0:
avg = cycle_num / crane_count
print('avg:%.2f' % avg)
VBA问题依旧,数据量一大假死就不大好用了。不过已经很好了800多个文件执行完毕7,8分钟。每个文件最大4,50K。
Python运行1分钟以内执行完毕,可以显示进度而不影响性能。
看小红书,发现好多access等等所谓的程序员培训视频。发现和奥数类似,都是以经济利益为目标。也挺佩服这些人的营销手段,坑了很多家长。想想,程序员就那么容易干?谁都能干?想想自身还不如这些人!呜呼哀哉!!!技术码农的悲哀!!!