一、功能需求。如图:
图1
图2
即,在D盘“工作”目录下,有两个excel文件(以xls结尾的),其中“残疾人列表信息01.xls”(以下简称表一)里面共有856条数据,数据格式与图2所示一样;而“残疾人列表信息02.xls”(简称表二)里面共有501条数据(图2所示)。其中表一的856条数据全部包含表二的501条数据,要实现的功能是找出剩下的(即表二不包含的)355条数据。
二、实现功能需求
通过上面描述,很明显,这样的功能不可能通过excel的基本函数来实现,必须借助excel宏,即VBA编程来实现。首先,创建一个新的excel文件,保存文件名为“残疾人列表信息03.xlsm”(如图一所示),注意,扩展名必须以xlsm结尾。
第一版实现代码:
Sub 获取未包含人员()
Dim current_path As String
Dim current_File As String
Dim open_file_01 As String, open_file_02 As String
Dim open_worksheet_01 As Worksheet, open_worksheet_02 As Worksheet
Dim current_worksheet As Worksheet
Dim rows_01 As Integer, columns_01 As Integer, i_01 As Integer, j_01 As Integer
Dim rows_02 As Integer, columns_02 As Integer, i_02 As Integer
Dim rows_03 As Integer, columns_03 As Integer, i_03 As Integer, j_03 As Integer
Dim flag As Boolean
current_File = ActiveWorkbook.Name
current_path = "d:\工作\"
open_file_01 = "残疾人列表信息01.xls"
open_file_02 = "残疾人列表信息02.xls"
Set current_worksheet = ThisWorkbook.Worksheets(1)
Workbooks.Open current_path & open_file_01
Workbooks.Open current_path & open_file_02
Set open_worksheet_01 = Workbooks(open_file_01).Worksheets(1)
Set open_worksheet_02 = Workbooks(open_file_02).Worksheets(1)
rows_01 = open_worksheet_01.UsedRange.rows.Count
columns_01 = open_worksheet_01.UsedRange.columns.Count
rows_02 = open_worksheet_02.UsedRange.rows.Count
columns_02 = open_worksheet_02.UsedRange.columns.Count
Debug.Print open_worksheet_01.UsedRange.rows.Count; open_worksheet_01.UsedRange.columns.Count
Debug.Print open_worksheet_02.UsedRange.rows.Count; open_worksheet_02.UsedRange.columns.Count
i_03 = 1
j_03 = 1
For i_01 = 3 To rows_01
flag = False
For i_02 = 4 To rows_02
If (open_worksheet_01.Cells(i_01, 5) = open_worksheet_02.Cells(i_02, 5)) Then
GoTo ignore
End If
Next i_02
flag = True
For j_01 = 1 To columns_01 + 1
current_worksheet.Cells(i_03, j_03) = open_worksheet_01.Cells(i_01, j_01)
j_03 = j_03 + 1
ignore:
Next j_01
j_03 = 1
If flag Then
i_03 = i_03 + 1
End If
Next i_01
Workbooks(open_file_01).Close SaveChanges:=False
Workbooks(open_file_02).Close SaveChanges:=False
Windows(current_File).Activate
End Sub
上面的代码效果较差,但确实能实现上述需求。这也让自己有了更多的体会,做软件写代码时,先写一个能用的,先不管它什么效率、优雅这些,不然很久开不了头,最后可能就不了了之,还没写就放弃了。
上面的代码效率很低,在我现在的电脑上,要花2.7秒左右。主要应该改进三个方面,一是表二只应该循环一次,而不是每次比较都去循环;二是不用一个一个的为单元格的赋值,而是将所有值放在二维数组里面,最后一次赋值;三是加上注释,没有注释的程序不完整。
第二版代码:
Sub 获取未包含人员()
Dim start As Double
start = Timer
Dim current_path As String
Dim current_File As String
Dim open_file_01 As String, open_file_02 As String
'分别代表残疾人列表信息01.xls、'残疾人列表信息02.xls的Sheet1
Dim open_worksheet_01 As Worksheet, open_worksheet_02 As Worksheet
'当前工作表,即残疾人列表信息03.xls的Sheet1
Dim current_worksheet As Worksheet
'表一数据行数、列数,及用于循环读取单元格数据的变量,以下类同
Dim rows_01 As Integer, columns_01 As Integer, i_01 As Integer, j_01 As Integer
Dim rows_02 As Integer, columns_02 As Integer, i_02 As Integer
Dim rows_03 As Integer, columns_03 As Integer, i_03 As Integer, j_03 As Integer
'设置一个标志,判断表二的数据在表一中是否已经存在
Dim flag As Boolean
'用于装取未包含在表二中的表一数据,即表一减表二的数据
Dim arr() As String
'将表二的所有数据取出,放在这个数组里面,用于判断表一的每条数据在此数组中是否存在
Dim arr_02() As String
current_File = ActiveWorkbook.Name
current_path = "d:\工作\"
open_file_01 = "残疾人列表信息01.xls"
open_file_02 = "残疾人列表信息02.xls"
Set current_worksheet = ThisWorkbook.Worksheets(1)
Workbooks.Open current_path & open_file_01
Workbooks.Open current_path & open_file_02
Set open_worksheet_01 = Workbooks(open_file_01).Worksheets(1)
Set open_worksheet_02 = Workbooks(open_file_02).Worksheets(1)
'表一的行数及列数
rows_01 = open_worksheet_01.UsedRange.rows.Count
columns_01 = open_worksheet_01.UsedRange.columns.Count
'表二的行数及列数
rows_02 = open_worksheet_02.UsedRange.rows.Count
columns_02 = open_worksheet_02.UsedRange.columns.Count
Debug.Print rows_01; columns_01
Debug.Print rows_02; columns_02
'前面申明了两个动态数组,现在重新定义这两个数组的大小
ReDim arr(1 To 370, 1 To 14)
ReDim arr_02(1 To rows_02)
'将表二的所有数据取出,放在这个数组里面,用于判断表一的每条数据在此数组中是否存在
For i_02 = 4 To rows_02
arr_02(i_02) = open_worksheet_02.Cells(i_02, 5)
Next i_02
i_03 = 1
j_03 = 1
For i_01 = 3 To rows_01
flag = False
If IsInArray(open_worksheet_01.Cells(i_01, 5), arr_02) Then
'如果数据存在,直接跳出循环,开始读取下一行数据
GoTo ignore
End If
flag = True
For j_01 = 1 To columns_01 + 1
'未包含的数据放在arr数组中
arr(i_03, j_01) = open_worksheet_01.Cells(i_01, j_01)
j_03 = j_03 + 1
ignore:
Next j_01
j_03 = 1
'flag为True时,表三会增加一条数据,所以行数加1
If flag Then
i_03 = i_03 + 1
End If
Next i_01
Workbooks(open_file_01).Close SaveChanges:=False
Workbooks(open_file_02).Close SaveChanges:=False
Windows(current_File).Activate
'通过Range一次性设置所有值,可以节省系统开销
Range("A1:N370").Value = arr
Debug.Print "程序运行的时间约为:" & Format(Timer - start, "0.00") & " 秒。"
End Sub
第二版代码的运行时间在0.71秒左右,提高了3倍,比较不错。值得注意的是,上面代码用到了一个自定义函数:
Public Function IsInArray(stringToBeFound As String, arr As Variant) As Boolean
IsInArray = (UBound(Filter(arr, stringToBeFound)) > -1)
End Function
它的作用是:判断字符串是否于数组中。Filter函数返回一个下标从零开始的数组,该数组包含基于指定筛选条件的一个字符串数组的子集。
上面有些数字可以先定义成变量,意思更明确一些:
Sub 获取未包含人员()
Dim start As Double
start = Timer
Dim current_path As String
Dim current_File As String
Dim open_file_01 As String, open_file_02 As String
'分别代表残疾人列表信息01.xls、'残疾人列表信息02.xls的Sheet1
Dim open_worksheet_01 As Worksheet, open_worksheet_02 As Worksheet
'当前工作表,即残疾人列表信息03.xls的Sheet1
Dim current_worksheet As Worksheet
'表一数据行数、列数,及用于循环读取单元格数据的变量,以下类同
Dim rows_01 As Integer, columns_01 As Integer, i_01 As Integer, j_01 As Integer
Dim rows_02 As Integer, columns_02 As Integer, i_02 As Integer
Dim rows_03 As Integer, columns_03 As Integer, i_03 As Integer, j_03 As Integer
'设置一个标志,判断表二的数据在表一中是否已经存在
Dim flag As Boolean
'用于装取未包含在表二中的表一数据,即表一减表二的数据
Dim arr() As String
'将表二的所有数据取出,放在这个数组里面,用于判断表一的每条数据在此数组中是否存在
Dim arr_02() As String
Dim id_card_column As Integer, table01_rows As Integer, table01_columns As Integer
'初始化各种数据
current_File = ActiveWorkbook.Name
current_path = "d:\工作\"
open_file_01 = "残疾人列表信息01_.xls"
open_file_02 = "残疾人列表信息02_.xls"
'比对的身份证在第三列
id_card_column = 3
Set current_worksheet = ThisWorkbook.Worksheets(1)
Workbooks.Open current_path & open_file_01
Workbooks.Open current_path & open_file_02
Set open_worksheet_01 = Workbooks(open_file_01).Worksheets(1)
Set open_worksheet_02 = Workbooks(open_file_02).Worksheets(1)
'表一的行数及列数
rows_01 = open_worksheet_01.UsedRange.rows.Count
columns_01 = open_worksheet_01.UsedRange.columns.Count
'表二的行数及列数
rows_02 = open_worksheet_02.UsedRange.rows.Count
columns_02 = open_worksheet_02.UsedRange.columns.Count
Debug.Print rows_01; columns_01
Debug.Print rows_02; columns_02
'前面申明了两个动态数组,现在重新定义这两个数组的大小,在原有的基础上加一点点,不然数组为溢出
ReDim arr(1 To rows_01 + 5, 1 To columns_01 + 1)
ReDim arr_02(1 To rows_02)
'将表二的所有数据取出,放在这个数组里面,用于判断表一的每条数据在此数组中是否存在,从第二行开始
For i_02 = 2 To rows_02
arr_02(i_02) = open_worksheet_02.Cells(i_02, id_card_column)
Next i_02
i_03 = 1
j_03 = 1
For i_01 = 2 To rows_01
flag = False
If IsInArray(open_worksheet_01.Cells(i_01, id_card_column), arr_02) Then
'如果数据存在,直接跳出循环,开始读取下一行数据
GoTo ignore
End If
flag = True
For j_01 = 1 To columns_01 + 1
'未包含的数据放在arr数组中
arr(i_03, j_01) = open_worksheet_01.Cells(i_01, j_01)
j_03 = j_03 + 1
ignore:
Next j_01
j_03 = 1
'flag为True时,表三会增加一条数据,所以行数加1
If flag Then
i_03 = i_03 + 1
End If
Next i_01
Workbooks(open_file_01).Close SaveChanges:=False
Workbooks(open_file_02).Close SaveChanges:=False
Windows(current_File).Activate
'通过Range一次性设置所有值,可以节省系统开销
Range("A1:N640").Value = arr
Debug.Print "程序运行的时间约为:" & Format(Timer - start, "0.00") & " 秒。"
End Sub