TJVSearchFiles采用的是广度遍历,递归方式实现;在大数据量状况下,相比以下链接采用queue循环性能会比较差。
TRzCustomShellTree则采用的是IShellFolder,从相关使用经验看,IShellFolder会更快,具体还未比较过。
————————————————————————————————————————————————————————
原文地址:http://www.cnblogs.com/BigTall/archive/2009/10/09/1579432.html
遍历一个目录或者磁盘中的所有内容,常用的算法有两种:深度优先和广度优先。具体实现的时候,每种算法都可以有多种实现,一般来说,有递归和非递归两种。因为工作需要,所以bigtall实现了几种算法的对比。
首先实现的是传统的深度优先的递归遍历算法,因为非递归算法和广度优先比较雷同所以没有实现。其次实现的是广度优先的递归和非递归算法,其中非递归广度算法采用一个先进先出的queue存储目录路径结果。最后实现的是基于非递归广度算法上实现的多线程搜索算法,因为现在已经是多核时代,不用多线程好像说不过去了。
测试代码附在文后。bigtall用于测试的环境是vista sp2,文件系统是ntfs,cpu intel T2300 1.66G双核,内存4G。测试所用的目录为两个,一个是含有2148个对象的d:\temp目录,另外一个是整个d盘,含54万个文件。
运行结果:
D:\work\delphi\findfirst>fastfind d:\temp 16 1
algorithm, files, total time(ms), repeat, average time(ms)
Deep walk, 2148, 46, 1, 46
Wide Recurse, 2148, 78, 1, 78
Wide Cycle, 2148, 16, 1, 16
Thread 2 walk, 2148, 31, 1, 31
Thread 4 walk, 2148, 61, 1, 61
Thread 6 walk, 2148, 93, 1, 93
Thread 8 walk, 2148, 124, 1, 124
Thread 10 walk, 2148, 156, 1, 156
Thread 12 walk, 2148, 186, 1, 186
Thread 14 walk, 2148, 218, 1, 218
Thread 16 walk, 2148, 249, 1, 249D:\work\delphi\findfirst>fastfind d:\temp 16 2
algorithm, files, total time(ms), repeat, average time(ms)
Deep walk, 2148, 93, 2, 46
Wide Recurse, 2148, 93, 2, 46
Wide Cycle, 2148, 63, 2, 31
Thread 2 walk, 2148, 61, 2, 30
Thread 4 walk, 2148, 124, 2, 62
Thread 6 walk, 2148, 186, 2, 93
Thread 8 walk, 2148, 250, 2, 125
Thread 10 walk, 2148, 311, 2, 155
Thread 12 walk, 2148, 373, 2, 186
Thread 14 walk, 2148, 437, 2, 218
Thread 16 walk, 2148, 514, 2, 257D:\work\delphi\findfirst>fastfind d:\temp 16 4
algorithm, files, total time(ms), repeat, average time(ms)
Deep walk, 2148, 171, 4, 42
Wide Recurse, 2148, 203, 4, 50
Wide Cycle, 2148, 109, 4, 27
Thread 2 walk, 2148, 139, 4, 34
Thread 4 walk, 2148, 249, 4, 62
Thread 6 walk, 2148, 373, 4, 93
Thread 8 walk, 2148, 500, 4, 125
Thread 10 walk, 2148, 623, 4, 155
Thread 12 walk, 2148, 747, 4, 186
Thread 14 walk, 2148, 874, 4, 218
Thread 16 walk, 2148, 997, 4, 249D:\work\delphi\findfirst>fastfind d:\temp 16 8
algorithm, files, total time(ms), repeat, average time(ms)
Deep walk, 2148, 264, 8, 33
Wide Recurse, 2148, 343, 8, 42
Wide Cycle, 2148, 219, 8, 27
Thread 2 walk, 2148, 248, 8, 31
Thread 4 walk, 2148, 499, 8, 62
Thread 6 walk, 2148, 747, 8, 93
Thread 8 walk, 2148, 999, 8, 124
Thread 10 walk, 2148, 1248, 8, 156
Thread 12 walk, 2148, 1496, 8, 187
Thread 14 walk, 2148, 1748, 8, 218
Thread 16 walk, 2148, 1995, 8, 249D:\work\delphi\findfirst>fastfind d:\ 16 1
algorithm, files, total time(ms), repeat, average time(ms)
Deep walk, 545519, 159447, 1, 159447
Wide Recurse, 545519, 15678, 1, 15678
Wide Cycle, 161792, 2042, 1, 2042
Thread 2 walk, 545519, 6552, 1, 6552
Thread 4 walk, 545519, 6771, 1, 6771
Thread 6 walk, 545519, 6941, 1, 6941
Thread 8 walk, 545519, 6753, 1, 6753
Thread 10 walk, 545519, 7066, 1, 7066
Thread 12 walk, 545519, 7036, 1, 7036
Thread 14 walk, 545519, 7691, 1, 7691
Thread 16 walk, 545519, 7065, 1, 7065D:\work\delphi\findfirst>fastfind d:\ 16 2
algorithm, files, total time(ms), repeat, average time(ms)
Deep walk, 545519, 21277, 2, 10638
Wide Recurse, 545519, 30201, 2, 15100
Wide Cycle, 161792, 3915, 2, 1957
Thread 2 walk, 545519, 13104, 2, 6552
Thread 4 walk, 545519, 13836, 2, 6918
Thread 6 walk, 545519, 13883, 2, 6941
Thread 8 walk, 545519, 14211, 2, 7105
Thread 10 walk, 545519, 13978, 2, 6989
Thread 12 walk, 545519, 14788, 2, 7394
Thread 14 walk, 545519, 15975, 2, 7987
Thread 16 walk, 545519, 16988, 2, 8494D:\work\delphi\findfirst>fastfind d:\ 16 4
algorithm, files, total time(ms), repeat, average time(ms)
Deep walk, 545519, 42010, 4, 10502
Wide Recurse, 545519, 64584, 4, 16146
Wide Cycle, 161792, 7814, 4, 1953
Thread 2 walk, 545519, 26317, 4, 6579
Thread 4 walk, 545519, 27205, 4, 6801
Thread 6 walk, 545519, 27487, 4, 6871
Thread 8 walk, 545519, 27877, 4, 6969
Thread 10 walk, 545519, 32994, 4, 8248
Thread 12 walk, 545519, 49841, 4, 12460
Thread 14 walk, 545519, 72072, 4, 18018
Thread 16 walk, 545519, 68047, 4, 17011D:\work\delphi\findfirst>fastfind d:\ 16 8
algorithm, files, total time(ms), repeat, average time(ms)
Deep walk, 545519, 106392, 8, 13299
Wide Recurse, 545519, 124331, 8, 15541
Wide Cycle, 161792, 15240, 8, 1905
Thread 2 walk, 545519, 53352, 8, 6669
Thread 4 walk, 545519, 64303, 8, 8037
Thread 6 walk, 545519, 132849, 8, 16606
Thread 8 walk, 545520, 156951, 8, 19618
Thread 10 walk, 545520, 67907, 8, 8488
Thread 12 walk, 545520, 59747, 8, 7468
Thread 14 walk, 545520, 62946, 8, 7868
Thread 16 walk, 545520, 102616, 8, 12827
结果分析:
- 深度优先的遍历速度和广度优先相比,占用资源最少,在早期的代码中,广度优先的速度要比深度优先的速度快,但是这里的结果却正好相反,估计是TList存取速度的问题,下次有机会应该修改一下。
- 非递归的广度优先速度非常的快,但是在之前没有用自定义的TSimpleQuery之前,速度比广度递归还要慢,说明系统的TQuery实现比较慢。而且本算法性能比任何其他算法都要高。
- 在多线程遍历中,12个线程速度最快,但是在文件数目较少的情况下,线程越多遍历速度越慢。
- 这里的算法都是通用的算法,如果针对ntfs文件系统,更快的遍历算法是读取windows内部的入口数据,但是它不算是一种性价比最高的算法。
由此我们得到结论:
- 单线程非递归的广度优先遍历算法速度最快。
要感谢 邢少网友的留言,因此我们要深入探讨一下这个测试中表现出的问题:
1. 为什么递归不如非递归快
递归算法可以简化算法的复杂度,但是会占用大量的栈(stack)空间,同时也会产生大量的函数调用代码。因为程序中使用堆(heap)和栈(stack)的方法有本质的不同,可利用的最大空间也不一样,分配栈空间的方法也比分配堆空间的方法耗费要大些。并且递归很容易引起“堆栈下溢”。这就是非递归要比递归快的原因。
2.为什么深度优先算法不如广度优先算法快
windows搜索文件的方法必定是这样:FindFirstFile,FindNextFile,FindClose的一个次序,这里不仅用到了用户变量WIN32_FIND_DATA,也用到了handle。这就意味着操作系统需要为搜索分配相应的资源,如果是深度优先的搜索,递归过程会在调用findClose之前进行,换句话说,为某个目录分配的搜索资源要在这个目录所有子目录搜索完毕之后才能释放。而广度优先的搜索则是在本次搜索的FindClose之后才开始深入搜索,其耗费的系统资源当然要比深度优先搜索要少得多了。因此,广度优先的搜索比深度优先搜索要快也就显而易见了。
3.为什么多线程不如单线程快
多线程一定要比单线程快,但是在操作IO的时候,大部分IO设备是不支持并行存取的。存取磁盘的时候,每次只允许处理一个请求。因此,用多线程来进行文件搜索,反而会造成大量的互斥操作,反而影响了速度。
附加代码如下:
算法代码
unit Traversal;
interface
uses Classes, Windows, SyncObjs, Generics.Collections;
type
TSimpleQueue = class
private
FList : array of string;
FCapacity, FRead, FWrite:Integer; // read 当前读取,write当前可写
function GetCount:Integer;
public
constructor Create();
destructor Destroy; override;
procedure Enqueue(const Value: string);
function Dequeue: string;
procedure Clear;
property Count: Integer read GetCount;
end;
TSimpleList = class
private
FList : array of string;
FCapacity, FWrite:Integer; // write当前可写
function GetCount:Integer;
public
constructor Create();
destructor Destroy; override;
procedure Add(const Value: string);
procedure Clear;
property Count: Integer read GetCount;
public type
TSimpleListEnumerator = class
private
FOwner : TSimpleList;
FIndex :Integer;
public
function MoveNext:boolean;
function GetCurrent:string;
property Current:string read GetCurrent;
end;
function GetEnumerator():TSimpleListEnumerator;
end;
TDeepFirstTraversal = class
public
class function Walk(const path:string):Integer; static;
end;
TWideFirstTraversal = class
public
class function Walk(const path:string):Integer; static;
end;
TWideCycleTraversal = class
public
class function Walk(const path:string):Integer; static;
end;
TMultiThreadTraversal = class
public type
TWalkThread = class(TThread)
private
class var WaitCount:Integer; // 队列空等待的线程数量
class var count:Integer; // 还在运行的线程数量
class var FileCount:Integer; // 返回文件个数
class var queue:TSimpleQueue;
class var critical : TCriticalSection; // 用于同步queue存取
protected
/// <summary>
/// 队列空则等待,获取成功则返回true,队列全空则false
/// </summary>
function GetWait(out value:string):boolean;
function Put(const value:string):boolean;
procedure Execute; override;
end;
public
class function Walk(const path:string; threads:Integer):Integer; static;
end;
implementation
uses SysUtils;
{ TDeepFirstTraversal }
function IsValidDir(const ffd : WIN32_FIND_DATA):Boolean;
var
special:boolean;
begin
Result := ((ffd.dwFileAttributes and FILE_ATTRIBUTE_DIRECTORY) <> 0);
if Result then begin
special := (ffd.cFileName[0] = '.') and (
(ffd.cFileName[1] = #0) or (
(ffd.cFileName[1] = '.') and (ffd.cFileName[2] = #0)
)
);
Result := not special;
end;
end;
class function TDeepFirstTraversal.Walk(const path: string):Integer;
function DoWalk(const path:string):Integer;
var
ffd : WIN32_FIND_DATA;
handle : THandle;
pattern : string;
begin
Result := 0;
pattern := path + '\*.*';
handle := Windows.FindFirstFile(PChar(pattern), ffd);
if INVALID_HANDLE_VALUE <> handle then begin
repeat
Inc(Result);
// 判断是否目录
if IsValidDir(ffd) then begin
Result := Result + DoWalk(path + '\' + ffd.cFileName);
end;
until not Windows.FindNextFile(handle, ffd);
Windows.FindClose(handle);
end;
end;
begin
Result := DoWalk(ExcludeTrailingPathDelimiter(path));
end;
{ TWideFirstTraversal }
class function TWideFirstTraversal.Walk(const path: string):Integer;
function DoWalk(const path:string):Integer;
var
ffd : WIN32_FIND_DATA;
handle : THandle;
pattern : string;
dirs: TSimpleList;
d : string;
begin
Result := 0;
dirs := TSimpleList.Create;
pattern := path + '\*.*';
handle := Windows.FindFirstFile(PChar(pattern), ffd);
if INVALID_HANDLE_VALUE <> handle then begin
repeat
Inc(Result);
// 判断是否目录
if IsValidDir(ffd) then begin
dirs.Add(path + '\' + ffd.cFileName);
end;
until not Windows.FindNextFile(handle, ffd);
Windows.FindClose(handle);
end;
for d in dirs do
Result := Result + DoWalk(d);
dirs.Clear;
FreeAndNil(dirs);
end;
begin
Result := DoWalk(ExcludeTrailingPathDelimiter(path));
end;
{ TWideCycleTraversal }
class function TWideCycleTraversal.Walk(const path: string): Integer;
function DoWalk(const path:string; const queue : TSimpleQueue):Integer;
var
ffd : WIN32_FIND_DATA;
handle : THandle;
pattern : string;
d : string;
begin
Result := 0;
pattern := path + '\*.*';
handle := Windows.FindFirstFile(PChar(pattern), ffd);
if INVALID_HANDLE_VALUE <> handle then begin
repeat
Inc(Result);
// 判断是否目录
if IsValidDir(ffd) then begin
queue.Enqueue(path + '\' + ffd.cFileName);
end;
until not Windows.FindNextFile(handle, ffd);
Windows.FindClose(handle);
end;
end;
var
queue : TSimpleQueue;
begin
queue := TSimpleQueue.Create;
Result := DoWalk(ExcludeTrailingPathDelimiter(path), queue);
while queue.Count > 0 do
Result := Result + DoWalk(queue.Dequeue, queue);
queue.Clear;
FreeAndNil(queue);
end;
{ TMultiThreadTraversal }
class function TMultiThreadTraversal.Walk(const path: string; threads:Integer):Integer;
var
t1: array of TWalkThread;
I: Integer;
begin
Result := 0;
TMultiThreadTraversal.TWalkThread.WaitCount := threads;
TMultiThreadTraversal.TWalkThread.count := 0;
TMultiThreadTraversal.TWalkThread.queue := TSimpleQueue.Create();
TMultiThreadTraversal.TWalkThread.critical := TCriticalSection.Create();
try
TMultiThreadTraversal.TWalkThread.Count := 0;
TMultiThreadTraversal.TWalkThread.FileCount := 0;
SetLength(t1, threads);
for I := 0 to threads - 1 do begin
t1[I] := TWalkThread.Create(true);
t1[i].FreeOnTerminate := false;
end;
// 加入根目录到堆栈
TMultiThreadTraversal.TWalkThread.queue.Enqueue(ExcludeTrailingPathDelimiter(path));
// 启动所有线程
for I := 0 to threads - 1 do begin
t1[i].Resume;
Sleep(1);
end;
// 等待线程结束
for I := 0 to threads - 1 do begin
t1[i].WaitFor();
t1[i].Free;
end;
finally
TMultiThreadTraversal.TWalkThread.queue.Clear;
FreeAndNil(TMultiThreadTraversal.TWalkThread.queue);
FreeAndNil(TMultiThreadTraversal.TWalkThread.critical);
end;
Result := TMultiThreadTraversal.TWalkThread.FileCount;
end;
{ TMultiThreadTraversal.TWalkThread }
procedure TMultiThreadTraversal.TWalkThread.Execute;
function DoWalk(const path:string):Integer;
var
ffd : WIN32_FIND_DATA;
handle : THandle;
pattern : string;
d : string;
begin
Result := 0;
pattern := path + '\*.*';
handle := Windows.FindFirstFile(PChar(pattern), ffd);
if INVALID_HANDLE_VALUE <> handle then begin
repeat
Inc(Result);
// 判断是否目录
if IsValidDir(ffd) then begin
Put(path + '\' + ffd.cFileName);
end;
until not Windows.FindNextFile(handle, ffd);
Windows.FindClose(handle);
end;
end;
var
path:string;
i : Integer;
begin
while GetWait(path) do begin
// 增加线程运行计数
InterlockedExchangeAdd(TMultiThreadTraversal.TWalkThread.count, 1);
I := DoWalk(path);
InterlockedExchangeAdd(TMultiThreadTraversal.TWalkThread.FileCount, i);
// 运行结束则减少计数器
InterlockedExchangeAdd(TMultiThreadTraversal.TWalkThread.count, -1);
end;
end;
function TMultiThreadTraversal.TWalkThread.GetWait(out value: string): boolean;
begin
Result := true;
InterlockedExchangeAdd(TMultiThreadTraversal.TWalkThread.WaitCount, -1);
while true do begin
while TMultiThreadTraversal.TWalkThread.queue.Count = 0 do begin
Sleep(1);
if (TMultiThreadTraversal.TWalkThread.WaitCount = 0)
and (TMultiThreadTraversal.TWalkThread.Count = 0)
then begin
//InterlockedExchangeAdd(TMultiThreadTraversal.TWalkThread.WaitCount, 1);
Exit(False);
end;
end;
if TMultiThreadTraversal.TWalkThread.critical.TryEnter then begin
try
if TMultiThreadTraversal.TWalkThread.queue.Count <> 0 then begin
value := TMultiThreadTraversal.TWalkThread.queue.Dequeue;
Exit(true);
end;
finally
InterlockedExchangeAdd(TMultiThreadTraversal.TWalkThread.WaitCount, 1);
TMultiThreadTraversal.TWalkThread.critical.Leave;
end;
end;
end;
end;
function TMultiThreadTraversal.TWalkThread.Put(const value: string): boolean;
begin
Result := true;
TMultiThreadTraversal.TWalkThread.critical.Enter;
try
TMultiThreadTraversal.TWalkThread.queue.enqueue(value);
finally
TMultiThreadTraversal.TWalkThread.critical.Leave;
end;
end;
{ TSimpleQueue }
procedure TSimpleQueue.Clear;
begin
FRead := 0;
FWrite := 0;
end;
constructor TSimpleQueue.Create;
begin
FCapacity := 1024 * 16;
SetLength(FList, FCapacity);
FRead := 0;
FWrite := 0;
end;
function TSimpleQueue.Dequeue: string;
begin
if FRead = FWrite then
raise ERangeError.Create('no element');
Result := FList[FRead];
FRead := (FRead + 1) mod FCapacity;
end;
destructor TSimpleQueue.Destroy;
begin
inherited;
end;
procedure TSimpleQueue.Enqueue(const Value: string);
var
n : Integer;
begin
n := (FWrite + 1) mod FCapacity;
if n = FRead then
raise ERangeError.Create('full');
FList[FWrite] := Value;
FWrite := n;
end;
function TSimpleQueue.GetCount: Integer;
begin
Result := FWrite - FRead;
end;
{ TSimpleList }
procedure TSimpleList.Add(const Value: string);
begin
if FWrite = FCapacity then
raise ERangeError.Create('full');
FList[FWrite] := value;
Inc(FWrite);
end;
procedure TSimpleList.Clear;
begin
FWrite := 0;
end;
constructor TSimpleList.Create;
begin
FCapacity := 32 * 1024;
SetLength(FList, FCapacity);
FWrite := 0;
end;
destructor TSimpleList.Destroy;
begin
inherited;
end;
function TSimpleList.GetCount: Integer;
begin
Result := FWrite;
end;
function TSimpleList.GetEnumerator: TSimpleListEnumerator;
begin
Result := TSimpleListEnumerator.Create;
Result.FOwner := self;
Result.FIndex := -1;
end;
{ TSimpleList.TSimpleListEnumerator }
function TSimpleList.TSimpleListEnumerator.GetCurrent: string;
begin
if (FIndex <= -1) or (FIndex >= FOwner.FWrite) then
raise ERangeError.Create('position error');
Result := FOwner.FList[FIndex];
end;
function TSimpleList.TSimpleListEnumerator.MoveNext: boolean;
begin
Inc(FIndex);
Result := FIndex < FOwner.FWrite;
end;
end.