1: Vectorize loops
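The examples below show clearly that vectorized matrix operations can speed up computation dramatically, so replace loops with vectorized operations whenever you can!
Example 1: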
tic,s=0;
for i=1:1000000
s=s+(1/2^i+1/3^i);
end
s
toc
%% Vectorized version: as the result shows, it is far more efficient than the ordinary loop.
tic,
i=1:1000000;
s=sum(1./2.^i+1./3.^i),
toc
The more iterations, the bigger the advantage!
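As a sanity check, the loop and the vectorized expression should produce the same value. Both add up two geometric series, 1/2 + 1/4 + ... = 1 and 1/3 + 1/9 + ... = 1/2, so s should be very close to 1.5. A minimal sketch, assuming a much smaller upper limit N = 2000 (a hypothetical value chosen so the check runs instantly; the terms stop changing s in double precision after a few dozen iterations anyway) and using k instead of i so the imaginary unit is not shadowed:

```matlab
% Compare the loop and the vectorized form of Example 1.
% N = 2000 is an assumed, smaller limit than the 1000000 used above.
N = 2000;

% Loop version
s_loop = 0;
for k = 1:N
    s_loop = s_loop + (1/2^k + 1/3^k);
end

% Vectorized version: element-wise ./ and .^ on the whole index vector
k = 1:N;
s_vec = sum(1./2.^k + 1./3.^k);

% Geometric series: sum of 1/2^k is 1, sum of 1/3^k is 1/2, total 1.5
fprintf('loop: %.15f  vectorized: %.15f\n', s_loop, s_vec);
```

The two results may differ in the last bit or so because the additions happen in a different order, which is why a comparison should use a tolerance rather than exact equality.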
Example 2:
% Not vectorized (y is not preallocated, so it grows on every iteration)
tic;
i=0;
for n = 0:0.1:1000
i=i+1;
y(i)=cos(n);
end
toc
% Vectorized
%%
tic;
n = 0:0.1:1000;
y = cos(n);
toc
Elapsed time is 0.066274 seconds.
Elapsed time is 0.000257 seconds.
Speedup: about 258× (0.066274 / 0.000257)!
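Part of the gap in Example 2 comes from growing y inside the loop, because y is never preallocated. A sketch that adds a third variant, a loop over a preallocated y (an addition for comparison, not part of the original timing), separates the cost of repeatedly enlarging the array from the cost of the loop itself. Timings will vary by machine and MATLAB version:

```matlab
n = 0:0.1:1000;              % 10001 sample points, as in Example 2

% Loop, but with the output preallocated once
tic;
y_loop = zeros(size(n));
for k = 1:numel(n)
    y_loop(k) = cos(n(k));
end
t_loop = toc;

% Vectorized: cos is applied to the whole vector at once
tic;
y_vec = cos(n);
t_vec = toc;

fprintf('preallocated loop: %.6f s, vectorized: %.6f s\n', t_loop, t_vec);
```

If the preallocated loop lands between the two timings reported above, the difference between it and the original loop is the price of growing y.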
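2: In nested loops, keep the heavy loop inside
When nested loops cannot be avoided, put the loop with fewer iterations outside and the loop with more iterations inside; this can also noticeably speed up the program.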
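A minimal sketch of the comparison, assuming a scalar accumulator instead of A(1,1) and a total of 2,000,000 increments (both assumptions, to keep it small). One plausible reason the outer-heavy order is slower is that the inner for loop has to be set up every time the outer loop enters it: 10 times in the first order versus 200,000 times in the second.

```matlab
outer = 10;          % few iterations
inner = 200000;      % many iterations; total = 2e6 increments either way

x = 0;
tic
for i = 1:outer          % inner-heavy: the inner loop is entered 10 times
    for j = 1:inner
        x = x + 1;
    end
end
t_inner_heavy = toc;

y = 0;
tic
for i = 1:inner          % outer-heavy: the inner loop is entered 200000 times
    for j = 1:outer
        y = y + 1;
    end
end
t_outer_heavy = toc;

fprintf('inner-heavy: %.3f s, outer-heavy: %.3f s\n', t_inner_heavy, t_outer_heavy);
```

Both orders perform exactly the same number of additions; only the number of times the inner loop is started differs.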
Example:
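A=zeros(10,20000000);   % 2e8 doubles (about 1.6 GB); only A(1,1) is updated below
a=A(1,1);
tic
% Inner-heavy: outer loop runs 10 times, inner loop 20000000 times
for i = 1:10
for j = 1:20000000
A(1,1)= A(1,1)+1;
end
end
toc
tic
% Outer-heavy: outer loop runs 20000000 times, inner loop 10 times
for i = 1:20000000
for j = 1:10
A(1,1)= A(1,1)+1;
end
end
toc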
Elapsed time is 1.595374 seconds.
Elapsed time is 2.210620 seconds.
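The improvement is modest (about 1.4×), and it only shows up when the inner and outer loop counts differ by several orders of magnitude!
Next, keep the total number of iterations the same and change how it is split between the inner and outer loops: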
A=zeros(100,2000000);
a=A(1,1);
tic
for i = 1:100
for j = 1:2000000
A(1,1)= A(1,1)+1;
end
end
toc
tic
for i = 1:2000000
for j = 1:100
A(1,1)= A(1,1)+1;
end
end
toc
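Run times:
Elapsed time is 1.583356 seconds. (same as in the previous example)
Elapsed time is 1.662330 seconds. (faster than the outer-heavy loop in the previous example, which took 2.210620 s)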
In the next example the inner and outer loop counts differ by only 200× (1000 vs. 200000):
A=zeros(1000,200000);
a=A(1,1);
tic
for i = 1:1000
for j = 1:200000
A(1,1)= A(1,1)+1;
end
end
toc
tic
for i = 1:200000
for j = 1:1000
A(1,1)= A(1,1)+1;
end
end
toc
Elapsed time is 1.582086 seconds.
Elapsed time is 1.587950 seconds.
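Compared with the previous example, it looks as if the second loop has become faster while the first one stays the same.
Change the loop counts once more so that they differ by only a factor of 2: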
A=zeros(10000,20000);
a=A(1,1);
tic
for i = 1:10000
for j = 1:20000
A(1,1)= A(1,1)+1;
end
end
toc
tic
for i = 1:20000
for j = 1:10000
A(1,1)= A(1,1)+1;
end
end
toc
Run times now:
Elapsed time is 1.581300 seconds.
Elapsed time is 1.581412 seconds.
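The two are essentially identical!
3: Column-wise access is faster
Here is another example. By the rule above, putting the loop with more iterations inside should help. Note, however, that the first loop below is inner-heavy but walks along rows, while the second is outer-heavy but walks down columns.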
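A=zeros(10000,20000);   % 2e8 doubles, about 1.6 GB
tic
% Row order, inner-heavy: the inner loop walks along row i
for i = 1:10000
for j = 1:20000
A(i,j)=A(i,j)+1;
end
end
toc
tic
% Column order, outer-heavy: the inner loop walks down column i
for i = 1:20000
for j = 1:10000
A(j,i)=A(j,i)+1;
end
end
toc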
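The times are:
Elapsed time is 5.894603 seconds.
Elapsed time is 1.079242 seconds. (The second loop is not inner-heavy, yet it is still much faster, mainly thanks to column-wise access!)
This clearly shows the speedup from accessing the matrix column by column!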
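The underlying reason is that MATLAB stores matrices in column-major order: element A(i,j) of an m-by-n matrix sits at linear index i + (j-1)*m. Stepping down a column therefore touches adjacent memory, while stepping along a row jumps m elements at a time. A small sketch using reshape and sub2ind makes the layout visible:

```matlab
A = reshape(1:12, 3, 4);    % reshape fills the 3x4 matrix column by column
disp(A)
% A is now:
%      1     4     7    10
%      2     5     8    11
%      3     6     9    12

idx = sub2ind(size(A), 2, 3);   % linear index of A(2,3): 2 + (3-1)*3 = 8

% Distance in memory between neighbouring elements
down_column = sub2ind(size(A), 2, 1) - sub2ind(size(A), 1, 1);   % 1 element
along_row   = sub2ind(size(A), 1, 2) - sub2ind(size(A), 1, 1);   % 3 elements (= number of rows)

fprintf('A(2,3) is element %d; step down a column = %d, step along a row = %d\n', ...
        idx, down_column, along_row);
```

In the benchmarks, A(j,i) with j in the inner loop follows the one-element steps; A(i,j) with j in the inner loop takes the large steps, whose size grows with the number of rows.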
Again keep the total number of iterations fixed while changing the inner and outer loop counts:
A=zeros(1000,200000);
a=A(1,1);
tic
for i = 1:1000
for j = 1:200000
A(i,j)= A(i,j)+1;
end
end
toc
tic
for i = 1:200000
for j = 1:1000
A(j,i)= A(j,i)+1;
end
end
toc
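Run times:
Elapsed time is 10.574952 seconds. (The column-access advantage grows to about 9.2×, the largest in this series, and completely outweighs the penalty of the second loop not being inner-heavy!)
Elapsed time is 1.144208 seconds.
The advantage of column-wise access is even more pronounced!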
A=zeros(100,2000000);
a=A(1,1);
tic
for i = 1:100
for j = 1:2000000
A(i,j)= A(i,j)+1;
end
end
toc
tic
for i = 1:2000000
for j = 1:100
A(j,i)= A(j,i)+1;
end
end
toc
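Run times:
Elapsed time is 5.715097 seconds. (Compared with the previous example, the gap between the inner and outer counts is larger; the first loop, being inner-heavy, gets faster, but column-wise access is still what dominates performance here.)
Elapsed time is 1.229484 seconds. (Compared with the previous example, the gap is larger and the second loop, being outer-heavy, gets slightly slower.)
Increase the gap between the inner and outer loop counts further: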
A=zeros(10,20000000);
a=A(1,1);
tic
for i = 1:10
for j = 1:20000000
A(i,j)= A(i,j)+1;
end
end
toc
tic
for i = 1:20000000
for j = 1:10
A(j,i)= A(j,i)+1;
end
end
toc
Run times:
Elapsed time is 3.895749 seconds.
Elapsed time is 1.803251 seconds.
The column-access advantage shrinks (to about 2.2×)!
If we widen the gap even further:
A=zeros(1,200000000);
a=A(1,1);
tic
for i = 1:1
for j = 1:200000000
A(i,j)= A(i,j)+1;
end
end
toc
tic
for i = 1:200000000
for j = 1:1
A(j,i)= A(j,i)+1;
end
end
toc
Run times:
Elapsed time is 1.144015 seconds.
Elapsed time is 8.042080 seconds.
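The column-access advantage is gone, and the order has even reversed! Note that for a 1-by-N matrix, A(1,j) and A(1,j+1) are adjacent in memory too, so both loops read memory sequentially; the remaining gap is most likely the overhead of entering the one-iteration inner loop 200000000 times in the second version.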
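----------------------------------------------------------
As a supplement, here is the opposite direction, where the number of rows grows and the number of columns shrinks. In these runs the first loop is row-order and outer-heavy, while the second is column-order and inner-heavy: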
A=zeros(100000,2000);
a=A(1,1);
tic
for i = 1:100000
for j = 1:2000
A(i,j)= A(i,j)+1;
end
end
toc
tic
for i = 1:2000
for j = 1:100000
A(j,i)= A(j,i)+1;
end
end
toc
Run times:
Elapsed time is 3.300342 seconds. (about a 3× speedup; here the gain comes mainly from the inner-heavy loop order)
Elapsed time is 1.127927 seconds.
A=zeros(1000000,200);
a=A(1,1);
tic
for i = 1:1000000
for j = 1:200
A(i,j)= A(i,j)+1;
end
end
toc
tic
for i = 1:200
for j = 1:1000000
A(j,i)= A(j,i)+1;
end
end
toc
Run times:
Elapsed time is 1.917314 seconds. (the gap is narrowing; the inner-heavy order plays the main role here)
Elapsed time is 1.124569 seconds.
A=zeros(10000000,20);
a=A(1,1);
tic
for i = 1:10000000
for j = 1:20
A(i,j)= A(i,j)+1;
end
end
toc
tic
for i = 1:20
for j = 1:10000000
A(j,i)= A(j,i)+1;
end
end
toc
Run times:
Elapsed time is 1.587053 seconds. (the gap is even smaller; the inner-heavy order plays the main role here)
Elapsed time is 1.115813 seconds.
A=zeros(100000000,2);
a=A(1,1);
tic
for i = 1:100000000
for j = 1:2
A(i,j)= A(i,j)+1;
end
end
toc
tic
for i = 1:2
for j = 1:100000000
A(j,i)= A(j,i)+1;
end
end
toc
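Run times:
Elapsed time is 4.703369 seconds. (the gap widens again; because the inner and outer loop counts now differ by a large factor, the inner-heavy order is what mainly provides the speedup!)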
Elapsed time is 1.119831 seconds.
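To rerun both series on another machine, a compact sweep can time the row-order and column-order loops for several shapes that all hold the same number of elements. The element total of 200,000 and the list of row counts are assumptions, far smaller than the post's 2e8 elements, so absolute times will not match the figures above; only the trend is of interest. The sweep uses i for rows and j for columns in both loops, which is equivalent to the post's A(j,i) form.

```matlab
total = 2e5;                                 % assumed element count (the post uses 2e8)
nrows = [1 10 100 1000 10000 100000];        % every entry divides total exactly

fprintf('%8s %8s %11s %11s\n', 'rows', 'cols', 'row-order', 'col-order');
for c = 1:numel(nrows)
    m = nrows(c);
    n = total / m;
    A = zeros(m, n);

    tic
    for i = 1:m              % row order: inner loop walks along row i
        for j = 1:n
            A(i, j) = A(i, j) + 1;
        end
    end
    t_row = toc;

    tic
    for j = 1:n              % column order: inner loop walks down column j
        for i = 1:m
            A(i, j) = A(i, j) + 1;
        end
    end
    t_col = toc;

    fprintf('%8d %8d %11.4f %11.4f\n', m, n, t_row, t_col);
end
```

Each shape's matrix is incremented twice, once per loop order, so every element ends at 2.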
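Summary:
1) Replacing loops with vectorized operations can greatly improve efficiency.
2) When nested loops are unavoidable, there is a trade-off between how the iterations are split across the inner and outer loops and column-first access. When the inner and outer loop counts differ greatly, putting the heavy loop inside effectively speeds up the program, while the advantage of column access fades; when the counts are close, the gain from column access is much more pronounced.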
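The two points combine naturally: when the loop body is a simple element-wise update, as in these benchmarks, the double loop can be replaced by one vectorized update per column, or removed entirely. A sketch, assuming a smaller 500-by-1000 matrix (a hypothetical size) and checking that all three versions give the same result:

```matlab
A = zeros(500, 1000);   % assumed size, much smaller than in the benchmarks

% 1) Element-wise double loop in column order
B = A;
tic
for j = 1:size(B, 2)
    for i = 1:size(B, 1)
        B(i, j) = B(i, j) + 1;
    end
end
t_loop = toc;

% 2) One vectorized update per column (a contiguous slice in memory)
C = A;
tic
for j = 1:size(C, 2)
    C(:, j) = C(:, j) + 1;
end
t_cols = toc;

% 3) Fully vectorized
tic
D = A + 1;
t_vec = toc;

fprintf('double loop: %.4f s, column slices: %.4f s, vectorized: %.4f s\n', ...
        t_loop, t_cols, t_vec);
```

The column-slice version is a middle ground for cases where the per-column work cannot be written as a single whole-matrix expression.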