Original article: http://nghiaho.com/?p=954
This is a quick revisit of my recent post comparing 3 different libraries with matrix support. As suggested by one of the comments on the last post, I’ve turned off any debugging options that each library may have. In practice you would leave them on most of the time for safety reasons, but for this test I thought it would be interesting to see the effect of turning them off.
Armadillo and Eigen use the defines ARMA_NO_DEBUG and NDEBUG respectively to turn off error checking. I could not find an immediate way to do the same thing in OpenCV without editing the source code, which I chose not to do, so keep that in mind. I also changed the number of iterations for each of the 5 operations to make the timings slightly more accurate. Fast operations like add, multiply, transpose and invert run for more iterations to get a better average than SVD, which is quite slow.
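To make the "average over many iterations" idea concrete, here is a minimal sketch of a timing helper. It is not the code from the original benchmark: the name average_ms and the use of std::chrono::steady_clock are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the original benchmark): runs `op`
// `iterations` times and returns the mean wall-clock time per call in ms.
template <typename Op>
double average_ms(Op op, std::size_t iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        op();  // the operation under test, e.g. a matrix add
    }
    // Convert the elapsed time to fractional milliseconds.
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

A fast operation might be timed with something like average_ms([&] { C = A + B; }, 100000), and SVD with a far smaller count. Those counts are only examples, and an optimising compiler may remove work whose result is never used, so a real benchmark should consume the result somehow.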
On with the results …
Add
Performing C = A + B
Raw data
Results in ms | OpenCV | Armadillo | Eigen |
4×4 | 0.00093 | 0.00008 | 0.00007 |
8×8 | 0.00039 | 0.00006 | 0.00015 |
16×16 | 0.00066 | 0.00030 | 0.00059 |
32×32 | 0.00139 | 0.00148 | 0.00194 |
64×64 | 0.00654 | 0.00619 | 0.00712 |
128×128 | 0.02454 | 0.02738 | 0.03225 |
256×256 | 0.09144 | 0.11315 | 0.10920 |
512×512 | 0.47997 | 0.57668 | 0.47382 |
Normalised
Speed up over slowest | OpenCV | Armadillo | Eigen |
4×4 | 1.00x | 12.12x | 14.35x |
8×8 | 1.00x | 6.53x | 2.63x |
16×16 | 1.00x | 2.19x | 1.13x |
32×32 | 1.39x | 1.31x | 1.00x |
64×64 | 1.09x | 1.15x | 1.00x |
128×128 | 1.31x | 1.18x | 1.00x |
256×256 | 1.24x | 1.00x | 1.04x |
512×512 | 1.20x | 1.00x | 1.22x |
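The normalised figures appear to be the slowest library's time for a given size divided by each library's own time. The sketch below works under that assumption; speedup_over_slowest is a hypothetical name, not something from the benchmark code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Speed-up over the slowest entry in one table row:
// (slowest time in the row) / (this library's time).
// Assumes every time is strictly positive.
std::array<double, 3> speedup_over_slowest(const std::array<double, 3>& times_ms) {
    const double slowest = *std::max_element(times_ms.begin(), times_ms.end());
    std::array<double, 3> out{};
    for (std::size_t i = 0; i < times_ms.size(); ++i) {
        assert(times_ms[i] > 0.0);
        out[i] = slowest / times_ms[i];
    }
    return out;
}
```

Feeding in the 512×512 addition row {0.47997, 0.57668, 0.47382} gives roughly 1.20, 1.00 and 1.22, which matches the table. For the smallest sizes the published ratios differ slightly from what the printed times give, presumably because the raw times shown are rounded.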
Multiply
Performing C = A * B
Raw data
Results in ms | OpenCV | Armadillo | Eigen |
4×4 | 0.00115 | 0.00017 | 0.00086 |
8×8 | 0.00195 | 0.00078 | 0.00261 |
16×16 | 0.00321 | 0.00261 | 0.00678 |
32×32 | 0.01865 | 0.01947 | 0.02130 |
64×64 | 0.15366 | 0.33080 | 0.07835 |
128×128 | 1.87008 | 1.72719 | 0.35859 |
256×256 | 15.76724 | 3.70212 | 2.70168 |
512×512 | 119.09382 | 24.08409 | 22.73524 |
Normalised
Speed up over slowest | OpenCV | Armadillo | Eigen |
4×4 | 1.00x | 6.74x | 1.34x |
8×8 | 1.34x | 3.34x | 1.00x |
16×16 | 2.11x | 2.60x | 1.00x |
32×32 | 1.14x | 1.09x | 1.00x |
64×64 | 2.15x | 1.00x | 4.22x |
128×128 | 1.00x | 1.08x | 5.22x |
256×256 | 1.00x | 4.26x | 5.84x |
512×512 | 1.00x | 4.94x | 5.24x |
Transpose
Performing C = A^T
Raw data
Results in ms | OpenCV | Armadillo | Eigen |
4×4 | 0.00067 | 0.00004 | 0.00003 |
8×8 | 0.00029 | 0.00006 | 0.00008 |
16×16 | 0.00034 | 0.00028 | 0.00028 |
32×32 | 0.00071 | 0.00068 | 0.00110 |
64×64 | 0.00437 | 0.00592 | 0.00500 |
128×128 | 0.01552 | 0.06537 | 0.03486 |
256×256 | 0.08828 | 0.40813 | 0.20032 |
512×512 | 0.52455 | 1.51452 | 0.77584 |
Normalised
Speed up over slowest | OpenCV | Armadillo | Eigen |
4×4 | 1.00x | 17.61x | 26.76x |
8×8 | 1.00x | 4.85x | 3.49x |
16×16 | 1.00x | 1.20x | 1.21x |
32×32 | 1.56x | 1.61x | 1.00x |
64×64 | 1.35x | 1.00x | 1.18x |
128×128 | 4.21x | 1.00x | 1.88x |
256×256 | 4.62x | 1.00x | 2.04x |
512×512 | 2.89x | 1.00x | 1.95x |
Inversion
Performing C = A^-1
Raw data
Results in ms | OpenCV | Armadillo | Eigen |
4×4 | 0.00205 | 0.00046 | 0.00271 |
8×8 | 0.00220 | 0.00417 | 0.00274 |
16×16 | 0.00989 | 0.01255 | 0.01094 |
32×32 | 0.06101 | 0.05146 | 0.05023 |
64×64 | 0.41286 | 0.25769 | 0.27921 |
128×128 | 3.60347 | 3.76052 | 1.88089 |
256×256 | 33.72502 | 23.10218 | 11.62692 |
512×512 | 285.03784 | 126.70175 | 162.74253 |
Normalised
Speed up over slowest | OpenCV | Armadillo | Eigen |
4×4 | 1.32x | 5.85x | 1.00x |
8×8 | 1.90x | 1.00x | 1.52x |
16×16 | 1.27x | 1.00x | 1.15x |
32×32 | 1.00x | 1.19x | 1.21x |
64×64 | 1.00x | 1.60x | 1.48x |
128×128 | 1.04x | 1.00x | 2.00x |
256×256 | 1.00x | 1.46x | 2.90x |
512×512 | 1.00x | 2.25x | 1.75x |
SVD
Performing full SVD, [U,S,V] = SVD(A)
Raw data
Results in ms | OpenCV | Armadillo | Eigen |
4×4 | 0.01220 | 0.22080 | 0.01620 |
8×8 | 0.01760 | 0.05760 | 0.03340 |
16×16 | 0.10700 | 0.16560 | 0.25540 |
32×32 | 0.51480 | 0.70230 | 1.13900 |
64×64 | 3.63780 | 3.43520 | 6.63350 |
128×128 | 27.04300 | 23.01600 | 64.27500 |
256×256 | 240.11000 | 210.70600 | 675.84100 |
512×512 | 1727.44000 | 1586.66400 | 6934.32300 |
Normalised
Speed up over slowest | OpenCV | Armadillo | Eigen |
4×4 | 18.10x | 1.00x | 13.63x |
8×8 | 3.27x | 1.00x | 1.72x |
16×16 | 2.39x | 1.54x | 1.00x |
32×32 | 2.21x | 1.62x | 1.00x |
64×64 | 1.82x | 1.93x | 1.00x |
128×128 | 2.38x | 2.79x | 1.00x |
256×256 | 2.81x | 3.21x | 1.00x |
512×512 | 4.01x | 4.37x | 1.00x |
Discussion
Overall, the average running time has decreased for all operations, which is a good sign. Even OpenCV runs faster; maybe NDEBUG has an effect there too, since it’s a standardised define.
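The results from the addition test show all 3 libraries giving more or less the same result from 32×32 upwards; OpenCV is noticeably slower only at 16×16 and below. This is probably not a surprise, since adding matrices is a very straightforward O(N) task, where N is the number of elements.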
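As a rough illustration of why addition leaves little room for one library to pull ahead, here is a sketch of element-wise addition over a flat array. The function name add_matrices and the std::vector storage are assumptions for the example, not how any of the three libraries store or add matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise C = A + B for matrices stored as flat arrays of equal size.
// One pass over the data: each element is read and written exactly once.
std::vector<double> add_matrices(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}
```

With so little arithmetic per element, the running time for large matrices is likely dominated by memory traffic, which every library pays equally.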
The multiply test is a bit more interesting. For matrices 64×64 or larger, there is a noticeable gap between the libraries. Eigen is very fast, with Armadillo coming in second for matrices 256×256 or greater. I’m guessing that for larger matrices Eigen and Armadillo leverage the extra CPU cores, because I did see all the CPU cores utilised briefly during benchmarking.
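To show how a multiply could spread across cores, here is a sketch of a plain triple-loop product whose outer loop is split across threads when the code is built with OpenMP (for example with -fopenmp, as in the compile line for this test). This is an assumption for illustration only and not a description of what Eigen or Armadillo do internally; multiply_square is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for square n×n matrices stored row-major in flat arrays.
// Illustrative only: real libraries use far more tuned kernels.
std::vector<double> multiply_square(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    // Each output row depends only on one row of A, so rows can be
    // computed independently. The pragma is active only with OpenMP.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Starting threads has a cost, so splitting the work like this would only be expected to pay off once the matrices are reasonably large, which would be consistent with the gap opening up at the bigger sizes.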
The transpose test involves shuffling memory around, so it is affected by the CPU’s caching mechanism. OpenCV does a good job as the matrix size increases.
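The cache sensitivity can be sketched with two versions of the same transpose. The naive one writes with a stride of n elements, while the blocked one works on small tiles so that both reads and writes stay close together in memory. The function names and the default tile size of 32 are assumptions for illustration; this is not a claim about how OpenCV, Armadillo or Eigen implement transpose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Naive transpose of an n×n row-major matrix: reads are contiguous,
// but consecutive writes land n elements apart.
std::vector<double> transpose_naive(const std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    std::vector<double> t(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            t[j * n + i] = a[i * n + j];
    return t;
}

// Blocked transpose: handles one block×block tile at a time so the
// data touched by the inner loops is more likely to stay in cache.
std::vector<double> transpose_blocked(const std::vector<double>& a, std::size_t n,
                                      std::size_t block = 32) {
    assert(a.size() == n * n && block > 0);
    std::vector<double> t(n * n);
    for (std::size_t ii = 0; ii < n; ii += block) {
        for (std::size_t jj = 0; jj < n; jj += block) {
            const std::size_t i_end = std::min(ii + block, n);
            const std::size_t j_end = std::min(jj + block, n);
            for (std::size_t i = ii; i < i_end; ++i)
                for (std::size_t j = jj; j < j_end; ++j)
                    t[j * n + i] = a[i * n + j];
        }
    }
    return t;
}
```

Both functions return the same result; any speed difference would come from the memory access pattern and would be expected to show up mainly once the matrix no longer fits in cache.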
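The inversion test is a bit of a mixed bag. OpenCV is the slowest of the three at 32×32, 64×64, 256×256 and 512×512, while Eigen is fastest at 128×128 and 256×256 and Armadillo is fastest at 512×512.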
The SVD test is interesting. From 16×16 upwards OpenCV and Armadillo are clearly faster than Eigen, and Eigen falls further behind as the matrix size increases.
Conclusion
In practice, if you just want a matrix library and nothing more then Armadillo or Eigen is probably the way to go. If you want something that is very portable with minimal effort then choose Eigen, because the entire library is header based, no library linking required. If you want the fastest matrix code possible then you can be adventurous and try combining the best of each library.
Download
Code compiled with:
g++ test_matrix_lib.cpp -o test_matrix_lib -lopencv_core -larmadillo -lgomp -fopenmp \
    -march=native -O3 -DARMA_NO_DEBUG -DNDEBUG