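First of all, I know there are several threads on this topic, but I could not get a straight answer from them, and I ran into some incorrect flop calculations.
I have prepared an element-wise multiplication benchmark for MATLAB and Python. It is the simplest and most straightforward approach I could think of, and it makes the flop count easy to calculate.
It uses NxN arrays (matrices), but it does not perform a matrix multiplication; it multiplies element by element. This matters because with matrix multiplication the operation count is not N^3!
An element-wise multiplication of randomly generated numbers, however, has to be carried out in N^2 operations.
I have an Intel i7-4770 (4 physical cores and 8 logical cores, I believe) @ 3.5 GHz. So, assuming 4 floating-point operations per cycle, each core should deliver 14 GFLOPS!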
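The estimate above can be written out as a short calculation. This is only a sketch of the arithmetic: the clock rate and the figure of 4 flops per cycle are the assumptions stated in the question, not measured values, and the variable names are illustrative.

```python
# Back-of-the-envelope numbers for the estimate in the question.
clock_hz = 3.5e9       # i7-4770 clock rate quoted in the question
flops_per_cycle = 4    # assumption made in the question
N = 10**4              # matrix side length used in the benchmark

peak_gflops_per_core = clock_hz * flops_per_cycle * 1e-9
n_ops = N**2           # one multiply per element
ideal_seconds = n_ops / (clock_hz * flops_per_cycle)

print(f"peak per core: {peak_gflops_per_core:.1f} GFLOPS")
print(f"operations:    {n_ops:.1e}")
print(f"ideal time:    {ideal_seconds * 1e3:.2f} ms")
```

Under these assumptions, a single core running at the estimated peak would finish the 10^8 multiplications in roughly 7 ms.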
MATLAB, NumPy and SciPy do not get anywhere near that.
Why?
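MATLAB:
% element-wise multiplication benchmark
N = 10^4;
nOps = N^2;              % one multiply per element
m1 = randn(N);
m2 = randn(size(m1));
m = randn(size(m1));     % third array, not used in the timed runs
m1 = single(m1);         % convert operands to single precision
m2 = single(m2);
% clear m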
tic
m1 = m1 .* m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]
% clear m
tic
m1 = m1.*m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]
% clear m
tic
m1 = m1.*m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]
version('-blas')
version('-lapack')
The results are:
(the MATLAB output is missing from the source)
Now Python:
import numpy as np
# import gnumpy as gnp
import scipy as sp
import scipy.linalg as la
import time

if __name__ == '__main__':
    N = 10**4
    nOps = N**2  # one multiply per element

    a = np.random.randn(N,N).astype(np.float32)
    b = np.random.randn(N,N).astype(np.float32)

    # element-wise product with the * operator
    t = time.time()
    c = a*b
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # element-wise product with np.multiply
    t = time.time()
    c = np.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # sp.multiply was an alias of np.multiply in older SciPy releases
    t = time.time()
    c = sp.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = sp.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # outer product: (N,1) times (1,N) gives an NxN result, again N^2 multiplies
    a = np.random.randn(N,1).astype(np.float32)
    b = np.random.randn(1,N).astype(np.float32)

    t = time.time()
    c1 = np.dot(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = np.dot(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # same outer product through the double-precision BLAS routine dgemm
    t = time.time()
    c = la.blas.dgemm(1.0,a,b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # dgemm again, this time through SciPy's private _fblas module
    t = time.time()
    c = la._fblas.dgemm(1.0,a,b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    print("numpy config")
    np.show_config()
    print("scipy config")
    sp.show_config()
# numpy
The results are:
dt = 0.16301608085632324 , gflops = 0.6134364136022663
dt = 0.16701674461364746 , gflops = 0.5987423610209003
dt = 0.1770176887512207 , gflops = 0.5649152957845881
dt = 0.188018798828125 , gflops = 0.5318617107612401
dt = 0.151015043258667 , gflops = 0.6621856858903415
dt = 0.17201733589172363 , gflops = 0.5813367558659613
dt = 0.3080308437347412 , gflops = 0.3246428142959423
dt = 0.39503931999206543 , gflops = 0.253139358385916
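To put the first four timings (the element-wise runs) next to the 14 GFLOPS estimate, the sketch below recomputes GFLOPS from the reported `dt` values and also estimates how many bytes per second the operation touches. The byte count is an assumption, not something measured here: it supposes each multiply reads two float32 operands and writes one float32 result, and it ignores caches.

```python
# Timings (seconds) of the four element-wise runs reported above.
dts = [0.16301608085632324, 0.16701674461364746,
       0.1770176887512207, 0.188018798828125]

N = 10**4
n_ops = N**2
bytes_per_op = 3 * 4   # assumption: read 2 float32 values, write 1 float32 value
peak_gflops = 14.0     # per-core estimate from the question

rows = []
for dt in dts:
    gflops = n_ops / dt * 1e-9
    gbytes_per_s = n_ops * bytes_per_op / dt * 1e-9
    rows.append((dt, gflops, gbytes_per_s))
    print(f"dt={dt:.3f} s  {gflops:.3f} GFLOPS "
          f"({gflops / peak_gflops:.1%} of estimate)  "
          f"~{gbytes_per_s:.1f} GB/s touched")
```

The GFLOPS column reproduces the figures printed by the benchmark; the GB/s column is only as good as the byte-count assumption.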
numpy config
mkl_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
lapack_mkl_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
lapack_opt_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
blas_opt_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
openblas_lapack_info:
  NOT AVAILABLE
blas_mkl_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
scipy config
mkl_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
lapack_mkl_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
lapack_opt_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
blas_opt_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
openblas_lapack_info:
  NOT AVAILABLE
blas_mkl_info:
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
    include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
Process finished with exit code 0
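Each measurement above times a single call with `time.time()`, and `c = a*b` also allocates a fresh result array inside the timed region. A variant that may give steadier numbers is sketched below: it uses `time.perf_counter()`, runs an untimed warm-up call, keeps the best of several repeats, and compares a fresh-allocation multiply with one that writes into a preallocated buffer. The helper `best_time` and the smaller `N` are illustrative choices, not part of the original benchmark.

```python
import time

import numpy as np


def best_time(fn, repeats=5):
    """Return the smallest wall-clock time over several calls to fn."""
    fn()  # warm-up call, not timed
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


N = 2000  # smaller than the question's 10**4 so the sketch runs quickly
rng = np.random.default_rng(0)
a = rng.standard_normal((N, N), dtype=np.float32)
b = rng.standard_normal((N, N), dtype=np.float32)
out = np.empty_like(a)

# a * b allocates a new result array on every call
dt_alloc = best_time(lambda: a * b)
# np.multiply with out= reuses the preallocated buffer
dt_inplace = best_time(lambda: np.multiply(a, b, out=out))

for name, dt in [("a * b", dt_alloc), ("multiply(out=)", dt_inplace)]:
    print(f"{name:>15}: {dt * 1e3:.2f} ms, {N**2 / dt * 1e-9:.2f} GFLOPS")
```

Setting `N = 10**4` reproduces the problem size from the question; the printed figures depend on the machine.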