Installing and Using CBLAS

Latest recommended article published on 2022-07-15 17:40:14

A Beginner's Study Notes

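CBLAS is the C interface to BLAS. BLAS stands for Basic Linear Algebra Subprograms, a high-performance math library for vector and matrix computation. BLAS itself is written in Fortran; CBLAS, its C interface library, exists so that C/C++ programs can use it conveniently. The BLAS home page is http://www.netlib.org/blas/, and the CBLAS download link can also be found on that page.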

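Installing CBLAS requires BLAS first. Download blas.tgz from the home page, unpack it, adjust make.inc and the Makefile for your system, and run make; this produces a file named blas_LINUX.a. Then download cblas.tgz, unpack it, and in that directory rename the appropriate Makefile.* file to Makefile.in, or create a link to it. On Linux that is ln -s Makefile.LINUX Makefile.in. Edit Makefile.in as needed, mainly BLLIB (the path of the BLAS library file) and CBDIR (the CBLAS installation directory). Running make help prints the available make commands; make all builds everything. The CBLAS library file $(CBLIB), cblas_LINUX.a, is generated in $(CBLIBDIR) under $(CBDIR).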

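The src directory under the CBLAS installation directory $(CBDIR) contains cblas.h, the header that declares the CBLAS functions and constants. Using CBLAS requires this header, together with the BLAS library file $(BLLIB) and the CBLAS library file $(CBLIB).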

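CBLAS/BLAS is divided into three levels: level 1 covers vector operations, level 2 covers operations between a matrix and a vector, and level 3 covers operations between matrices. Matrix multiplication, for example, belongs to level 3, and it is the operation used here to learn CBLAS.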
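To make the three levels concrete, here is a minimal sketch of one representative operation per level written as plain loops. The function names ref_axpy, ref_gemv and ref_gemm are hypothetical stand-ins chosen for this illustration (they are not CBLAS functions), and the sketch assumes contiguous row-major storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference routines (not part of CBLAS), row-major, contiguous storage.

// Level 1 (vector-vector): y = alpha * x + y, O(n) work.
void ref_axpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

// Level 2 (matrix-vector): y = A * x, where A is m x n, O(m*n) work.
std::vector<float> ref_gemv(std::size_t m, std::size_t n,
                            const std::vector<float>& A,
                            const std::vector<float>& x) {
    assert(A.size() == m * n && x.size() == n);
    std::vector<float> y(m, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            y[i] += A[i * n + j] * x[j];
        }
    }
    return y;
}

// Level 3 (matrix-matrix): C = A * B, where A is m x k and B is k x n, O(m*n*k) work.
std::vector<float> ref_gemm(std::size_t m, std::size_t n, std::size_t k,
                            const std::vector<float>& A,
                            const std::vector<float>& B) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t p = 0; p < k; ++p) {
                C[i * n + j] += A[i * k + p] * B[p * n + j];
            }
        }
    }
    return C;
}
```

The real library routines perform the same arithmetic, only much faster; the loop nesting depth is what distinguishes the levels.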

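One of the functions for matrix multiplication is cblas_sgemm, which works on single-precision real numbers; there are corresponding functions for double-precision real, single-precision complex and double-precision complex numbers. cblas_sgemm is used as the example here.

The function is declared as: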

void cblas_sgemm(const enum CBLAS_ORDER Order, 
                 const enum CBLAS_TRANSPOSE TransA,
                 const enum CBLAS_TRANSPOSE TransB,   
                 const int M, 
                 const int N,
                 const int K, 
                 const float alpha, 
                 const float  *A,
                 const int lda, 
                 const float  *B, 
                 const int ldb,
                 const float beta, 
                 float  *C, 
                 const int ldc)

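A detailed description of this function can be found at http://www.netlib.org/blas/sgemm.f, although that is the Fortran version and the C version differs slightly.

The function computes C = alpha*op( A )*op( B ) + beta*C. Its parameters are as follows.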

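const enum CBLAS_ORDER Order: the storage layout of the data. CBLAS functions take both one-dimensional and two-dimensional data as one-dimensional arrays, so it matters whether a matrix is stored in row-major or column-major order. C arrays are row-major and Fortran arrays are column-major. I prefer row-major, so I pass CblasRowMajor; for column-major data the value is CblasColMajor.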

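const enum CBLAS_TRANSPOSE TransA and const enum CBLAS_TRANSPOSE TransB: these select op( A ) and op( B ). The possible values are CblasNoTrans=111, CblasTrans=112 and CblasConjTrans=113. With TransA = CblasNoTrans, op( A ) = A; with TransA = CblasTrans, op( A ) = A'; with TransA = CblasConjTrans, op( A ) = A' as well, because for a real matrix the conjugate transpose is just the transpose. TransB works the same way.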

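const int M: the number of rows of op( A ) and of C
const int N: the number of columns of op( B ) and of C
const int K: the number of columns of op( A ) and the number of rows of op( B )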

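const float alpha, const float beta: the two scalars in the formula. To compute just C = A*B, use alpha=1 and beta=0.

const float *A, const float *B, float *C: the data of matrices A, B and C. A and B are read-only inputs; C is the output and therefore not const.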

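const int lda, const int ldb, const int ldc: the BLAS documentation describes these as the first (row) dimensions of A, B and C, but in practice, with CblasRowMajor, CBLAS expects the column counts here. Strictly speaking they are leading dimensions, i.e. storage strides, which coincide with the column counts only when the row-major matrices are stored contiguously; the reader comment quoted near the end of this post explains the distinction.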
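The way these parameters interact can be sketched with a naive reference loop. This is only an illustration of the formula for row-major storage, not how CBLAS is implemented; RefTranspose and ref_sgemm_rowmajor are hypothetical names introduced here so that the sketch compiles without cblas.h.

```cpp
#include <cassert>

// Hypothetical stand-in for CBLAS_TRANSPOSE (real case only, so no conjugate variant).
enum RefTranspose { RefNoTrans, RefTrans };

// Naive reference for C = alpha * op(A) * op(B) + beta * C with row-major storage.
// op(A) is M x K, op(B) is K x N, C is M x N.
// lda, ldb, ldc are the strides (in elements) between consecutive stored rows.
void ref_sgemm_rowmajor(RefTranspose transA, RefTranspose transB,
                        int M, int N, int K, float alpha,
                        const float* A, int lda,
                        const float* B, int ldb,
                        float beta, float* C, int ldc) {
    // The stored A is M x K without transpose and K x M with transpose; same idea for B.
    assert(lda >= (transA == RefNoTrans ? K : M));
    assert(ldb >= (transB == RefNoTrans ? N : K));
    assert(ldc >= N);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < K; ++p) {
                // Element (i, p) of op(A) and element (p, j) of op(B).
                float a = (transA == RefNoTrans) ? A[i * lda + p] : A[p * lda + i];
                float b = (transB == RefNoTrans) ? B[p * ldb + j] : B[j * ldb + p];
                sum += a * b;
            }
            float* c = &C[i * ldc + j];
            // When beta is zero, C is only written, so it may start uninitialized.
            *c = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * (*c);
        }
    }
}
```

Note that the transpose flags change only how A and B are indexed; the caller never has to build a transposed copy.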

Here I compute the product of two simple matrices.

A:
1,2,3
4,5,6
7,8,9
8,7,6

B:
5,4
3,2
1,0
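For reference, the product can be worked out by hand. A is 4×3 and B is 3×2, so C = A*B is 4×2, and each entry is a row of A dotted with a column of B; for example the top-left entry is 1·5 + 2·3 + 3·1 = 14. The full result, which a correct program should print, is:

```latex
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 8 & 7 & 6 \end{pmatrix}
\begin{pmatrix} 5 & 4 \\ 3 & 2 \\ 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 14 & 8 \\ 41 & 26 \\ 68 & 44 \\ 67 & 46 \end{pmatrix}
```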

Program code:

// The program is C++ while CBLAS is a C library, so the header is wrapped in extern "C"
extern "C"
{
#include "cblas.h"
}
#include <iostream>
using namespace std;

int main(void) {
    const enum CBLAS_ORDER Order = CblasRowMajor;
    const enum CBLAS_TRANSPOSE TransA = CblasNoTrans;
    const enum CBLAS_TRANSPOSE TransB = CblasNoTrans;
    const int M = 4; // rows of A, rows of C
    const int N = 2; // columns of B, columns of C
    const int K = 3; // columns of A, rows of B
    const float alpha = 1;
    const float beta = 0;
    const int lda = K; // row-major, contiguous: number of columns of A
    const int ldb = N; // number of columns of B
    const int ldc = N; // number of columns of C
    const float A[M*K] = {1,2,3,4,5,6,7,8,9,8,7,6};
    const float B[K*N] = {5,4,3,2,1,0};
    float C[M*N];

    cblas_sgemm(Order, TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);

    // Print C row by row
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            cout << C[i*ldc + j] << "\t";
        }
        cout << endl;
    }
    return 0; // 0 signals success to the shell
}

When compiling, link against cblas_LINUX.a and blas_LINUX.a, for example:

g++ main.cpp cblas_LINUX.a blas_LINUX.a -o main

This assumes the two .a files are in a location where they can be found directly; otherwise give their full paths.

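This approach worked without problems on CentOS 5, but failed on my Ubuntu 7.10: blas_LINUX.a was built normally, but linking produced errors. So I installed ATLAS from the package repository instead, with sudo apt-get install atlas3-base. This puts the libblas and liblapack library files under /usr/lib/atlas/; linking against that BLAS library in place of the BLAS built above makes the build succeed.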

GSL also ships BLAS and CBLAS, and Boost has uBLAS, which offers CBLAS/BLAS functionality as well; I may look into those when I have time.

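A reader comment, appended here:
"const int lda, const int ldb, const int ldc: in the BLAS documentation these three parameters are the row counts of A, B and C, but actual use shows that in CBLAS they should be the column counts."
This is not quite right. lda, ldb and ldc are not simply the row or column counts of the matrices; otherwise M, N and K would be enough.
Looking it up (http://www.stanford.edu/class/me200c/tutorial_77/10_arrays.html), the parameter is really "the difference between the indices of the first elements of two adjacent rows", which makes it act as a stride. In memory the elements of two consecutive rows may not be tightly packed but separated by a gap, for example when the matrix is a submatrix taken out of a larger matrix.
This passage explains it clearly: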
Consider again the example where we only use the upper 3 by 3 submatrix of the 3 by 5 array A(3,5). The 9 interesting elements will then be stored in the first nine memory locations, while the last six are not used. This works out neatly because the leading dimension is the same for both the array and the matrix we store in the array. However, frequently the leading dimension of the array will be larger than the first dimension of the matrix. Then the matrix will not be stored contiguously in memory, even if the array is contiguous. For example, suppose the declaration was A(5,3) instead. Then there would be two “unused” memory cells between the end of one column and the beginning of the next column (again we are assuming the matrix is 3 by 3).
This may seem complicated, but actually it is quite simple when you get used to it. If you are in doubt, it can be useful to look at how the address of an array element is computed. Each array will have some memory address assigned to the beginning of the array, that is element (1,1). The address of element (i,j) is then given by
addr[A(i,j)] = addr[A(1,1)] + (j-1)*lda + (i-1)
where lda is the leading (i.e. row) dimension of A. Note that lda is in general different from the actual matrix dimension. Many Fortran errors are caused by this, so it is very important you understand the distinction!
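The address formula above can be written down directly. The following is a small sketch, with hypothetical helper names, of the 0-based offset computation in both layouts, plus a loop that walks a submatrix stored inside a larger row-major array; it assumes offsets are counted in elements rather than bytes.

```cpp
#include <cassert>
#include <cstddef>

// Column-major (Fortran) offset of element (i, j), 0-based.
// This is the quoted formula (j-1)*lda + (i-1) with 1-based indices shifted to 0-based.
std::size_t offset_colmajor(std::size_t i, std::size_t j, std::size_t lda) {
    return j * lda + i;
}

// Row-major (CblasRowMajor) offset: here lda is the stride between consecutive rows.
std::size_t offset_rowmajor(std::size_t i, std::size_t j, std::size_t lda) {
    return i * lda + j;
}

// Sums a rows x cols submatrix that starts at `sub` inside a larger
// row-major array whose rows are `ld` elements apart.
float sum_submatrix_rowmajor(const float* sub, std::size_t rows,
                             std::size_t cols, std::size_t ld) {
    assert(ld >= cols); // the stride can exceed the submatrix width, never fall below it
    float total = 0.0f;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            total += sub[offset_rowmajor(i, j, ld)];
        }
    }
    return total;
}
```

The same idea applies to cblas_sgemm: to multiply a submatrix of a larger row-major array, pass a pointer to the submatrix's first element and the row width of the enclosing array as the leading dimension, while M, N and K describe the submatrix itself.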

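My own amendments:

The same error occurs on Ubuntu 12.04, but there the install command is not
sudo apt-get install atlas3-base
but rather: sudo apt-get install libatlas-dev
After installation there is a symbolic link named libblas.a under /usr/lib; the actual BLAS install location is /etc/alternatives/libblas.a. Then compile with:
g++ main.cpp cblas_LINUX.a libblas.a -o main
That is, blas_LINUX.a is no longer needed, and neither is the BLAS built earlier.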

As an aside, the ls command that lists files in ascending order of modification time is:
ls -lrt
and in descending order: ls -lt

References: http://duanple.blog.163.com/blog/static/709717672010321336326/
http://iysm.net/?p=53
http://www.opentissue.org/mediawiki/index.php/Installing_on_Ubuntu