The cblas_zgemm is a general complex matrix-matrix multiplication routine which will be intensively used in my project.
The prototype of cblas_zgemm is:
void cblas_zgemm(const enum CBLAS_ORDEROrder, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_TRANSPOSE TransB, const int M, const int N,
const int K, const void *alpha, const void *A,
const int lda, const void *B, const int ldb,
const void *beta, void *C, const int ldc);
I founda interesting problem with the CBLAS_ORDER parameter of cblas_zgemm when one dimension of matrix A/B is one, such A is 4*1 matrix and Bis 1*4 matrix (both with no transpose).
The CBLAS_ORDER is an enumuration:
enum CBLAS_ORDER {CblasRowMajor=101,CblasColMajor=102};
That is, to define whether the matrix storage is row major or column major.
I wrote a test routine:
lapack_complex_double*tu;
lapack_complex_double *tcb_sm_ap4[16];
const double tmp_ap4_u[] = {1,0, -1,0, -1,0,-1,0, \
1,0, 0,-1, 1,0, 0,1, \
1,0, 1,0, -1,0, 1,0, \
1,0, 0,1, 1,0, 0,-1, \
1,0, -INV_SQRT_2,-INV_SQRT_2,0,-1, INV_SQRT_2,-INV_SQRT_2, \
1,0, INV_SQRT_2,-INV_SQRT_2,0,1, -INV_SQRT_2,-INV_SQRT_2, \
1,0, INV_SQRT_2,INV_SQRT_2,0,-1, -INV_SQRT_2,INV_SQRT_2, \
1,0, -INV_SQRT_2,INV_SQRT_2,0,1, INV_SQRT_2,INV_SQRT_2, \
1,0, -1,0, 1,0, 1,0, \
1,0, 0,-1, -1,0, 0,-1, \
1,0, 1,0, 1,0, -1,0, \
1,0, 0,1, -1,0, 0,1, \
1,0, -1,0, -1,0, 1,0, \
1,0, -1,0, 1,0, -1,0, \
1,0, 1,0, -1,0, -1,0, \
1,0, 1,0, 1,0, 1,0};
tu = newlapack_complex_double[16 * 4];
memcpy(tu, tmp_ap4_u,sizeof(tmp_ap4_u));
for(int i = 0; i< 16; ++i)
{
const double tmp_i4[] = {1,0,0,0, 0,0, 0,0, \
0,0, 1,0,0,0, 0,0, \
0,0, 0,0,1,0, 0,0, \
0,0, 0,0,0,0, 1,0};
tcb_sm_ap4[i] = newlapack_complex_double[4*4];
memcpy(tcb_sm_ap4[i], tmp_i4,sizeof(tmp_i4));
lapack_complex_doublealpha;
cblas_zdotc_sub(4, tu+4*i,1, tu+4*i, 1, &alpha);
alpha =complex_div(make_complex(-2, 0), alpha);
lapack_complex_doublebeta = make_complex(1, 0);
cblas_zgemm(CblasColMajor,CblasNoTrans, CblasConjTrans, 4, 4, 1, &alpha,tu+4*i, 4, tu+4*i, 4, &beta, tcb_sm_ap4[i],4);
//debug
std::cout<< "i = "<< i <<", C = \n";
print_matrix(4, 4,tcb_sm_ap4[i]);
}
delete[] tu;
for(int i = 0; i < 16; ++i)
{
delete[] tcb_sm_ap4[i];
}
Note that, the above routine is to generate the Wn matrix (Wn =I - 2uu'/u'u, where u' is the conjugate transpose of u)as specified in table table 6.3.4.2.3-2 of 36.211, which is the precoding codebooks for four-antenna port cases.
also note that the print_matrix function will print the complex matrix to console.
If the cblas_zgemm is called with Order=CblasRowMajor, you will get weird results:
i = 0, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 1, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 2, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 3, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 4, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 5, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 6, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 7, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 8, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 9, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 10, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 11, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 12, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 13, C =
0.5 -0.5 -0.5 1.26509e-098-1.32849e+303j
-0.5 0.5 -0.5 1.26509e-098-1.32849e+303j
-0.5 -0.5 0.5 1.26509e-098-1.32849e+303j
1.26509e-098+1.32849e+303j 1.26509e-098+1.32849e+303j1.26509e-098+1.32849e+303j
-1.#INF
i = 14, C =
0.5 -0.5 1.26509e-098-1.32849e+303j -0.25
-0.5 0.5 1.26509e-098-1.32849e+303j -0.25
1.26509e-098+1.32849e+303j 1.26509e-098+1.32849e+303j -1.#INF6.32543e-099+6.642
46e+302j
-0.25 -0.25 6.32543e-099-6.64246e+302j 0.875
i = 15, C =
0.5 1.26509e-098-1.32849e+303j -0.25 0.25
1.26509e-098+1.32849e+303j -1.#INF 6.32543e-099+6.64246e+302j-6.32543e-099-6.64
246e+302j
-0.25 6.32543e-099-6.64246e+302j 0.875 0.125
0.25 -6.32543e-099+6.64246e+302j 0.125 0.875
Press any key to continue . . .
But when replacing the Order with
CblasColMajor, it works properly:
i = 0, C =
0.5 0.5 0.5 0.5
0.5 0.5 -0.5 -0.5
0.5 -0.5 0.5 -0.5
0.5 -0.5 -0.5 0.5
i = 1, C =
0.5 0.5j -0.5 -0.5j
-0.5j 0.5 -0.5j 0.5
-0.5 0.5j 0.5 -0.5j
0.5j 0.5 0.5j 0.5
i = 2, C =
0.5 -0.5 0.5 -0.5
-0.5 0.5 0.5 -0.5
0.5 0.5 0.5 0.5
-0.5 -0.5 0.5 0.5
i = 3, C =
0.5 -0.5j -0.5 0.5j
0.5j 0.5 0.5j 0.5
-0.5 -0.5j 0.5 0.5j
-0.5j 0.5 -0.5j 0.5
i = 4, C =
0.5 0.353553+0.353553j 0.5j -0.353553+0.353553j
0.353553-0.353553j 0.5 -0.353553-0.353553j -0.5j
-0.5j -0.353553+0.353553j 0.5 -0.353553-0.353553j
-0.353553-0.353553j 0.5j -0.353553+0.353553j 0.5
i = 5, C =
0.5 -0.353553+0.353553j -0.5j 0.353553+0.353553j
-0.353553-0.353553j 0.5 0.353553-0.353553j 0.5j
0.5j 0.353553+0.353553j 0.5 0.353553-0.353553j
0.353553-0.353553j -0.5j 0.353553+0.353553j 0.5
i = 6, C =
0.5 -0.353553-0.353553j 0.5j 0.353553-0.353553j
-0.353553+0.353553j 0.5 0.353553+0.353553j -0.5j
-0.5j 0.353553-0.353553j 0.5 0.353553+0.353553j
0.353553+0.353553j 0.5j 0.353553-0.353553j 0.5
i = 7, C =
0.5 0.353553-0.353553j -0.5j -0.353553-0.353553j
0.353553+0.353553j 0.5 -0.353553+0.353553j 0.5j
0.5j -0.353553-0.353553j 0.5 -0.353553+0.353553j
-0.353553+0.353553j -0.5j -0.353553-0.353553j 0.5
i = 8, C =
0.5 0.5 -0.5 -0.5
0.5 0.5 0.5 0.5
-0.5 0.5 0.5 -0.5
-0.5 0.5 -0.5 0.5
i = 9, C =
0.5 0.5j 0.5 0.5j
-0.5j 0.5 0.5j -0.5
0.5 -0.5j 0.5 -0.5j
-0.5j -0.5 0.5j 0.5
i = 10, C =
0.5 -0.5 -0.5 0.5
-0.5 0.5 -0.5 0.5
-0.5 -0.5 0.5 0.5
0.5 0.5 0.5 0.5
i = 11, C =
0.5 -0.5j 0.5 -0.5j
0.5j 0.5 -0.5j -0.5
0.5 0.5j 0.5 0.5j
0.5j -0.5 -0.5j 0.5
i = 12, C =
0.5 0.5 0.5 -0.5
0.5 0.5 -0.5 0.5
0.5 -0.5 0.5 0.5
-0.5 0.5 0.5 0.5
i = 13, C =
0.5 0.5 -0.5 0.5
0.5 0.5 0.5 -0.5
-0.5 0.5 0.5 0.5
0.5 -0.5 0.5 0.5
i = 14, C =
0.5 -0.5 0.5 0.5
-0.5 0.5 0.5 0.5
0.5 0.5 0.5 -0.5
0.5 0.5 -0.5 0.5
i = 15, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
Press any key to continue . . .
Since operator A is a 4*1 matrix, row-major or column-major should have the same storage layout,but the cblas_zgemm seems prefer column-major.