Matrix multiplication
Time Limit: 4000/2000 MS (Java/Others) Memory Limit: 131072/131072 K (Java/Others)
Total Submission(s): 4792 Accepted Submission(s): 1855
Problem Description
Given two matrices A and B of size n×n, find the product of them.
bobo hates big integers. So you are only asked to find the result modulo 3.
Input
The input consists of several tests. For each test:
The first line contains n (1≤n≤800). Each of the following n lines contains n integers -- the description of the matrix A. The j-th integer in the i-th line equals A_ij. The next n lines describe the matrix B in similar format (0≤A_ij, B_ij≤10^9).
Output
For each test:
Print n lines. Each of them contains n integers -- the matrix A×B in similar format.
Sample Input
1
0
1
2
0 1
2 3
4 5
6 7
Sample Output
0
0 1
2 1
Author
Xiaoxu Guo (ftiasch)
Source
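Problem: output the product of two n×n matrices.
Approach: the main difficulty is avoiding a time-limit-exceeded verdict. Because the result is taken modulo 3, about 1/3 of the entries will be 0 if the data is random, so reordering the nested loops and skipping zeros with continue is enough to avoid timing out. In fact, once the loop order is changed, the solution gets accepted (AC) even without the zero check. Quoting Shangli Cloud:
Written this way, it times out: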
for (int i=1; i<=n; i++)
for (int j=1; j<=n; j++)
for (int k=1; k<=n; k++)
c[i][j]+=a[i][k]*b[k][j];
Written this way, it passes:
for (int k=1; k<=n; k++)
for (int i=1; i<=n; i++)
for (int j=1; j<=n; j++)
c[i][j]+=a[i][k]*b[k][j];
Why?
----------------------------------------------------------------------------------
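A two-dimensional array is stored in memory contiguously, row by row, so walking down a column jumps a whole row on every step (1000*4 bytes for a row of 1000 4-byte ints). In the first version the innermost loop runs over k, so b[k][j] is read column by column; in the second version the innermost loop runs over j, so both c[i][j] and b[k][j] are accessed along a row. Given the CPU cache's replacement policy, the column-wise access causes a large number of cache misses.
The running times turn out to differ a lot. Clearly, making good use of the CPU cache to optimize a program is a skill well worth mastering.
In everyday coding, too, try to keep the CPU's memory accesses as contiguous as possible.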
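To make the comparison concrete, here is a minimal sketch of both loop orders on a flat row-major buffer. The names `Matrix`, `multiply_ijk` and `multiply_kij` are hypothetical helpers introduced only for this illustration, and entries are assumed to be non-negative. The two functions compute the same product modulo 3; only the memory access pattern of the innermost loop differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n matrix in one contiguous buffer (hypothetical helper type).
using Matrix = std::vector<int>;

// i-j-k order: the innermost loop over k reads b[k][j] down a column,
// so consecutive reads of b are n ints apart.
Matrix multiply_ijk(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += (a[i * n + k] % 3) * (b[k * n + j] % 3);
    for (int& x : c) x %= 3;  // each sum is at most 4n, so it fits in an int
    return c;
}

// k-i-j order: the innermost loop over j reads b[k][j] and writes c[i][j]
// along a row, so consecutive accesses are adjacent in memory.
Matrix multiply_kij(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i) {
            const int aik = a[i * n + k] % 3;
            if (aik == 0) continue;  // optional zero skip described in the approach
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * (b[k * n + j] % 3);
        }
    for (int& x : c) x %= 3;
    return c;
}
```

Timing the two functions on an 800×800 input is one way to observe the gap; the actual ratio depends on the machine and compiler flags, so no figures are claimed here.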
#include <cstdio>

int a[801][801], b[801][801], c[801][801];

int main()
{
    int n;
    while (~scanf("%d", &n))
    {
        // Read A, reduce it modulo 3, and clear C for this test case.
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                scanf("%d", &a[i][j]), a[i][j] %= 3, c[i][j] = 0;
        // Read B and reduce it modulo 3.
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                scanf("%d", &b[i][j]), b[i][j] %= 3;
        // k-i-j order: the innermost loop walks along rows of b and c.
        for (int k = 0; k < n; ++k)
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                    if (a[i][k]) // commenting this check out still does not time out
                        c[i][j] += a[i][k] * b[k][j];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                printf("%d%c", c[i][j] % 3, j == n - 1 ? '\n' : ' ');
    }
    return 0;
}
Below is an approach using the bitset container.
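#include <cstdio>
#include <bitset>
using namespace std;

// a[i][r] marks the columns of row i of A whose value is r modulo 3.
// b[j][r] marks the rows of column j of B whose value is r modulo 3.
bitset<803> a[803][4], b[803][4];

// Dot product of row i of A and column j of B, before the final modulo 3.
int fun(int i, int j)
{
    int x1 = (a[i][1] & b[j][1]).count(); // pairs 1*1
    int x2 = (a[i][1] & b[j][2]).count(); // pairs 1*2
    int x3 = (a[i][2] & b[j][1]).count(); // pairs 2*1
    int x4 = (a[i][2] & b[j][2]).count(); // pairs 2*2
    return x1 + ((x2 + x3) << 1) + (x4 << 2);
}

int main()
{
    int n, t;
    while (~scanf("%d", &n))
    {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < 3; ++j)
                a[i][j].reset(), b[i][j].reset();
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
            {
                scanf("%d", &t);
                a[i][t % 3].set(j);
            }
        // B is stored transposed so that each column becomes one bitset.
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
            {
                scanf("%d", &t);
                b[j][t % 3].set(i);
            }
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                printf("%d%c", fun(i, j) % 3, j == n - 1 ? '\n' : ' ');
    }
    return 0;
}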
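The counting idea behind fun can be isolated for a single row and column. The sketch below is an illustration under stated assumptions, not the author's code: `Masks`, `make_masks`, `dot_mod3` and the capacity `MAXN` are hypothetical names, and entries are assumed non-negative. Positions where both values are 1 contribute 1 each, positions with a 1 and a 2 contribute 2 each, and positions where both are 2 contribute 4 each, so the dot product follows from four popcounts.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t MAXN = 800;  // assumed capacity, matching n <= 800

// Positions of a vector whose value is 1 or 2 modulo 3 (hypothetical helper).
struct Masks {
    std::bitset<MAXN> ones, twos;
};

Masks make_masks(const std::vector<int>& v) {
    assert(v.size() <= MAXN);
    Masks m;  // std::bitset default-constructs to all zeros
    for (std::size_t p = 0; p < v.size(); ++p) {
        const int r = v[p] % 3;
        if (r == 1) m.ones.set(p);
        else if (r == 2) m.twos.set(p);
    }
    return m;
}

// Dot product modulo 3 from four AND + popcount operations.
int dot_mod3(const Masks& row, const Masks& col) {
    const std::size_t c11 = (row.ones & col.ones).count();  // 1*1 pairs
    const std::size_t c12 = (row.ones & col.twos).count();  // 1*2 pairs
    const std::size_t c21 = (row.twos & col.ones).count();  // 2*1 pairs
    const std::size_t c22 = (row.twos & col.twos).count();  // 2*2 pairs
    return static_cast<int>((c11 + 2 * (c12 + c21) + 4 * c22) % 3);
}
```

Each AND and count processes many positions per machine word, which is presumably why this variant also fits the time limit even though it still evaluates n×n dot products.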