tf.matmul - Matrix Multiplication

This article covers TensorFlow's tf.matmul function in detail: how to perform matrix multiplication, transposition, and conjugation, and how to speed up the computation with the sparse-matrix flags. Several examples demonstrate multiplication between tensors of different ranks.

https://github.com/tensorflow/docs/tree/r1.4/site/en/api_docs/api_docs/python/tf
site/en/api_docs/api_docs/python/tf/matmul.md

matmul(
    a,
    b,
    transpose_a=False,
    transpose_b=False,
    adjoint_a=False,
    adjoint_b=False,
    a_is_sparse=False,
    b_is_sparse=False,
    name=None
)

Defined in tensorflow/python/ops/math_ops.py.
See the guide: Math > Matrix Math Functions

Multiplies matrix a by matrix b, producing a * b.

The inputs must, following any transpositions, be tensors of rank >= 2 where the inner 2 dimensions specify valid matrix multiplication arguments, and any further outer dimensions match.
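
A small sketch of this shape requirement (the shapes here are made up for illustration, and it is assumed that shape inference catches the mismatch at graph-construction time):

import numpy as np
import tensorflow as tf

# Inner dimensions must pair up ([..., 2, 3] x [..., 3, 2]) and the
# outer (batch) dimensions must match exactly.
ok_a = tf.constant(np.ones([2, 2, 3], dtype=np.float32))
ok_b = tf.constant(np.ones([2, 3, 2], dtype=np.float32))
ok = tf.matmul(ok_a, ok_b)  # shape [2, 2, 2]

# A mismatched batch dimension (2 vs 3) is rejected during shape
# inference, before the graph ever runs.
bad_b = tf.constant(np.ones([3, 3, 2], dtype=np.float32))
try:
    tf.matmul(ok_a, bad_b)
except ValueError as e:
    print("ValueError:", e)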

Both matrices must be of the same type. The supported types are: float16, float32, float64, int32, complex64, complex128.

Either matrix can be transposed or adjointed (conjugated and transposed) on the fly by setting one of the corresponding flags to True. These are False by default. A sketch follows.
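
A minimal sketch of these flags, with made-up values (adjoint_a differs from transpose_a only for complex dtypes, where it also conjugates):

import tensorflow as tf

# `x` has shape [3, 2]; transpose_a=True uses it as a [2, 3] matrix,
# so it can be multiplied with the [3, 2] matrix `y`.
x = tf.constant([[1., 2.], [3., 4.], [5., 6.]])  # shape [3, 2]
y = tf.constant([[1., 0.], [0., 1.], [1., 1.]])  # shape [3, 2]
xt_y = tf.matmul(x, y, transpose_a=True)         # shape [2, 2]

# For complex types, adjoint_a conjugates and transposes in one step.
z = tf.constant([[1 + 1j, 2 - 1j]], dtype=tf.complex64)  # shape [1, 2]
zh_z = tf.matmul(z, z, adjoint_a=True)                   # shape [2, 2]

with tf.Session() as sess:
    print(sess.run(xt_y))
    print(sess.run(zh_z))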

If one or both of the matrices contain a lot of zeros, a more efficient multiplication algorithm can be used by setting the corresponding a_is_sparse or b_is_sparse flag to True. These are False by default. This optimization is only available for plain matrices (rank-2 tensors) with datatypes bfloat16 or float32.
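
A hedged sketch of the sparse hint: the inputs remain ordinary dense Tensors, the flag merely tells TensorFlow it may pick a multiplication kernel that exploits the zeros, and the result matches a plain tf.matmul (rank-2, float32, as required; the values below are made up):

import numpy as np
import tensorflow as tf

# A mostly-zero float32 matrix.
a_dense = np.zeros((4, 4), dtype=np.float32)
a_dense[0, 1] = 2.0
a_dense[3, 2] = 5.0

a = tf.constant(a_dense)
b = tf.constant(np.ones((4, 2), dtype=np.float32))

# Same values as tf.matmul(a, b); only the kernel choice may differ.
c = tf.matmul(a, b, a_is_sparse=True)

with tf.Session() as sess:
    print(sess.run(c))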


For example:

# 2-D tensor `a`
# [[1, 2, 3],
#  [4, 5, 6]]
a = tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3])

# 2-D tensor `b`
# [[ 7,  8],
#  [ 9, 10],
#  [11, 12]]
b = tf.constant([7, 8, 9, 10, 11, 12], shape=[3, 2])

# `a` * `b`
# [[ 58,  64],
#  [139, 154]]
c = tf.matmul(a, b)


# 3-D tensor `a`
# [[[ 1,  2,  3],
#   [ 4,  5,  6]],
#  [[ 7,  8,  9],
#   [10, 11, 12]]]
a = tf.constant(np.arange(1, 13, dtype=np.int32),
                shape=[2, 2, 3])

# 3-D tensor `b`
# [[[13, 14],
#   [15, 16],
#   [17, 18]],
#  [[19, 20],
#   [21, 22],
#   [23, 24]]]
b = tf.constant(np.arange(13, 25, dtype=np.int32),
                shape=[2, 3, 2])

# `a` * `b`
# [[[ 94, 100],
#   [229, 244]],
#  [[508, 532],
#   [697, 730]]]
c = tf.matmul(a, b)

# Since python >= 3.5 the @ operator is supported (see PEP 465).
# In TensorFlow, it simply calls the `tf.matmul()` function, so the
# following lines are equivalent (note that dtypes and ranks must line
# up: with the int32 3-D `a` and `b` defined above, this float 2-D
# constant would need casting in practice):
d = a @ b @ [[10.], [11.]]
d = tf.matmul(tf.matmul(a, b), [[10.], [11.]])

1. Args

  • a: Tensor of type float16, float32, float64, int32, complex64, complex128 and rank > 1.
  • b: Tensor with the same type and rank as a.
  • transpose_a: If True, a is transposed before multiplication.
  • transpose_b: If True, b is transposed before multiplication.
  • adjoint_a: If True, a is conjugated and transposed before multiplication.
  • adjoint_b: If True, b is conjugated and transposed before multiplication.
  • a_is_sparse: If True, a is treated as a sparse matrix.
  • b_is_sparse: If True, b is treated as a sparse matrix.
  • name: Name for the operation (optional).

2. Returns

A Tensor of the same type as a and b where each inner-most matrix is the product of the corresponding matrices in a and b, e.g. if all transpose or adjoint attributes are False:

output[…, i, j] = sum_k (a[…, i, k] * b[…, k, j]),
for all indices i, j.

  • Note: This is the matrix product, not the element-wise product; see the comparison sketch below.
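
To make the note concrete, a small comparison between the matrix product and the element-wise product (the values are illustrative):

import tensorflow as tf

m = tf.constant([[1., 2.], [3., 4.]])
n = tf.constant([[5., 6.], [7., 8.]])

matrix_product = tf.matmul(m, n)  # => [[19. 22.], [43. 50.]]
elementwise = tf.multiply(m, n)   # => [[ 5. 12.], [21. 32.]]

with tf.Session() as sess:
    print(sess.run(matrix_product))
    print(sess.run(elementwise))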

3. Raises

  • ValueError: If transpose_a and adjoint_a, or transpose_b and adjoint_b, are both set to True. A minimal reproduction follows.
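
A sketch of this failure mode (the exact error message may vary by version); the check fires when the op is constructed, before anything runs:

import tensorflow as tf

p = tf.constant([[1., 2.], [3., 4.]])

try:
    tf.matmul(p, p, transpose_a=True, adjoint_a=True)
except ValueError as e:
    print("ValueError:", e)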

4. Example

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import print_function
from __future__ import division

import os
import sys
import numpy as np
import tensorflow as tf

sys.path.append(os.path.dirname(os.path.abspath(__file__)))
current_directory = os.path.dirname(os.path.abspath(__file__))

print(16 * "++--")
print("current_directory:", current_directory)
print(16 * "++--")

# 2-D tensor `a`
a = tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3])
# => [[1 2 3]
#     [4 5 6]]

# 2-D tensor `b`
b = tf.constant([7, 8, 9, 10, 11, 12], shape=[3, 2])
# => [[ 7  8]
#     [ 9 10]
#     [11 12]]

c = tf.matmul(a, b)
# => [[ 58  64]
#     [139 154]]

with tf.Session() as sess:
    input_a = sess.run(a)
    print("input_a.shape:", input_a.shape)
    print("input_a:\n", input_a)
    print('\n')

    input_b = sess.run(b)
    print("input_b.shape:", input_b.shape)
    print("input_b:\n", input_b)
    print('\n')

    output_c = sess.run(c)
    print("output_c.shape:", output_c.shape)
    print("output_c:\n", output_c)
    print('\n')
/usr/bin/python2.7 /home/strong/tensorflow_work/R2CNN_Faster-RCNN_Tensorflow/yongqiang.py
++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--
current_directory: /home/strong/tensorflow_work/R2CNN_Faster-RCNN_Tensorflow
++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--
2019-08-21 20:31:03.554301: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-08-21 20:31:03.621830: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-21 20:31:03.622083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.31GiB
2019-08-21 20:31:03.622093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
input_a.shape: (2, 3)
input_a:
 [[1 2 3]
 [4 5 6]]


input_b.shape: (3, 2)
input_b:
 [[ 7  8]
 [ 9 10]
 [11 12]]


output_c.shape: (2, 2)
output_c:
 [[ 58  64]
 [139 154]]

Process finished with exit code 0

5. Example

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import print_function
from __future__ import division

import os
import sys
import numpy as np
import tensorflow as tf

sys.path.append(os.path.dirname(os.path.abspath(__file__)))
current_directory = os.path.dirname(os.path.abspath(__file__))

print(16 * "++--")
print("current_directory:", current_directory)
print(16 * "++--")

# 3-D tensor `a`
a = tf.constant(np.arange(1, 13, dtype=np.int32), shape=[2, 2, 3])
# => [[[ 1  2  3]
#      [ 4  5  6]]
#     [[ 7  8  9]
#      [10 11 12]]]

a0 = tf.constant(np.arange(1, 7, dtype=np.int32), shape=[2, 3])
# =>  [[1 2 3]
#      [4 5 6]]

a1 = tf.constant(np.arange(7, 13, dtype=np.int32), shape=[2, 3])
# =>  [[ 7  8  9]
#      [10 11 12]]

# 3-D tensor `b`
b = tf.constant(np.arange(13, 25, dtype=np.int32), shape=[2, 3, 2])
# => [[[13 14]
#      [15 16]
#      [17 18]]
#     [[19 20]
#      [21 22]
#      [23 24]]]

b0 = tf.constant(np.arange(13, 19, dtype=np.int32), shape=[3, 2])
# =>  [[13 14]
#      [15 16]
#      [17 18]]

b1 = tf.constant(np.arange(19, 25, dtype=np.int32), shape=[3, 2])
# =>  [[19 20]
#      [21 22]
#      [23 24]]

a0b0 = tf.matmul(a0, b0)
a0b1 = tf.matmul(a0, b1)
a1b0 = tf.matmul(a1, b0)
a1b1 = tf.matmul(a1, b1)

c = tf.matmul(a, b)
# => [[[ 94 100]
#      [229 244]],
#     [[508 532]
#      [697 730]]]

with tf.Session() as sess:
    input_a = sess.run(a)
    print("input_a.shape:", input_a.shape)
    print("input_a:\n", input_a)
    print('\n')

    input_b = sess.run(b)
    print("input_b.shape:", input_b.shape)
    print("input_b:\n", input_b)
    print('\n')

    output_c = sess.run(c)
    print("output_c.shape:", output_c.shape)
    print("output_c:\n", output_c)
    print('\n')

    input_a0 = sess.run(a0)
    print("input_a0.shape:", input_a0.shape)
    print("input_a0:\n", input_a0)
    print('\n')

    input_a1 = sess.run(a1)
    print("input_a1.shape:", input_a1.shape)
    print("input_a1:\n", input_a1)
    print('\n')

    input_b0 = sess.run(b0)
    print("input_b0.shape:", input_b0.shape)
    print("input_b0:\n", input_b0)
    print('\n')

    input_b1 = sess.run(b1)
    print("input_b1.shape:", input_b1.shape)
    print("input_b1:\n", input_b1)
    print('\n')

    output_a0b0 = sess.run(a0b0)
    print("output_a0b0.shape:", output_a0b0.shape)
    print("output_a0b0:\n", output_a0b0)
    print('\n')

    output_a0b1 = sess.run(a0b1)
    print("output_a0b1.shape:", output_a0b1.shape)
    print("output_a0b1:\n", output_a0b1)
    print('\n')

    output_a1b0 = sess.run(a1b0)
    print("output_a1b0.shape:", output_a1b0.shape)
    print("output_a1b0:\n", output_a1b0)
    print('\n')

    output_a1b1 = sess.run(a1b1)
    print("output_a1b1.shape:", output_a1b1.shape)
    print("output_a1b1:\n", output_a1b1)
    print('\n')

    print("output_a0b0 + a1b1:\n")
    print(output_a0b0)
    print(output_a1b1)
/usr/bin/python2.7 /home/strong/tensorflow_work/R2CNN_Faster-RCNN_Tensorflow/yongqiang.py
++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--
current_directory: /home/strong/tensorflow_work/R2CNN_Faster-RCNN_Tensorflow
++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--
2019-08-21 20:57:43.726875: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-08-21 20:57:43.792803: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-21 20:57:43.793048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.31GiB
2019-08-21 20:57:43.793059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
input_a.shape: (2, 2, 3)
input_a:
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


input_b.shape: (2, 3, 2)
input_b:
 [[[13 14]
  [15 16]
  [17 18]]

 [[19 20]
  [21 22]
  [23 24]]]


output_c.shape: (2, 2, 2)
output_c:
 [[[ 94 100]
  [229 244]]

 [[508 532]
  [697 730]]]


input_a0.shape: (2, 3)
input_a0:
 [[1 2 3]
 [4 5 6]]


input_a1.shape: (2, 3)
input_a1:
 [[ 7  8  9]
 [10 11 12]]


input_b0.shape: (3, 2)
input_b0:
 [[13 14]
 [15 16]
 [17 18]]


input_b1.shape: (3, 2)
input_b1:
 [[19 20]
 [21 22]
 [23 24]]


output_a0b0.shape: (2, 2)
output_a0b0:
 [[ 94 100]
 [229 244]]


output_a0b1.shape: (2, 2)
output_a0b1:
 [[130 136]
 [319 334]]


output_a1b0.shape: (2, 2)
output_a1b0:
 [[364 388]
 [499 532]]


output_a1b1.shape: (2, 2)
output_a1b1:
 [[508 532]
 [697 730]]


output_a0b0 and output_a1b1:

[[ 94 100]
 [229 244]]
[[508 532]
 [697 730]]

Process finished with exit code 0
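
Note how the batch product decomposes slice-wise: output_a0b0 equals output_c[0] and output_a1b1 equals output_c[1], which is exactly what the final two prints confirm.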

6. TensorFlow tf.matmul vs. PyTorch torch.bmm

Basic definitions. TensorFlow's tf.matmul performs matrix multiplication between two 2-D tensors, or batch-wise between higher-rank tensors. PyTorch's torch.bmm is designed specifically for 3-D tensors and performs batch matrix multiplication (one matrix product per batch element).

Input-rank requirements. When tf.matmul receives tensors with more than two dimensions, it treats the last two dimensions as matrices and multiplies them batch-wise; the leading dimensions act as batch dimensions. torch.bmm, by contrast, strictly requires 3-D inputs of shapes (batch_size, n, m) and (batch_size, m, p) and returns a tensor of shape (batch_size, n, p).

Performance. torch.bmm focuses on one specific scenario, efficiently multiplying many independent small matrices, and can exploit the hardware accordingly. tf.matmul handles the same workload, but as a general-purpose op it is not specially tuned for that case.

A simple side-by-side comparison (note that this snippet uses the TensorFlow 2.x tf.random.uniform API, unlike the TF 1.x examples above):

import tensorflow as tf
import torch

# Example using TensorFlow's matmul
a_tf = tf.random.uniform((5, 3, 4))  # batch size of 5, matrices are 3x4
b_tf = tf.random.uniform((5, 4, 6))  # batch size of 5, matrices are 4x6
result_tf = tf.matmul(a_tf, b_tf)    # result has shape (5, 3, 6)
print("Result from TF:", result_tf.shape)

# Equivalent operation in PyTorch with bmm
a_pt = torch.rand(5, 3, 4)           # same dimensions, now torch tensors
b_pt = torch.rand(5, 4, 6)
result_pt = torch.bmm(a_pt, b_pt)    # also yields a tensor shaped (5, 3, 6)
print("Result from PT:", result_pt.size())

As the example shows, the two frameworks provide similar functionality under the same logic, each with its own emphasis.

Usage notes. In practice, make sure the library versions are compatible and the device environment (CPU/GPU) is set up correctly; both can significantly affect runtime efficiency and may even cause errors.