1. Definition in the official documentation
In the simplest case, the output value of the layer with input size $(N, C_{\text{in}}, L)$ and output $(N, C_{\text{out}}, L_{\text{out}})$ can be precisely described as:
$$
\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)
$$
where $\star$ is the valid cross-correlation operator, $N$ is a batch size, $C$ denotes a number of channels, $L$ is a length of signal sequence.
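The formula above can be traced by hand with a minimal pure-Python sketch. It assumes the simplest case only (stride 1, no padding, dilation 1, groups 1) and handles a single batch element; the helper names `cross_correlate_valid` and `conv1d_single` are hypothetical and not part of PyTorch.

```python
def cross_correlate_valid(signal, kernel):
    """Valid 1-D cross-correlation: slide the kernel over the signal
    without flipping it and without padding."""
    n_out = len(signal) - len(kernel) + 1
    return [
        sum(kernel[t] * signal[i + t] for t in range(len(kernel)))
        for i in range(n_out)
    ]


def conv1d_single(x, weight, bias):
    """Apply the formula to one batch element (hypothetical helper).

    x:      C_in lists of length L          -> input(N_i, k)
    weight: C_out x C_in lists of length K  -> weight(C_out_j, k)
    bias:   C_out scalars                   -> bias(C_out_j)
    """
    out = []
    for j, w_j in enumerate(weight):
        l_out = len(x[0]) - len(w_j[0]) + 1
        row = [bias[j]] * l_out
        for k, x_k in enumerate(x):  # sum over the C_in input channels
            corr = cross_correlate_valid(x_k, w_j[k])
            row = [r + c for r, c in zip(row, corr)]
        out.append(row)
    return out


x = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]  # C_in = 2, L = 4
weight = [[[1.0, 0.0, -1.0], [1.0, 1.0, 1.0]]]     # C_out = 1, K = 3
bias = [0.5]
print(conv1d_single(x, weight, bias))  # [[-0.5, 0.5]]
```

Each output channel is the bias plus the sum, over all input channels, of that channel's signal cross-correlated with its own kernel slice.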
This module supports :ref:`TensorFloat32<tf32_on_ampere>`.
* :attr:`stride` controls the stride for the cross-correlation, a single
  number or a one-element tuple.
* :attr:`padding` controls the amount of implicit zero-paddings on both sides
  for :attr:`padding` number of points.
* :attr:`dilation` controls the spacing between the kernel points; also
  known as the à trous algorithm. It is harder to describe, but this `link`_
  has a nice visualization of what :attr:`dilation` does.
* :attr:`groups` controls the connections between inputs and outputs.
  :attr:`in_channels` and :attr:`out_channels` must both be divisible by
  :attr:`groups`. For example,
    * At groups=1, all inputs are convolved to all outputs.
    * At groups=2, the operation becomes equivalent to having two conv
      layers side by side, each seeing half the input channels,
      and producing half the output channels, and both subsequently
      concatenated.
    * At groups= :attr:`in_channels`, each input channel is convolved with
      its own set of filters, of size
      $\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor$.
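The effect of `groups` on the layer's weights can be made concrete with a small sketch. It assumes the weight layout `(out_channels, in_channels // groups, kernel_size)` that `nn.Conv1d` uses; `conv1d_weight_shape` is a hypothetical helper written for illustration, not a PyTorch function.

```python
def conv1d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Return the assumed weight shape for a grouped 1-D convolution."""
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each output channel only sees in_channels // groups input channels.
    return (out_channels, in_channels // groups, kernel_size)


# groups=1: every output channel sees all 16 input channels.
print(conv1d_weight_shape(16, 32, 3, groups=1))   # (32, 16, 3)
# groups=2: two side-by-side convolutions, each seeing 8 input channels.
print(conv1d_weight_shape(16, 32, 3, groups=2))   # (32, 8, 3)
# groups=in_channels: each input channel gets 32 // 16 = 2 filters of its own.
print(conv1d_weight_shape(16, 32, 3, groups=16))  # (32, 1, 3)
```

Raising `groups` therefore shrinks the parameter count, because each filter spans fewer input channels.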
1.1 Parameter explanation
- Input: $(N, C_{in}, L_{in})$
- Output: $(N, C_{out}, L_{out})$
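As described above:
- $N$ is the batch size.
- $C$ is the number of channels. For a sequence, it is the dimension of each column vector.
  - $C_{in}$: the encoding dimension of each column vector in the input sequence.
  - $C_{out}$: the desired encoding dimension of each column vector in the output sequence.
- $L$ is the sequence length, i.e. how many column vectors the sequence contains.
  - $L_{in}$: the number of column vectors in the input sequence.
  - $L_{out}$: the number of column vectors in the output sequence.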
The output length $L_{out}$ is given by:
$$
L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor
$$
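The output-length formula can be checked with a small helper. This is a sketch: `conv1d_output_length` is a hypothetical name, and its defaults are assumed to mirror those of `nn.Conv1d` (stride 1, padding 0, dilation 1).

```python
def conv1d_output_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Compute L_out from the formula, using integer floor division."""
    numerator = l_in + 2 * padding - dilation * (kernel_size - 1) - 1
    # floor(numerator / stride + 1) equals numerator // stride + 1
    return numerator // stride + 1


print(conv1d_output_length(50, 3, stride=2))    # 24
print(conv1d_output_length(50, 3, padding=1))   # 50
print(conv1d_output_length(50, 3, dilation=2))  # 46
```

With kernel size 3 and stride 1, `padding=1` keeps the length unchanged, while `dilation=2` widens the kernel's span to 5 points and shortens the output accordingly.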
1.2 Worked example
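Here padding defaults to 0, dilation defaults to 1, and groups defaults to 1.
The output length follows from the formula above: with $L_{in} = 50$, kernel_size 3 and stride 2, $L_{out} = \lfloor (50 + 0 - 2 - 1) / 2 + 1 \rfloor = \lfloor 24.5 \rfloor = 24$.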
import torch
import torch.nn as nn

# 16 input channels, 33 output channels, kernel size 3, stride 2
m = nn.Conv1d(16, 33, 3, stride=2)
input = torch.rand(20, 16, 50)  # (N, C_in, L_in)
output = m(input)
print(output.shape)
torch.Size([20, 33, 24])