tf.nn.atrous_conv2d(value, filters, rate, padding, name=None) {#atrous_conv2d}

核心码匠

Tags: atrous_conv2d, tensor

Atrous convolution (a.k.a. convolution with holes or dilated convolution).

Computes a 2-D atrous convolution, also known as convolution with holes or dilated convolution, given 4-D `value` and `filters` tensors. If the `rate` parameter is equal to one, it performs regular 2-D convolution. If the `rate` parameter is greater than one, it performs convolution with holes, sampling the input values every `rate` pixels in the height and width dimensions. This is equivalent to convolving the input with a set of upsampled filters, produced by inserting `rate - 1` zeros between two consecutive values of the filters along the height and width dimensions, hence the name atrous convolution or convolution with holes (the French word trous means holes in English).
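
The equivalence with upsampled filters can be checked numerically. The following is a minimal NumPy sketch, not TensorFlow's implementation: `upsample_filters` and `conv2d_valid` are hypothetical helper names introduced here, and the check assumes stride 1 and no padding.

```python
import numpy as np

def upsample_filters(filters, rate):
    """Insert rate - 1 zeros between consecutive filter taps along height and width."""
    fh, fw, in_ch, out_ch = filters.shape
    up_h = fh + (fh - 1) * (rate - 1)
    up_w = fw + (fw - 1) * (rate - 1)
    upsampled = np.zeros((up_h, up_w, in_ch, out_ch), dtype=filters.dtype)
    upsampled[::rate, ::rate] = filters  # original taps land every `rate` positions
    return upsampled

def conv2d_valid(value, filters, rate=1):
    """Stride-1 convolution without padding, sampling the input every `rate` pixels."""
    fh, fw, _, out_ch = filters.shape
    batch, in_h, in_w, _ = value.shape
    out_h = in_h - (fh - 1) * rate
    out_w = in_w - (fw - 1) * rate
    out = np.zeros((batch, out_h, out_w, out_ch))
    for di in range(fh):
        for dj in range(fw):
            window = value[:, di * rate:di * rate + out_h,
                           dj * rate:dj * rate + out_w, :]
            out += window @ filters[di, dj]  # contracts the in_channels axis
    return out

rng = np.random.default_rng(0)
value = rng.standard_normal((1, 8, 8, 2))
filters = rng.standard_normal((3, 3, 2, 4))

# Sampling the input every 2 pixels with the 3x3 filters ...
atrous = conv2d_valid(value, filters, rate=2)
# ... should match a regular convolution with the zero-stuffed 5x5 filters.
dense = conv2d_valid(value, upsample_filters(filters, rate=2), rate=1)
print(np.allclose(atrous, dense))
```

The atrous form only touches the original filter taps, while the dense form also multiplies by the inserted zeros, which is why the atrous form does not cost more than a regular convolution with the original filters.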

More specifically:

    output[b, i, j, k] = sum_{di, dj, q} filters[di, dj, q, k] *
          value[b, i + rate * di, j + rate * dj, q]
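
As an illustration, the formula can be evaluated literally with nested loops. This is a deliberately slow reference sketch in NumPy, assuming that only output positions whose samples all fall inside the input are computed (that is, no padding); `atrous_conv2d_reference` is a hypothetical name, not part of TensorFlow.

```python
import numpy as np

def atrous_conv2d_reference(value, filters, rate):
    """Evaluate the formula index by index, for in-bounds positions only."""
    batch, in_h, in_w, in_ch = value.shape
    fh, fw, _, out_ch = filters.shape
    out_h = in_h - rate * (fh - 1)
    out_w = in_w - rate * (fw - 1)
    output = np.zeros((batch, out_h, out_w, out_ch))
    for b in range(batch):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(out_ch):
                    total = 0.0
                    for di in range(fh):
                        for dj in range(fw):
                            for q in range(in_ch):
                                total += (filters[di, dj, q, k]
                                          * value[b, i + rate * di, j + rate * dj, q])
                    output[b, i, j, k] = total
    return output

# 5x5 single-channel input holding 0..24, 2x2 filter of ones, rate 2.
value = np.arange(25, dtype=float).reshape(1, 5, 5, 1)
filters = np.ones((2, 2, 1, 1))
out = atrous_conv2d_reference(value, filters, rate=2)
print(out[0, :, :, 0])
```

With `rate=2`, each output sums the four input values at the corners of a 3x3 window, so the top-left output is 0 + 2 + 10 + 12 = 24.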

Atrous convolution allows us to explicitly control how densely to compute feature responses in fully convolutional networks. Used in conjunction with bilinear interpolation, it offers an alternative to `conv2d_transpose` in dense prediction tasks such as semantic image segmentation, optical flow computation, or depth estimation. It also allows us to effectively enlarge the field of view of filters without increasing the number of parameters or the amount of computation.

For a description of atrous convolution and how it can be used for dense feature extraction, please see: "Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs". The same operation is investigated further in "Multi-Scale Context Aggregation by Dilated Convolutions". Previous works that effectively use atrous convolution in different ways are, among others, "OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks" and "Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks". Atrous convolution is also closely related to the so-called noble identities in multi-rate signal processing.

There are many different ways to implement atrous convolution (see the references above). The implementation here reduces

    atrous_conv2d(value, filters, rate, padding=padding)

to the following three operations:

    paddings = ...
    net = space_to_batch(value, paddings, block_size=rate)
    net = conv2d(net, filters, strides=[1, 1, 1, 1], padding="VALID")
    crops = ...
    net = batch_to_space(net, crops, block_size=rate)
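
To make the three-step reduction concrete, here is a NumPy sketch for the `'VALID'` case. It does not reproduce the elided `paddings` and `crops` computations of the actual implementation: the helpers below are simplified stand-ins (assumptions for illustration) that pad only at the bottom and right so each spatial dimension becomes a multiple of `rate`, and crop the output back to the size a direct atrous convolution would produce.

```python
import numpy as np

def space_to_batch(x, rate):
    """Move each of the rate * rate interleaved sub-grids into the batch axis."""
    b, h, w, c = x.shape
    x = x.reshape(b, h // rate, rate, w // rate, rate, c)
    x = x.transpose(2, 4, 0, 1, 3, 5)  # [rate, rate, b, h/rate, w/rate, c]
    return x.reshape(rate * rate * b, h // rate, w // rate, c)

def batch_to_space(x, rate):
    """Inverse of space_to_batch: interleave the sub-grids back into one image."""
    n, h, w, c = x.shape
    b = n // (rate * rate)
    x = x.reshape(rate, rate, b, h, w, c)
    x = x.transpose(2, 3, 0, 4, 1, 5)  # [b, h, rate, w, rate, c]
    return x.reshape(b, h * rate, w * rate, c)

def conv2d_valid(x, filters, rate=1):
    """Stride-1 convolution without padding, sampling the input every `rate` pixels."""
    fh, fw, _, out_ch = filters.shape
    out_h = x.shape[1] - (fh - 1) * rate
    out_w = x.shape[2] - (fw - 1) * rate
    out = np.zeros((x.shape[0], out_h, out_w, out_ch))
    for di in range(fh):
        for dj in range(fw):
            window = x[:, di * rate:di * rate + out_h, dj * rate:dj * rate + out_w, :]
            out += window @ filters[di, dj]
    return out

def atrous_via_space_to_batch(value, filters, rate):
    """'VALID' atrous convolution as space_to_batch -> regular conv2d -> batch_to_space."""
    fh, fw = filters.shape[:2]
    _, h, w, _ = value.shape
    pad_h, pad_w = (-h) % rate, (-w) % rate  # zeros needed to reach a multiple of rate
    padded = np.pad(value, ((0, 0), (0, pad_h), (0, pad_w), (0, 0)))
    net = space_to_batch(padded, rate)
    net = conv2d_valid(net, filters)  # ordinary convolution on each sub-grid
    net = batch_to_space(net, rate)
    # Drop the outputs that depended on the zeros added above.
    return net[:, :h - (fh - 1) * rate, :w - (fw - 1) * rate, :]

rng = np.random.default_rng(0)
value = rng.standard_normal((2, 7, 9, 3))
filters = rng.standard_normal((3, 3, 3, 4))
direct = conv2d_valid(value, filters, rate=2)
decomposed = atrous_via_space_to_batch(value, filters, rate=2)
print(direct.shape, np.allclose(direct, decomposed))
```

The decomposition works because pixels that are `rate` apart in the input become adjacent within one sub-grid, so a regular convolution on the sub-grids samples exactly the positions an atrous convolution would.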

Advanced usage. Note the following optimization: a sequence of `atrous_conv2d` operations with identical `rate` parameters, 'SAME' `padding`, and filters with odd heights/widths:

    net = atrous_conv2d(net, filters1, rate, padding="SAME")
    net = atrous_conv2d(net, filters2, rate, padding="SAME")
    ...
    net = atrous_conv2d(net, filtersK, rate, padding="SAME")

can be performed equivalently, and more cheaply in both computation and memory, as:

    pad = ...  # padding so that the input dims are multiples of rate
    net = space_to_batch(net, paddings=pad, block_size=rate)
    net = conv2d(net, filters1, strides=[1, 1, 1, 1], padding="SAME")
    net = conv2d(net, filters2, strides=[1, 1, 1, 1], padding="SAME")
    ...
    net = conv2d(net, filtersK, strides=[1, 1, 1, 1], padding="SAME")
    net = batch_to_space(net, crops=pad, block_size=rate)

because a pair of consecutive `space_to_batch` and `batch_to_space` ops with the same `block_size` cancel out when their respective `paddings` and `crops` inputs are identical.
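
The chained optimization can be sketched in NumPy as follows. This is an illustration under simplifying assumptions, not the TensorFlow code: the input height and width are chosen as multiples of `rate`, so no extra padding or cropping is needed, and `conv2d_same`, `space_to_batch`, and `batch_to_space` are simplified stand-ins written for this example.

```python
import numpy as np

def space_to_batch(x, rate):
    """Move each of the rate * rate interleaved sub-grids into the batch axis."""
    b, h, w, c = x.shape
    assert h % rate == 0 and w % rate == 0, "sketch assumes dims are multiples of rate"
    x = x.reshape(b, h // rate, rate, w // rate, rate, c)
    return x.transpose(2, 4, 0, 1, 3, 5).reshape(rate * rate * b, h // rate, w // rate, c)

def batch_to_space(x, rate):
    """Inverse of space_to_batch: interleave the sub-grids back into one image."""
    n, h, w, c = x.shape
    b = n // (rate * rate)
    x = x.reshape(rate, rate, b, h, w, c)
    return x.transpose(2, 3, 0, 4, 1, 5).reshape(b, h * rate, w * rate, c)

def conv2d_same(value, filters, rate=1):
    """Stride-1 'SAME' convolution for odd filter sizes, sampling every `rate` pixels."""
    fh, fw, _, out_ch = filters.shape
    batch, in_h, in_w, _ = value.shape
    pad_h, pad_w = rate * (fh - 1) // 2, rate * (fw - 1) // 2
    padded = np.pad(value, ((0, 0), (pad_h, pad_h), (pad_w, pad_w), (0, 0)))
    out = np.zeros((batch, in_h, in_w, out_ch))
    for di in range(fh):
        for dj in range(fw):
            window = padded[:, di * rate:di * rate + in_h, dj * rate:dj * rate + in_w, :]
            out += window @ filters[di, dj]
    return out

rate = 2
rng = np.random.default_rng(0)
value = rng.standard_normal((1, 8, 6, 2))
filters1 = rng.standard_normal((3, 3, 2, 3))
filters2 = rng.standard_normal((3, 3, 3, 2))

# Two atrous convolutions in sequence, each sampling the input every `rate` pixels.
chained = conv2d_same(conv2d_same(value, filters1, rate), filters2, rate)

# One space_to_batch, two regular convolutions, one batch_to_space.
net = space_to_batch(value, rate)
net = conv2d_same(net, filters1)
net = conv2d_same(net, filters2)
fused = batch_to_space(net, rate)

print(np.allclose(chained, fused))
```

The saving comes from rearranging the tensor once at each end of the chain instead of once around every convolution.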

Args:

* `value`: A 4-D `Tensor` of type `float`. It needs to be in the default "NHWC" format. Its shape is `[batch, in_height, in_width, in_channels]`.
* `filters`: A 4-D `Tensor` with the same type as `value` and shape `[filter_height, filter_width, in_channels, out_channels]`. The `in_channels` dimension of `filters` must match that of `value`. Atrous convolution is equivalent to standard convolution with upsampled filters with effective height `filter_height + (filter_height - 1) * (rate - 1)` and effective width `filter_width + (filter_width - 1) * (rate - 1)`, produced by inserting `rate - 1` zeros between consecutive elements across the filters' spatial dimensions.
* `rate`: A positive int32. The stride with which we sample input values across the height and width dimensions. Equivalently, the rate by which we upsample the filter values by inserting zeros across the height and width dimensions. In the literature, the same parameter is sometimes called input stride or dilation.
* `padding`: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
* `name`: Optional name for the returned tensor.
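
As a quick worked example of the effective filter size given for `filters`, the sketch below computes it along with the resulting spatial output size. The function names are hypothetical, and the output-size rule assumes the stride-1 behavior described above: `'SAME'` keeps the input size, while `'VALID'` keeps only positions where the whole effective filter fits.

```python
def effective_filter_size(filter_size, rate):
    """Size of the zero-stuffed filter: filter_size + (filter_size - 1) * (rate - 1)."""
    return filter_size + (filter_size - 1) * (rate - 1)

def output_size(in_size, filter_size, rate, padding):
    """Spatial output size along one dimension for a stride-1 atrous convolution."""
    if padding == "SAME":
        return in_size
    if padding == "VALID":
        return in_size - effective_filter_size(filter_size, rate) + 1
    raise ValueError("padding must be 'VALID' or 'SAME'")

for rate in (1, 2, 4):
    print(rate, effective_filter_size(3, rate), output_size(32, 3, rate, "VALID"))
```

A 3x3 filter therefore covers a 5x5 window at `rate=2` and a 9x9 window at `rate=4`, while still having only nine weights per input/output channel pair.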