Deep Learning Meets SAR


Abstract

Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data. While deep learning has been introduced in SAR data processing, its huge potential still remains locked despite successful first attempts. For example, to the best knowledge of the authors, there is no single example of deep learning in SAR that has been developed up to operational processing of big data or integrated into the production chain of any satellite mission. In this paper, we provide an introduction to the most relevant deep learning models and concepts, point out possible pitfalls by analyzing special characteristics of SAR data, review the state-of-the-art of deep learning applied to SAR in depth, summarize available benchmarks, and recommend some important future research directions. With this effort, we hope to stimulate more research in this interesting yet under-exploited research field.

Index Terms—Benchmarks, deep learning, despeckling, InSAR, object detection, parameter inversion, SAR, SAR-optical data fusion, terrain surface classification.

I. MOTIVATION

In recent years, deep learning [1] has been developed at a dramatic pace, achieving great success in many fields. Unlike conventional algorithms, deep learning-based methods commonly employ hierarchical architectures, such as deep neural networks, to extract feature representations of raw data for numerous tasks. For instance, convolutional neural networks (CNNs) are capable of learning low- and high-level features from raw images with stacks of convolutional and pooling layers, and then applying the extracted features to various computer vision tasks, such as large-scale image recognition [2], object detection [3], and semantic segmentation [4]. Inspired by numerous successful applications in the computer vision community, the use of deep learning in remote sensing is now obtaining wide attention [5]. As first attempts in SAR, deep learning-based methods have been adopted for a variety of tasks, including terrain surface classification [6], object detection [7], parameter inversion [8], despeckling [9], specific applications in InSAR [10], and SAR-optical data fusion [11].

For terrain surface classification from SAR and Polarimetric SAR (PolSAR) images, effective feature extraction is essential. Conventionally, these features are extracted based on expert domain knowledge and are usually applicable to a small number of cases and data sets. Deep learning feature extraction, however, has proved to overcome, to some degree, both of the aforementioned issues [6]. For SAR target detection, conventional approaches mainly rely on template matching, where specific templates are created manually [12] to classify different categories, or on traditional machine learning approaches, such as Support Vector Machines (SVMs) [13], [14]; in contrast, modern deep learning algorithms aim at applying deep CNNs to extract discriminative features automatically for target recognition [7]. For parameter inversion, deep learning models are employed to learn the latent mapping function from SAR images to estimated parameters, e.g., sea ice concentration [8]. Regarding despeckling, conventional methods often rely on hand-crafted filters and may erroneously remove sharp features when denoising. Furthermore, the development of joint analysis of SAR and optical images has been motivated by the capability of extracting features from both types of images. For applications in InSAR, only a few studies have been carried out, such as the work described in [10]. However, these algorithms neglect the special characteristics of the phase and simply use an out-of-the-box deep learning-based model.

Despite the first successes, and unlike the evaluation of optical data, the huge potential of deep learning in SAR and InSAR remains locked. For example, to the best knowledge of the authors, there is no single example of deep learning in SAR that has been developed up to operational processing of big data or integrated into the production chain of any satellite mission. This paper aims at stimulating more research in this interesting yet under-exploited research field.

In the remainder of this paper, Section II first introduces the most commonly used deep learning models in remote sensing. Section III describes the specific characteristics of SAR data that have to be taken into account to exploit the full potential of SAR combined with deep learning. Section IV details recent advances in the utilization of deep learning for different SAR applications, which were outlined earlier in this section. Section V reviews the existing benchmark data sets for different applications of SAR and their limitations. Finally, Section VI concludes current research and gives an overview of promising future directions.


II. INTRODUCTION TO RELEVANT DEEP LEARNING MODELS AND CONCEPTS

In this section, we briefly review relevant deep learning algorithms originally proposed for visual data processing that are widely used in state-of-the-art research on deep learning in SAR. In addition, we mention the latest developments in deep learning that are not yet widely applied to SAR but may help create the next generation of SAR algorithms. Fig. 1 gives an overview of the deep learning models we discuss in this section. Before discussing deep learning algorithms, we would like to stress that the importance of high-quality benchmark datasets in deep learning research cannot be overstated. Especially in supervised learning, the knowledge that can be learned by the model is bounded by the information present in the training dataset. For example, the MNIST [25] dataset played a key role in Yann LeCun’s seminal paper about convolutional neural networks and gradient-based learning [26]. Similarly, there would be no AlexNet [27], the network that kick-started the current deep learning renaissance, without the ImageNet [28] dataset, which contains over 14 million images and 22,000 classes. ImageNet has been such an important part of deep learning research that, even more than 10 years after its publication, it is still used as a standard benchmark to evaluate the performance of CNNs for image classification.

A. Deep Learning Models

The main principle of deep learning models is to encode input data into effective feature representations for target tasks.

To exemplify how a deep learning framework works, we take the autoencoder as an example: it first maps input data to a latent representation via a trainable nonlinear mapping and then reconstructs the input through a reverse mapping. The reconstruction error is usually defined as the Euclidean distance between the input and the reconstructed input. The parameters of autoencoders are optimized by gradient descent-based optimizers, such as stochastic gradient descent (SGD), RMSProp [29], and ADAM [30], during the backpropagation step.
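As a toy illustration of this encode-decode-reconstruct loop (our own minimal sketch in PyTorch, not code from any cited work; layer sizes and the learning rate are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        # Nonlinear mapping from the input to a latent representation
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        # Reverse mapping that reconstructs the input from the latent code
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                              # dummy batch standing in for real data
reconstruction = model(x)
loss = nn.functional.mse_loss(reconstruction, x)     # Euclidean reconstruction error
optimizer.zero_grad()
loss.backward()                                      # backpropagation
optimizer.step()                                     # one gradient-descent update
```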

  1. Convolutional Neural Networks (CNN): With the success of AlexNet in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2012), where it scored a top-5 test error of 15.3% compared to 26.2% of the second best, CNNs have attracted worldwide attention and are now used for many image understanding tasks, such as image classification, object detection, and semantic segmentation. AlexNet consists of five convolutional layers, three max-pooling layers, and three fully-connected layers. One of the key innovations of AlexNet was the use of GPUs, which made it possible to train such large networks on huge datasets without using supercomputers. In just two years, VGGNet [2] overtook AlexNet in performance by achieving a 6.8% top-5 test error in ILSVRC-2014; the main difference was that it only used 3x3-sized convolutional kernels, which enabled it to have a larger number of channels and in turn capture more diverse features. ResNet [31], U-Net [32], and DenseNet [33] were the next major CNN architectures. The common feature of all these architectures was the idea of connecting not only neighboring layers but any two layers in the network by using skip connections. This helped reduce the loss of information across networks, mitigated the problem of vanishing gradients, and allowed the design of deeper networks. U-Net is one of the most commonly used image segmentation networks. It has an autoencoder-based architecture in which skip connections concatenate features from the first layer to the last, the second to the second last, and so on; this way, fine-grained information from the initial layers reaches the end layers. U-Net was initially proposed for medical image segmentation, where data labeling is a big problem. The authors used heavy data augmentation on the input data, making it possible to learn from only a few hundred annotated samples. In ResNet, skip connections are used within individual blocks and not across the whole network. Since its initial proposal, ResNet has seen many architectural tweaks, and even after 4-5 years its variants are still among the top scorers on ImageNet. In DenseNet, each layer is connected to all preceding layers, reducing the size of the network, albeit at the cost of memory usage. For a more detailed explanation of different CNN models, interested readers are referred to [34].
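To make the idea of skip connections concrete, the following is a simplified ResNet-style residual block (our own illustrative sketch, not code from [31] or any other cited work):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the identity shortcut lets information and gradients bypass the convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.conv1(x))
        residual = self.conv2(residual)
        return self.relu(x + residual)   # skip connection: add the input to the block output

block = ResidualBlock(channels=16)
out = block(torch.rand(1, 16, 64, 64))   # output keeps the spatial size and channel count of the input
```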

  2. Recurrent Neural Networks (RNN): Besides CNNs, RNNs [35] are another major class of deep networks. Their main building blocks are recurrent units, which take the current input and the output of the previous state as input. They provide state-of-the-art results for processing data of variable length, such as text and time series data. Their weights can be replaced with convolutional kernels for visual processing tasks such as image captioning and predicting future frames/points in visual time-series data. Long short-term memory (LSTM) [36] is one of the most popular RNN architectures: its cells can store values from any past instances while not being severely affected by the problem of vanishing gradients.

  3. Generative Adversarial Networks (GANs): Proposed by Ian Goodfellow et al. [37], GANs are among the most popular and exciting inventions in the field of deep learning. Based on game-theoretic principles, they consist of two networks called a generator and a discriminator. The generator’s objective is to learn a latent space through which it can generate samples from the same distribution as the training data, while the discriminator tries to learn to distinguish whether a sample comes from the generator or from the training data. This very simple mechanism underlies many cutting-edge algorithms for various applications, e.g., generating artificial photo-realistic images/videos, super-resolution, and text-to-image synthesis.
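The adversarial game described above fits in a few lines of code. Below is a deliberately minimal sketch of one training step (our own example with made-up layer sizes and dummy data, not a production GAN and not taken from [37]):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(32, data_dim)            # stand-in for a batch of real training samples
z = torch.randn(32, latent_dim)            # random latent codes
fake = G(z)                                # generated samples

# Discriminator step: learn to distinguish real samples from generated ones
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make the discriminator predict "real" for generated samples
loss_g = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```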

B. Supervised, Unsupervised and Reinforcement Learning
  1. Supervised Learning: Most popular deep learning models fall under the category of supervised deep learning, i.e., they need labelled datasets to learn their objective functions. One of the big challenges of supervised learning is generalization, i.e., how well a trained model performs on test data. It is therefore vital that the training data truly represent the underlying distribution of the data, so that the model can handle unseen samples. If the model fits the training data well but fails on test data, this is called overfitting; the deep learning literature offers several techniques to avoid it, e.g., Dropout [38].

  2. Unsupervised Learning: Unsupervised learning refers to the class of algorithms where the training data do not contain labels. For instance, in classical data analysis, principal component analysis (PCA) [39] can be used to reduce the data dimension, followed by a clustering algorithm to group similar data points. In deep learning, generative models such as autoencoders, variational autoencoders (VAEs) [40], and Generative Adversarial Networks (GANs) [37] are some of the popular techniques that can be used for unsupervised learning. Their primary goal is to generate output data from the same distribution as the input data. Autoencoders consist of an encoder part, which finds a compressed latent representation of the input, and a decoder part, which decodes that representation back to the original input. VAEs take autoencoders to the next level by learning the whole distribution instead of just a single representation at the end of the encoder part, which in turn can be used by the decoder to generate the whole distribution of outputs. The trick to learning this distribution is to learn the variance along with the mean of the latent representation at the encoder-decoder meeting point and to add a KL-divergence-based loss term to the standard reconstruction loss function of the autoencoder.
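The trick mentioned above is easiest to see in code. The sketch below (our own simplified example, not the reference implementation of [40]) shows the reparameterized sampling step and the combined reconstruction plus KL-divergence objective; all layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 16):
        super().__init__()
        self.enc = nn.Linear(in_dim, 128)
        self.mu = nn.Linear(128, latent_dim)       # mean of the latent Gaussian
        self.logvar = nn.Linear(128, latent_dim)   # log-variance of the latent Gaussian
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, in_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, logvar

x = torch.rand(8, 784)
recon, mu, logvar = VAE()(x)
recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
# KL divergence between the learned Gaussian and a standard normal prior
kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl_loss
```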

  3. Deep Reinforcement Learning (DeepRL): Reinforcement Learning (RL) tries to mimic human learning behavior, i.e., taking actions and then adjusting them in the future according to feedback from the environment. For example, young children learn to repeat or not repeat their actions based on the reaction of their parents. The RL model consists of an environment with states, actions to transition between those states, and a reward system for ending up in different states. The objective of the algorithm is to learn the best actions for given states using the feedback reward system. In classical RL algorithms, function approximators are used to calculate the probability of different actions in different states. DeepRL uses different types of neural networks to create these functions [41], [42]. Recently, DeepRL received particular attention and popularity due to the success of Google DeepMind’s AlphaGo [43], which defeated the world champion of the board game Go, a task considered impossible for computers until just a few years ago.

C. Relevant Deep Learning Concepts
  1. Automatic Machine Learning (AutoML): Deep networks have many hyperparameters to choose from, for example, the number of layers, kernel sizes, the type of optimizer, skip connections, and the like. There are billions of possible combinations of these parameters, and given the high computational, time, and energy costs, it is hard to find the best performing network even from among a few hundred candidates. In the case of deep learning, the objective of AutoML is mainly to find the most efficient and best performing deep network for a given dataset and task. The first major attempt in this field was by Zoph et al. [44], who used DeepRL to find the optimum CNN for image classification. In their system, an RNN creates CNN architectures and, based on their classification results, proposes changes to them. This process loops until the optimum architecture is found. The algorithm was able to find networks competitive with the state of the art, but required over 800 GPUs, which is unrealistic for practical applications. Recently, there have been many new developments in the AutoML field, which have made it possible to perform such tasks in more intelligent and efficient ways. More details about the field of neural architecture search can be found in [45].

  2. Geometric Deep Learning – Graph Neural Networks (GNNs): Apart from well-structured image data, there is a large amount of unstructured data in real life, e.g., knowledge graphs and social networks, that cannot be directly processed by a deep CNN. Usually, these data are represented in the form of graphs, where each node represents an entity and edges delineate their mutual relations. To learn from unstructured data, geometric deep learning has been attracting increasing attention, and the most commonly used architecture is the GNN, which has also proven successful in dealing with structured data. Using the terminology of graphs, the nodes of a graph can be regarded as feature descriptions of entities, and their edges are established by measuring their relations or distances and encoded in an adjacency matrix. Once a graph is constructed, messages can be propagated among the nodes by simply performing matrix multiplications. Subsequently, [46] proposed Graph Convolutional Networks (GCNs), characterized by the use of graph convolutions, and [45] accelerated this process. Moreover, recurrent units in Recurrent Graph Neural Networks (RGNNs) [47], [48] have also proven effective in learning from graphs.
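The message-passing-by-matrix-multiplication idea can be sketched as a single graph convolution layer, roughly in the spirit of [46] (H' = σ(Â H W), with Â a normalized adjacency matrix with self-loops). The code below is our own illustrative simplification, not the implementation of any cited work:

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution: propagate node features along edges, then apply a learned linear map."""
    def __init__(self, in_feats: int, out_feats: int):
        super().__init__()
        self.weight = nn.Linear(in_feats, out_feats, bias=False)

    def forward(self, features: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # Add self-loops so that each node also keeps its own features
        adj = adjacency + torch.eye(adjacency.size(0))
        # Symmetric normalization D^(-1/2) A D^(-1/2)
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        adj_norm = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        # Message passing via matrix multiplication, followed by the learned transformation
        return torch.relu(self.weight(adj_norm @ features))

nodes, feats = 5, 8
layer = GraphConvLayer(feats, 4)
out = layer(torch.rand(nodes, feats), torch.randint(0, 2, (nodes, nodes)).float())
```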

III. POSSIBLE PITFALLS

To develop tailored deep learning architectures and prepare suitable training datasets for SAR or InSAR tasks, it is important to understand that SAR data is different from optical remote sensing data, not to mention images downloaded from the internet. In this section, we discuss the special characteristics (and possible pitfalls) encountered while applying deep learning to SAR. What makes SAR data and SAR data processing by neural networks unique? SAR data are substantially different from optical imagery in many respects. These are a few points to be considered when transferring CNN experience and expertise from optical to SAR data:
• Dynamic Range. Depending on their spatial resolution, the dynamic range of SAR images can be up to 90 dB (TerraSAR-X high resolution spotlight data with a resolution of about 1 m). Moreover, the distribution is extremely asymmetric, with the majority of pixels in the low amplitude range (distributed scatterers) and a long tail representing bright discrete scatterers, in particular in urban areas. Standard CNNs are not able to handle such dynamic ranges and, hence, most approaches feature dynamic range compression as a preprocessing step. In [49], the authors first take only amplitude values from 0 to 255 and then subtract the mean value of each image. In [11], [50], normalization is performed as a pre-processing step, which compresses the dynamic range significantly.
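One plausible preprocessing chain of this kind is sketched below (dB scaling, percentile clipping, normalization to [0, 1]); this is our own assumption of a typical variant, not the exact procedure used in [49], [11], or [50]:

```python
import numpy as np

def compress_sar_amplitude(amplitude: np.ndarray) -> np.ndarray:
    """Map raw SAR amplitudes with a very large dynamic range into [0, 1]."""
    db = 20.0 * np.log10(amplitude + 1e-6)           # amplitude in decibels
    lo, hi = np.percentile(db, (1, 99))              # clip the long tail of bright scatterers
    db = np.clip(db, lo, hi)
    return (db - lo) / (hi - lo)                     # normalize to [0, 1]

patch = np.abs(np.random.randn(256, 256)) * 1000.0   # stand-in for a SAR amplitude patch
normalized = compress_sar_amplitude(patch)
```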

• Signal Statistics. In order to retrieve features from SAR (amplitude or intensity) images, the speckle statistics must be considered. Speckle is a multiplicative, rather than an additive, phenomenon. This has consequences: while the optimum estimator of radar brightness of a homogeneous image patch under speckle is a simple moving averaging operation (i.e., a convolution, like in the additive noise case), other optimum detectors of edges and low-level features under additive Gaussian noise may no longer be optimum in the case of SAR. A popular example is Touzi’s CFAR edge detector [51] for SAR images, which uses the ratio of two spatial averages over adjacent windows. This operation cannot be emulated by the first layer of a standard CNN. Some studies use a logarithmic mapping of the SAR images prior to feeding them into a CNN [52], [9]. This turns speckle into an additive random variable and, as a side effect, reduces the dynamic range. But still, a single convolutional layer can only emulate approximations to optimum SAR feature estimators. It could be valuable to supplement the original log-SAR image by a few lowpass-filtered and logarithmized versions as input to the CNN. Another approach is to apply a sophisticated speckle reduction filter before entering the CNN, e.g., non-local averaging [53], [54], [55].
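To see why such a ratio detector is not a convolution, consider the small NumPy sketch below. It is our own illustration of the ratio-of-averages idea and does not reproduce Touzi's exact CFAR formulation:

```python
import numpy as np

def ratio_edge_strength(intensity: np.ndarray, half_window: int = 3) -> float:
    """Edge strength at the patch center from the ratio of mean intensities in two
    adjacent windows; under multiplicative speckle this ratio is contrast-invariant,
    unlike a difference of means (which a convolution could realize)."""
    center = intensity.shape[1] // 2
    left = intensity[:, center - half_window:center].mean()
    right = intensity[:, center:center + half_window].mean()
    ratio = min(left / right, right / left)   # in (0, 1]; small values indicate an edge
    return 1.0 - ratio

patch = np.random.gamma(shape=1.0, scale=1.0, size=(7, 7))  # simulated speckled intensities
patch[:, 4:] *= 5.0                                         # brighter region on the right
print(ratio_edge_strength(patch))
```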

• Imaging Geometry. The SAR image coordinates range and azimuth are not arbitrary coordinates like East and North or x and y, but rather reflect the peculiarities of the image generation process. Layover always occurs at the near range of an object, and shadow always at its far range. That means that data augmentation by rotating SAR images would lead to nonsense imagery that would never be generated by a SAR.
• The Complex Nature of SAR Data. The most valuable information of SAR data lies in its phase. This applies to SAR image formation, which takes place in the complex signal domain, as well as to polarimetric, interferometric (InSAR), and tomographic SAR data processing. This means that the entire CNN must be able to handle complex numbers. For the convolution operation this is trivial. The nonlinear activation function and the loss function, however, require thorough consideration. Depending on whether the activation function acts on the real and imaginary parts of the signal independently, or only on its magnitude, and where a bias is added, the phase will be distorted to different degrees.

If we use polarimetric SAR data for land cover or target classification, a nonlinear processing of the phase is even desirable, because the phase between different polarimetric channels has physical meaning and, hence, contributes to the classification process. In SAR interferometry and tomography, however, the absolute phase has no meaning, i.e., the CNN must be invariant to an arbitrary phase offset. Assume some interferometric input signal x to a CNN and the output

signal $\mathrm{CNN}(x)$ with phase $\varphi_{\mathrm{CNN}(x)}$. Invariance to an arbitrary phase offset $\varphi_0$ then requires

$$\mathrm{CNN}\left(x\,e^{j\varphi_0}\right) = \mathrm{CNN}(x)\,e^{j\varphi_0}.$$
This linearity is violated, for example, if the activation function is applied to real and imaginary parts separately, or if a bias is added to the complex numbers. Another point to consider in regression-type InSAR CNN processing (e.g., for noise reduction) is the loss function. If the quantity of interest is not the complex number itself, but its phase, the loss function must be able to handle the cyclic nature of phases. It may also be advantageous that the loss function is independent, at least to a certain degree, of the signal magnitude to relieve the CNN from modelling the magnitude. A loss function that meets these requirements is, for example,
$$L = \frac{1}{N}\sum_{i=1}^{N}\left|\,e^{j\hat{\varphi}_i} - e^{j\varphi_i}\right|^2 ,$$

where $\hat{\varphi}_i$ and $\varphi_i$ denote the estimated and reference phases, respectively; this loss depends only on the phase difference and respects its cyclic nature.
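As an illustration, such a phase-only, magnitude-independent loss could be written as follows in PyTorch. The function name and exact form are our own assumptions for demonstration and are not taken from any of the cited works:

```python
import torch

def cyclic_phase_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Loss between two complex-valued tensors that depends only on their phase
    difference; invariant to magnitude and to 2*pi phase wraps."""
    # Project onto the unit circle so that only the phase information remains
    pred_unit = pred / (pred.abs() + 1e-12)
    target_unit = target / (target.abs() + 1e-12)
    # Squared Euclidean distance on the unit circle = 2 * (1 - cos(phase difference))
    return (pred_unit - target_unit).abs().pow(2).mean()

pred = torch.randn(16, dtype=torch.cfloat)     # e.g., predicted interferometric values
target = torch.randn(16, dtype=torch.cfloat)   # e.g., reference interferometric values
print(cyclic_phase_loss(pred, target))
```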

• Simulation-based Training and Validation Data?

The prevailing lack of ground truth for regression-type tasks, like speckle reduction or InSAR denoising, might tempt us to use simulated SAR data for training and validation of neural networks. However, this bears the risk that our networks will learn models that are far too simplified. Unlike in the optical imaging field, where highly realistic scenes can be simulated, e.g., by PC games, the simulation of SAR data is more a scientific topic without the power of commercial companies and a huge market behind it. SAR simulators focus on specific scenarios, e.g., vegetation (only distributed scatterers considered) or persistent (point) scatterers. The most advanced simulators are probably the ones for computing radar backscatter signatures of single military objects, like vessels. To our knowledge, though, there is no simulator available that can, e.g., generate realistic interferometric data of rugged terrain with layover, spatially varying coherence, and diverse scattering mechanisms. Often simplified scattering assumptions are made, e.g., that speckle is multiplicative. Even this is not generally true; pure Gaussian scattering can only be found for quite homogeneous surfaces and low-resolution SARs. As soon as the resolution increases, the chance of a few dominating scatterers in a resolution cell increases as well, and the statistics become substantially different from those of fully developed speckle.

IV. RECENT ADVANCES IN DEEP LEARNING APPLIED TO SAR

In this section, we provide an in-depth review of deep learning methods applied to SAR data from six perspectives: terrain surface classification, object detection, parameter inversion, despeckling, SAR interferometry (InSAR), and SAR-optical data fusion. For each application, notable developments are presented in chronological order, and their advantages and disadvantages are reported. Finally, each subsection is concluded with a brief summary.

A. Terrain Surface Classification

As an important direction of SAR applications, terrain surface classification using PolSAR images is rapidly advancing with the help of deep learning. Regarding feature extraction, most conventional methods rely on exploring physical scattering properties [56] and texture information [57] in SAR images. However, these features are mainly human designed based on specific problems and characteristics of data sources. Compared to conventional methods, deep learning is superior in terrain surface classification due to its capability of automatically learning discriminative features. Moreover, deep learning approaches, such as CNNs, can effectively extract not only polarimetric characteristics but also spatial patterns of PolSAR images [6]. Some of the most notable deep learning techniques for PolSAR image classification are reviewed in the following.
Xie et al. [58] first applied deep learning to terrain surface classification using PolSAR images. They employed a stacked autoencoder (SAE) to automatically learn deep features from PolSAR data and then fed them to a softmax classifier. Remarkable improvements in both classification accuracy and visual effect proved that this method can effectively learn a comprehensive feature representation for classification purposes. Instead of simply applying SAE, Geng et al. [61] proposed a deep convolutional autoencoder (DCAE) for automatically extracting features and performing classification. The first layer of DCAE is a hand-crafted convolutional layer, where filters are pre-defined, such as gray-level co-occurrence matrices and Gabor filters. The second layer of DCAE performs a scale transformation, which integrates correlated neighbor pixels to reduce speckle. Following these two hand-crafted layers, a trained SAE, which is similar to [58], is attached for learning more abstract features. Tested on high-resolution single-polarization TerraSAR-X images, the method achieved remarkable classification accuracy. Based on DCAE, Geng et al. [59] proposed a framework, called deep supervised and contractive neural network (DSCNN), for SAR image classification, which introduces histogram of oriented gradient (HOG) descriptors. In addition, a supervised penalty is designed to capture relevant information between features and labels, and a contractive restriction, which can enhance local invariance, is employed in the following trainable autoencoder layers. An example of applying DSCNN to TerraSAR-X data from a small area in Norway is shown in Fig. 2. Compared to other algorithms, the capability of DSCNN to achieve a highly accurate and noise-free classification map is observed.

Fig. 2: Classification maps obtained from a TerraSAR-X image of a small area in Norway [59]. Subfigures (a)-(f) depict the results of classification using SVM (accuracy = 78.42%), sparse representation classifier (SRC) (accuracy = 85.61%), random forest (accuracy = 82.20%) [60], SAE (accuracy = 87.26%) [58], DCAE (accuracy = 94.57%) [61], contractive AE (accuracy = 88.74%). Subfigures (g)-(i) show the combination of DSCNN with SVM (accuracy = 96.98%), with SRC (accuracy = 92.51%) [62], and with random forest (accuracy = 96.87%). Subfigures (j) and (k) represent the classification results of DSCNN (accuracy = 97.09%) and DSCNN followed by spatial regularization (accuracy = 97.53%), which achieve higher accuracy than the other methods.

In addition to the aforementioned methods, many studies integrate SAE models with conventional classification algorithms for terrain surface classification. Hou et al. [64] proposed an SAE combined with superpixels for PolSAR image classification. Multiple layers of the SAE are trained on a pixel-by-pixel basis. Superpixels are formed based on Pauli-decomposed pseudo-color images. Outputs of the SAE are used as features in the final step of k-nearest neighbor clustering of superpixels. Zhang et al. [65] applied a stacked sparse AE to PolSAR image classification by taking into account local spatial information. Qin et al. [66] applied adaptive boosting of RBMs to PolSAR image classification. Zhao et al. [67] proposed a discriminant DBN (DisDBN) for SAR image classification, in which discriminant features are learned by combining ensemble learning with a deep belief network in an unsupervised manner. Moreover, taking into account that most current deep learning methods aim at exploiting features either from polarization information or spatial information of PolSAR images, Gao et al. [63] proposed a dual-branch CNN to learn features from both perspectives for terrain surface classification. This method is built on two feature extraction channels: one to extract polarization features from the 6-channel real matrix, and the other to extract spatial features of a Pauli decomposition. Next, the extracted features are combined using two parallel fully connected layers, and finally fed to a softmax layer for classification. The detailed architecture of this network is illustrated in Fig. 3. Different variations of CNNs have been used for terrain surface classification as well. In [68], Zhou et al. first extracted a 6-channel covariance matrix and then fed it to a trainable CNN for PolSAR image classification. Wang et al. [69] proposed a fully convolutional network integrated with sparse and low-rank subspace representations for classifying PolSAR images. Chen et al. [70] improved CNN performance by incorporating expert knowledge of target scattering mechanism interpretation and polarimetric feature mining. In a more recent work [71], He et al. proposed the combination of features learned from nonlinear manifold embedding and applying a fully convolutional network (FCN) to input PolSAR images; the final classification was carried out in an ensemble approach by SVM. In [72], the authors focused on the computational efficiency of deep learning methods, proposing the use of lightweight 3D CNNs. They showed that classification accuracy comparable to other CNN methods was achievable while significantly reducing the number of learned parameters and therefore gaining computational efficiency. Apart from these single-image classification schemes using CNNs, the use of time series of SAR images for crop classification has been shown in [73], [74]. The authors of both papers experimented with Recurrent Neural Network (RNN)-based architectures to exploit the temporal dependency of multi-temporal SAR images to improve classification accuracy. A unique approach for tackling PolSAR classification was recently proposed in [75], where for the first time the authors utilized an AutoML technique to find the optimum CNN architecture for each dataset. The approach takes into account the complex nature of PolSAR images, is cost effective, and achieves high classification accuracy [75].

Most of the aforementioned methods rely primarily on preprocessing or transforming raw complex-valued data into features in the real domain and then inputting them into a common CNN, which constrains the possibility of directly learning features from raw data. To tackle this problem, Zhang et al. [76] proposed a novel complex-valued CNN (CV-CNN) specifically designed to process complex values in PolSAR data, i.e., the off-diagonal elements of a coherency or covariance matrix. The CV-CNN not only takes complex numbers as input but also employs complex weights and complex operations throughout different layers. A complex-valued backpropagation algorithm is also developed for CV-CNN training. Other notable complex-valued deep learning approaches for classification using PolSAR images can be found in [77], [78], [79]. Although not completely related to terrain surface classification, it is also worth mentioning that the combination of SAR and PolSAR images with feed-forward neural networks has been extensively used for sea ice classification. This topic is not treated any further in this section and the interested reader is referred to [80], [81], [82], [83], [84] for more information. Similar to the polarimetric signature, InSAR coherence provides information about physical scattering properties. In [85], interferometric volume decorrelation is used as a feature for forest/non-forest mapping together with radar backscatter and incidence angle. The authors used bistatic TanDEM-X data where temporal decorrelation can be neglected. They compared different architectures and concluded that CNNs outperform random forests, with U-Net proving best for this segmentation task.
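As a rough illustration of the complex-valued operations used in the CV-CNN approaches discussed above (our own sketch, not the implementation of [76]), a complex convolution can be emulated with two real-valued convolutions applied to the real and imaginary parts:

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """(a + jb) * (w_r + j w_i) = (a*w_r - b*w_i) + j(a*w_i + b*w_r), realized with
    two real-valued convolution layers sharing the same geometry."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        real, imag = x.real, x.imag
        out_real = self.conv_r(real) - self.conv_i(imag)
        out_imag = self.conv_i(real) + self.conv_r(imag)
        return torch.complex(out_real, out_imag)

x = torch.randn(1, 6, 32, 32, dtype=torch.cfloat)   # e.g., 6 complex-valued PolSAR channels
y = ComplexConv2d(6, 16)(x)                         # complex-valued feature maps
```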

To summarize, it is apparent that deep learning-based SAR and PolSAR classification algorithms have advanced considerably in the past few years. Although at first the focus was on low-rank representation learning using SAE [58] and its modifications [61], later research has addressed a multitude of issues relevant to SAR imagery, such as taking into account speckle [61], [59], preserving spatial structures [63], and handling the complex nature of the data [76], [77], [78]. It can also be seen that the challenge of the scarcity of labeled data has driven researchers to use semi-supervised learning algorithms [79]. Finally, AutoML, an important field of machine learning that had not been exploited extensively by the remote sensing community, has found its application in PolSAR image classification [75].

B. Object Detection

Although various characteristics distinguish SAR images from optical RGB images, the SAR object detection problem is still analogous to optical image classification and segmentation in the sense that feature extraction from raw data is always the first and crucial step. Hence, given the success in the optical domain, there is no doubt that deep learning is one of the most promising avenues for developing state-of-the-art SAR object detection algorithms. The majority of earlier works on SAR object detection using deep learning consist of taking successful deep learning methods for optical object detection and applying them with minor tweaks to military vehicle detection (MSTAR dataset; see subsection V-C) or ship detection on custom datasets. Even small-sized networks are easily able to achieve more than 90% test accuracy on most of these tasks. The first attempt at military vehicle detection can be found in [7], where Chen et al. used an unsupervised sparse autoencoder to generate convolution kernels from random patches of a given input for a single-layer CNN, which generated features to train a softmax classifier for classifying military targets in the MSTAR dataset [87]. The experiments in [7] showed great potential for applying CNNs to SAR target recognition. With this discovery, Chen et al. [88] proposed A-ConvNets, a simple 5-layer CNN that was able to achieve state-of-the-art accuracy of about 99% on the MSTAR dataset.
Following this trend, more and more authors applied CNNs to the MSTAR dataset [89], [90], [91]. Morgan [89] successfully applied a modestly sized 3-layered CNN on MSTAR, and building upon it, Wilmanski et al. [92] investigated the effects of initialization and optimizer selection on the final results. Ding et al. [90] investigated the capabilities of a CNN model combined with domain-specific data augmentation techniques (e.g., pose synthesis and speckle adding) in SAR object detection. Furthermore, Du et al. [91] proposed a displacement- and rotation-insensitive CNN, and claimed that data augmentation on training samples is necessary and critical in the preprocessing stage. On the same dataset, instead of treating the CNN as an end-to-end model, Wagner [93] and similarly Gao [94] integrated CNN and SVM, by first using a CNN to extract features and then feeding them to an SVM for the final prediction. Specifically, Gao et al. [95] added class separation information to the cross-entropy cost function as a regularization term, which they show explicitly facilitates intra-class compactness and separability, in turn improving the quality of the extracted features. More recently, Furukawa [96] proposed VersNet, an encoder-decoder style segmentation network, to not only identify but also localize multiple objects in an input SAR image. Moreover, Zhang et al. [86] proposed an approach based on multi-aspect image sequences as a pre-processing step. In this contribution, backscattering signals from different viewing geometries are taken into account, followed by feature extraction using Gabor filters and dimensionality reduction, with the results eventually fed to a bidirectional LSTM model for joint recognition of targets. The flowchart of this SAR ATR framework is illustrated in Fig. 4. Besides vehicle detection, ship detection is another tackled SAR object detection task. Early studies on applying deep

learning models to ship detection [97], [98], [99], [100], [101] mainly consisted of two stages: first cropping patches from the whole SAR image and then identifying whether cropped patches belong to target objects using a CNN. Because of fixed patch sizes, these methods were not robust enough to cater for variations in ship geometry, like size and shape. This problem was overcome by using region-based CNNs [102], [103], with creative use of skip connections and feature fusion techniques in later literature. For example, Li et al. [104] fused features of the last three convolution layers before feeding them to a region proposal network (RPN). Kang et al. [105] proposed a contextual region-based network that fuses features from different levels. Meanwhile, to make the most use of features of different resolutions, Jiao et al. [106] densely connected each layer to its subsequent layers and fed features from all layers to separate RPNs to generate proposals; in the end the best proposal was chosen based on an intersection-over-union score. In more recent works on SAR object detection, scientists have explored many other interesting ideas to complement current works. Dechesne et al. [107] proposed a multi-task network that simultaneously learned to detect, classify, and estimate the length of ships. Mullissa et al. [108] showed that CNNs can be trained directly on complex-valued SAR data; Kazemi et al. [109] performed object classification using an RNN-based architecture directly on the received SAR signal instead of processed SAR images; and Rostami et al. [110] and Huang et al. [111] explored knowledge transfer, or transfer learning, from other domains to the SAR domain for SAR object detection. Perhaps one of the more interesting recent works in this application area is building detection by Shahzad et al. [112]. They tackle the problem of Very High Resolution (VHR) SAR building detection using an FCN [113] architecture for feature extraction, followed by CRF-RNN [114], which helps assign similar weights to neighboring pixels. This architecture produced building segmentation masks with up to 93% accuracy. An example of the detected buildings can be seen in Fig. 5, where the left subfigure is the amplitude of the input TerraSAR-X image of Berlin, and the right subfigure is the predicted building mask. Another major contribution of that paper addresses the problem of lack of training data by introducing an automatic annotation technique, which annotates the TomoSAR data using Open Street Map (OSM) data. In summary, deep learning faces challenges on two fronts when applied to SAR object detection tasks. The first is the challenge of handling characteristics of SAR imagery like imaging geometry, size of objects, and speckle noise. The second and bigger challenge is the lack of good quality standardized datasets. As we observed, the most popular dataset, MSTAR, is too easy for deep nets, and for ship detection the majority of authors created their own datasets, which makes it very hard to judge the quality of the proposed algorithms and even harder to compare different algorithms.


C. Parameter Inversion

Parameter inversion from SAR images is a challenging field in SAR applications. As one important branch, ice concentration estimation is attracting great attention due to its importance for ice monitoring and climate research [115]. Since there are complex interactions between SAR signals and sea ice [116], empirical algorithms face difficulties in interpreting SAR images for accurate ice concentration estimation. Wang et al. [8] resorted to a CNN for generating ice concentration maps from dual-polarized SAR images. Their method takes image patches of the intensity-scaled dual-band SAR images as inputs and outputs ice concentration directly. In [117], [118], Wang et al. employed various CNN models to estimate ice concentration from SAR images during the melt season. Labels are produced by ice experts via visual interpretation. The algorithm was tested on dual-pol RADARSAT-2 data. Since the problem considered is the regression of a continuous value, the mean squared error is selected as the loss function. Experimental results demonstrate that CNNs can offer more accurate results than comparable operational products. In a different application, Song et al. used a deep CNN, including five pairs of convolutional and max pooling layers followed by two fully connected layers, for inverting rough surface parameters from SAR images [121]. The training of the network was based solely on simulated data due to the scarcity of real training data. The method was able to invert the desired parameters with reasonable accuracy, and the authors showed that training a CNN for parameter inversion purposes can be done quite efficiently. Furthermore, Zhao et al. [122] designed a complex-valued CNN to directly learn physical scattering signatures from PolSAR images. The authors notably proposed a framework to automatically generate labeled data, which led to an unsupervised learning algorithm for the aforementioned parameter inversion. On the whole, deep learning-based parameter estimation for SAR applications has not yet been fully exploited. Unfortunately, most of the focus of the remote sensing community has been devoted to classical problems that overlap with computer vision tasks such as classification, object detection, segmentation, and denoising. We hope that in the future more studies will be carried out to employ deep learning methods for geophysical and other parameter inversion tasks using SAR data.

D. Despeckling

Speckle, caused by the coherent interaction among scattered signals from sub-resolution objects, often makes processing and interpretation of SAR images difficult. Therefore, despeckling is a crucial procedure before applying SAR images to various tasks. Conventional methods aim at removing speckle either spatially, where local spatial filters, such as the Lee filter [123], Kuan filter [124], and Frost filter [125], are employed, or using wavelet-based methods [126], [127], [128]. For a full overview of these techniques, the reader is referred to [129]. In the past decade, patch-based methods for speckle reduction have gained high popularity due to their ability to preserve spatial features while not sacrificing image resolution [130]. Deledalle et al. [131] proposed one of the first nonlocal patch-based methods applied to speckle reduction by taking into account the statistical properties of speckle combined with the original nonlocal image denoising algorithm introduced in [132]. A vast number of variations of the nonlocal method for SAR despeckling have been proposed, with the most notable ones included in [133], [134]. However, on one hand, the manual selection of appropriate parameters for conventional algorithms is not easy and is sensitive to reference images. On the other hand, it is difficult to achieve a balance between preserving distinct image features and removing artifacts with empirical despeckling methods. To overcome these limitations, methods based on deep learning have been developed.

Inspired by the success of image denoising using a residual learning network architecture in the computer vision community [135], Chierchia et al. [52] first introduced a residual learning CNN for SAR image despeckling by presenting a 17-layered CNN for learning to subtract speckle components from noisy images. Considering that speckle noise is assumed to be multiplicative, the homomorphic approach with coupled log- and exp-transformations is performed before and after feeding images to the network. In this case, multiplicative speckle noise is transformed into an additive form and can be recovered by residual learning, where log-speckle noise is regarded as the residual. As shown in Fig. 6, an input log-noisy image is mapped identically to a fusion layer via a shortcut connection, and then added element-wise with the learned residual image to produce a log-clean image. Afterwards, denoised images can be obtained by an exp-transformation. Wang et al. [9] proposed a CNN, called ID-CNN, for image despeckling, which can directly learn denoised images via a component-wise division-residual layer with skip connections. In other words, homomorphic processing is not introduced for transforming multiplicative noise into additive noise, and at the final stage the noisy image is divided by the learned noise to yield the clean image.
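The homomorphic residual-learning idea can be summarized in a few lines. The network below is a drastically reduced stand-in of our own (a real implementation such as [52] stacks many more convolutional layers with batch normalization), shown only to illustrate the log-transform, residual prediction, and exp-transform chain:

```python
import torch
import torch.nn as nn

class TinyResidualDespeckler(nn.Module):
    """Predicts the log-speckle residual; clean = exp(log(noisy) - residual)."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1))

    def forward(self, noisy_intensity: torch.Tensor) -> torch.Tensor:
        log_noisy = torch.log(noisy_intensity + 1e-6)   # multiplicative speckle becomes additive
        residual = self.body(log_noisy)                  # estimated log-speckle component
        return torch.exp(log_noisy - residual)           # back to the intensity domain

noisy = torch.rand(1, 1, 128, 128) + 0.1                 # dummy noisy intensity patch
clean = TinyResidualDespeckler()(noisy)
```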

As a step forward with respect to the two aforementioned residual-based learning methods, Zhang et al. [136] employed a dilated residual network, SAR-DRN, instead of simply stacking convolutional layers. Unlike [52] and similar to [9], SAR-DRN is trained in an end-to-end fashion using a combination of dilated convolutions and skip connections with a residual learning structure, which means that prior knowledge such as a noise description model is not required in the workflow. In [137], Yue et al. proposed a novel deep neural network architecture specifically designed for SAR despeckling. It uses a convolutional neural network to extract image features and reconstruct a discrete RCS probability density function (PDF). It is trained with a hybrid loss function which measures the distance between the actual SAR image intensity PDF and the estimated one, derived from the convolution between the reconstructed RCS PDF and the prior speckle PDF. Experimental results demonstrated that the proposed despeckling neural network can achieve performance comparable to non-learning state-of-the-art methods. In [49], the problem of despeckling was tackled with a time series of images. Using a stack of images for despeckling is not unique to deep learning-based methods, as has been recently demonstrated in [138] as well. In [49], the authors utilized a multi-layer perceptron with several hidden layers to learn the non-linear intensity characteristics of training image patches. This approach has shown promising results and reported performance comparable with state-of-the-art despeckling algorithms. Again using single images instead of time series, in [139] the authors proposed a deep encoder-decoder CNN architecture with a focus on feature preservation, which is a weakness of CNNs. They modified U-Net [32] in order to accommodate speckle statistical features. Another notable CNN approach was introduced in [120], where the authors used a nonlocal structure, while the weights for pixel-wise similarity measures were assigned using a CNN. The results of this approach, called CNN-NLM, are reported in Fig. 7, where the superiority of the method with respect to both feature preservation and speckle reduction is clearly observed. From the deep learning-based despeckling methods reviewed in this subsection, it can be observed that most methods employ CNN-based architectures with single images of the scene for training; they either output the clean image in an end-to-end fashion or propose residual-based techniques to learn the underlying noise model. With the availability of large archives of time series thanks to the Sentinel-1 mission, an interesting direction is to exploit the temporal correlation of speckle characteristics for despeckling applications. Another problem in supervised deep learning-based despeckling techniques is the lack of ground truth data. In many studies, the training data set is built by corrupting optical images with multiplicative noise. This is far from realistic for despeckling applied to real SAR data. Therefore, despeckling in an unsupervised manner would be highly desirable and worth attention.

E. InSAR

Interferometric SAR (InSAR) is one of the most important SAR techniques and is widely used for reconstructing the topography of the Earth’s surface, i.e., digital elevation model (DEM) generation [140], [141], [56], and for detecting topographic displacements, e.g., monitoring volcanic eruptions [142], [143], [144], earthquakes [145], [146], land subsidence [147], and urban areas using time series methods [148], [149], [150]. The principle of InSAR is to first measure the interferometric phase between signals received by two antennas located at different positions and then extract topographic information from the obtained interferogram by unwrapping and converting the absolute phase to height. However, an actual interferogram often suffers from a large number of singular points, which originate from interference distortion and noise in the radar measurements. These points result in unwrapping errors and consequently low-quality DEMs. To tackle this problem, Ichikawa and Hirose [151] applied a complex-valued neural network (CVNN) in the spectral domain to restore singular points. With the help of the Complex Markov Random Field (CMRF) filter [152], they aimed at learning the ideal relationship between the spectrum of neighboring pixels and that of the center pixels via a one-hidden-layer CVNN. Notably, the center pixels of each training sample are supposed to be ideal points, which means that singular points are not fed to the network during the training procedure. Similarly, Oyama and Hirose [153] restored singular points with a CVNN in the spectral domain. Related to topography extraction, Costante et al. [155] proposed a fully convolutional encoder-decoder architecture for estimating DEMs from single-pass image acquisitions. It is demonstrated that this model is capable of extracting high-level features from input radar images using an encoder section and then reconstructing a full-resolution DEM via a decoder section. Moreover, the network can potentially resolve the layover phenomenon in a single-look SAR image using contextual features. In addition to reconstructing DEMs, Schwegmann et al. [156] presented a CNN-based technique to detect subsidence deformations from interferograms. They employed a 9-layer network to extract salient information from interferograms and displacement maps for discriminating deformation targets from deformation-like targets. Furthermore, Anantrasirichai et al. [10], [157], [158] used a pre-trained CNN to automatically detect volcanic ground deformation from InSAR images. They divided each image into patches, relabeled them with binary labels, i.e., "background" and "volcano", and finally fed them to the network to predict volcano deformation. They further improved their method to be able to detect slow-moving volcanoes by using a time series of interferograms in [159]. In another study related to automatic volcanic deformation detection, Valade et al. [154] designed and trained a CNN from scratch to learn a decorrelation mask from input wrapped interferograms, which was then used to detect volcanic ground deformation. The flowchart of this approach can be seen in Fig. 8. The training in both of the aforementioned works [159], [154] was based on simulated data. Another geophysically motivated example of using deep learning on InSAR data, which was actually proposed earlier than the above-mentioned CNN-based studies, can be found in [160], [161], [162], where the authors used simple feed-forward shallow neural networks for seismic event characterization and automatic seismic source
parameter inversion by exploiting the power of neural networks in solving non-linear problems.
Fig. 8: The workflow of volcano deformation detection proposed in [154]. The CNN is trained on simulated data and is later used to detect phase gradients and a decorrelation mask from input wrapped interferograms to locate ground deformation caused by volcanoes.

In summary, it can be concluded that the use of deep learning methods in InSAR is still at a very early stage. Although deep learning has been used in different applications combined with InSAR, the full potential of interferograms is not yet fully exploited except in the pioneering work of Hirose [163]. Many applications treat interferograms or deformation maps obtained from interferograms as images similar to RGB or gray-scale ones, and therefore the complex nature of interferograms has remained unnoticed. Apart from this issue, as with the SAR despeckling problem using deep learning, the lack of ground truth data for either detection or image restoration problems is a motivation to focus on developing semi-supervised and unsupervised algorithms that combine deep learning and InSAR.

Fig. 9: Randomly selected patches obtained from the testing phase of the network for SAR-optical image patch correspondence detection proposed in [50].

F. SAR-Optical Data Fusion

The fusion of SAR and optical images can provide complementary information about targets. However, considering the two different sensing modalities, prior identification and co-registration of corresponding images are challenging [164], but compulsory for joint applications of SAR and optical images. For the purpose of identifying and matching SAR and optical images, many current methods resort to deep learning, given its powerful capabilities of extracting effective features from complex images. In [50], the authors proposed a CNN for identifying corresponding image patches of very high resolution (VHR) optical and SAR imagery of complex urban scenes. Their network consists of two streams: one designed for extracting features from optical images, the other responsible for learning features from SAR images. Next, the extracted features are fused via a concatenation layer for further binary prediction of their correspondence. A selection of True Positives, False Positives, False Negatives, and True Negatives of SAR-optical image patches from [50] can be seen in Fig. 9. Similarly, Hughes et al. [11] proposed a pseudo-Siamese CNN for learning a multi-sensor correspondence predictor for SAR and optical image patches. Notably, both networks in [50], [11] are trained and validated on the SARptical dataset [165], [166], which is specifically built for joint analysis of VHR SAR and optical images in dense urban areas. In [167], the authors proposed a deep learning framework that can learn an end-to-end mapping between image patch pairs and their matching labels. An image pair is first transformed into two 1-D vectors and then concatenated to build a larger 1-D vector as the input of the network. Then hidden layers are stacked for learning the mapping between input vectors and output binary labels, which indicate their correspondence.
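Schematically, such a two-stream (pseudo-Siamese) correspondence network can be sketched as follows. The layer configuration is invented for illustration and does not reproduce the networks of [50] or [11]; single-channel patches are assumed for simplicity:

```python
import torch
import torch.nn as nn

def small_branch() -> nn.Sequential:
    """A tiny convolutional feature extractor for one modality."""
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten())

class PseudoSiameseMatcher(nn.Module):
    """Separate (non-shared) branches for SAR and optical patches, fused by
    concatenation for a binary corresponding / not-corresponding decision."""
    def __init__(self):
        super().__init__()
        self.sar_branch = small_branch()
        self.opt_branch = small_branch()
        self.classifier = nn.Sequential(nn.Linear(64, 32), nn.ReLU(),
                                        nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, sar_patch, optical_patch):
        features = torch.cat([self.sar_branch(sar_patch),
                              self.opt_branch(optical_patch)], dim=1)
        return self.classifier(features)   # probability that the two patches correspond

p = PseudoSiameseMatcher()(torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64))
```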


For the purpose of matching SAR and optical images, Merkle et al. [168] presented a CNN that comprises a feature extraction stage (Siamese network) and a similarity measure stage (dot product layer). Specifically, features of input optical and SAR images are extracted via two separate 9-layer branches and then fed to a dot product layer for predicting the shift of the optical image within the larger SAR reference patch. Experimental results indicate that this deep learning-based method outperforms state-of-the-art matching approaches [169], [170]. Furthermore, Abulkhanov et al. [171] successfully trained a neural network to build feature point descriptors to identify corresponding patches among SAR and optical images and matched the detected descriptors using the RANSAC algorithm [172].


In contrast to training a model to identify corresponding image patches, Merkle et al. [173] first employed a conditional generative adversarial network (cGAN) to generate artificial SAR-like images from optical images, and then matched them with real SAR images. The authors demonstrate that both matching accuracy and precision are improved with the proposed strategy. Inspired by their study, more researchers have resorted to GANs for the purpose of SAR-optical image matching (see [174], [175] for a review). With respect to applications of SAR and optical image matching, Yao et al. [176] aimed at applying SAR and optical images to semantic segmentation with deep neural networks. They collected corresponding optical patches from Google Earth according to TerraSAR-X patches and built ground truths using data from OpenStreetMap. Then SAR and optical images were separately fed to different CNNs to predict semantic labels (building, natural, land use, and water). Although their experimental results did not outperform the state of the art at the time [177], likely because of network design or training strategy, they deduced that introducing advanced models and simultaneously using both data sources can greatly improve the performance of semantic segmentation. Another application mentioned in [178] demonstrated that standard fusion techniques for SAR and optical images require data from both sources, which indicates that it is still not easy to interpret SAR images without the support of optical images. To address this issue, Schmitt et al. [178] proposed an automatic colorization network, composed of a VAE and a mixture density network (MDN) [179], to predict artificially colored SAR images (i.e., Sentinel-1 images). These images have been shown to disclose more information to the human interpreter than the original SAR data. In [180], the authors tackled the problem of cloud removal from optical imagery. They introduced a cGAN architecture to fuse SAR and cloud-corrupted multi-spectral data for generating cloud- and haze-free multi-spectral optical data. Experiments proved the effectiveness of the proposed network for removing clouds from multi-spectral data with auxiliary SAR data. Extending previous multi-modal networks for cloud removal, [181] proposed a cycle-consistent GAN architecture [182] that utilizes an image forward-backward translation consistency loss. Cloud-covered optical information is reconstructed via SAR data fusion, while changes to cloud-free areas are minimized through the use of the cycle consistency loss. The cycle-consistent architecture allows training without pixel-wise correspondences between cloudy input and cloud-free target optical imagery, relaxing the requirements on the training data set. In summary, it can be seen that the utilization of deep learning methods for SAR-optical data fusion has been a hot topic in the remote sensing community. Although a handful of data sets consisting of optical and SAR corresponding image patches are available for different terrain types and applications, one of the biggest problems in this task is still the scarcity of high-quality training data. Semi-supervised methods, as proposed in [183], seem to be a viable option to tackle this problem. A great challenge in SAR-optical image matching is the extreme difference in the viewing geometries of the two sensors. For this, it is important to exploit auxiliary 3D data in order to assist the training data generation.

V. EXISTING BENCHMARK DATASETS AND THEIR LIMITATIONS

In order to train and evaluate deep learning models, large datasets are indispensable. Unlike RGB images in the computer vision community, which can be easily collected and interpreted, SAR images are much more difficult to annotate due to their complex properties. Our research shows that big SAR datasets created for the primary purpose of deep learning research are nearly non-existent in the community. In recent years, only a few SAR datasets have been made public for training and assessing deep learning models. In the following, we categorize those datasets according to their best suited deep learning problem and focus on openly accessible and well-curated large datasets. In particular, we consider the following categories of deep learning problems in SAR.
• Image classification: each pixel or patch in one image is classified into a single label. This is often the case in typical land use land cover classification problems.
• Scene classification: similar to image classification, one image or patch is classified into a single label. However, one scene is usually much larger than an image patch. Hence, it requires a different network architecture.
• Semantic segmentation: one image or patch is segmented into a classification map of the same dimension. Training of such neural networks also requires densely annotated training data.
• Object detection: similar to scene classification. However, detection often requires the estimation of the object location.
• Registration/matching: provide binary classification (matched or unmatched), or estimate the translation between two image patches. This type of task requires matching pairs of two different image patches as training data.
A. Image/Scene Classification

• So2Sat LCZ42 [185]: So2Sat LCZ42 follows the local climate zones (LCZs) classification scheme. The dataset comprises 400,673 pairs of dual-pol Sentinel-1 and multi-spectral Sentinel-2 image patches from 42 urban agglomerations, plus 10 additional smaller areas, across five continents. The image patches are hand-labelled into one of the 17 LCZ classes [198]. The Sentinel-1 image patches in this dataset contain both the geocoded single look complex image and a despeckled Lee-filtered variant. In particular, it is the first Earth observation dataset that provides a quantitative measure of the label uncertainty, achieved by letting a group of domain experts cast 10 independent votes on 19 cities in the dataset. The dataset can therefore be considered a large-scale data fusion and classification benchmark for cutting-edge machine learning methodological developments, such as automatic topology learning, data fusion, and quantification of uncertainties (a minimal loading sketch is given after this list).
• OpenSARUrban [184]: OpenSARUrban consists of 33,358 patches of Sentinel-1 dual-pol images covering 21 major cities in China. The dataset was manually annotated according to a hierarchical classification scheme, with 10 classes of urban scenes at its finest level. Each image patch has a dimension of 100 by 100 pixels with a pixel spacing of 10 m (Sentinel-1 GRD product). This dataset can support deep learning studies of urban target characterization and content-based SAR image queries. Fig. 10 shows some samples from the OpenSARUrban dataset.
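For readers who want to experiment with these classification benchmarks, the sketch below shows a minimal PyTorch dataset wrapper around So2Sat LCZ42, assuming the HDF5 distribution of the dataset with arrays named `sen1`, `sen2`, and `label`. The file path, key names, and patch dimensions are assumptions to be checked against the downloaded data (the dataset is also listed in the TensorFlow Datasets catalog, see the footnotes after the reference list).

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader

class So2SatLCZ42(Dataset):
    """Minimal wrapper for one So2Sat LCZ42 HDF5 split (e.g. a training file).
    Assumed layout: 'sen1' (N, 32, 32, 8), 'sen2' (N, 32, 32, 10),
    'label' (N, 17) one-hot LCZ labels; verify against the actual file."""
    def __init__(self, h5_path):
        self.h5_path = h5_path
        with h5py.File(h5_path, "r") as f:
            self.length = f["label"].shape[0]

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        with h5py.File(self.h5_path, "r") as f:
            s1 = torch.from_numpy(f["sen1"][idx]).permute(2, 0, 1).float()
            s2 = torch.from_numpy(f["sen2"][idx]).permute(2, 0, 1).float()
            label = int(f["label"][idx].argmax())   # one-hot -> class index
        return torch.cat([s1, s2], dim=0), label    # simple early fusion

# loader = DataLoader(So2SatLCZ42("training.h5"), batch_size=64, shuffle=True)
```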
B. Semantic Segmentation/Classification

• SEN12MS [187]: SEN12MS was created based on its predecessor, SEN1-2 [188]. SEN12MS consists of 180,662 triplets of dual-pol Sentinel-1 image patches, multi-spectral Sentinel-2 image patches, and MODIS land cover maps. The patches are georeferenced with a ground sampling distance of 10 m. Each image patch has a dimension of 256 by 256 pixels. We expect this dataset to support the community in developing sophisticated deep learning-based approaches for common tasks such as scene classification or semantic segmentation for land cover mapping.
• MSAW [189]: The multi-sensor all-weather mapping (MSAW) dataset includes high-resolution SAR data covering 120 km2 in the area of Rotterdam, the Netherlands. The quad-polarized X-band SAR imagery from Capella Space with 0.5 m spatial resolution was used for the SpaceNet 6 Challenge. A total of 48,000 unique building footprints have been labeled, including building heights.
• PolSF [190]: This dataset consists of PolSAR images of San Francisco from eight different sensors, including AIRSAR, ALOS-1, ALOS-2, RADARSAT-2, Sentinel-1A, Sentinel-1B, Gaofen-3, and RISAT (data compiled by E. Pottier of IETR). Five of the eight images were densely labeled into five or six land use land cover classes in [190]. These densely annotated images correspond to roughly 3,000 training patches of 128 by 128 pixels. Although the data volume is relatively low for deep learning research, this dataset is, to the best of our knowledge, the only annotated multi-sensor PolSAR dataset. Therefore, we suggest that the creators of this dataset increase the number of annotated images to enable greater potential use of this dataset.
C. Object Detection

• MSTAR [192]: The Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset is one of the earliest datasets for SAR target recognition. It consists of a total of 17,658 X-band SAR image chips (patches) of 10 classes of vehicles plus one class of simple geometrically shaped targets. The collected SAR image patches are 128 by 128 pixels with a resolution of one foot in range and azimuth. In addition, 100 SAR images of clutter are also provided in the dataset. In our opinion, the number of image patches in this dataset is relatively low for deep learning models, especially considering the number of classes. In addition, this dataset represents a rather ideal and unrealistic scenario: vehicles in the dataset are centered in the patch, and the clutter is quite homogeneous without disturbing signals. However, considering the scarcity of such datasets, MSTAR is a valuable source for target recognition.
• OpenSARShip 2.0 [194]: This dataset was built based on its previous version, OpenSARShip [195]. It contains 34,528 Sentinel-1 SAR image patches of different ships with automatic identification system (AIS) information. For each SAR image patch, the creators manually extracted the ship length, width, and direction, as well as its type, by verifying this data on the Marine Traffic website [194]. Among all the patches, about one-third is extracted from Sentinel-1 GRD products, and the other two-thirds are from Sentinel-1 SLC products. OpenSARShip 2.0 is one of the handful of SAR datasets suitable for object detection.
• SAR-Ship-Dataset [196]: This dataset was created using 102 Gaofen-3 and 108 Sentinel-1 images. It consists of 43,819 ship chips of 256 pixels in both range and azimuth. These ships mainly have distinct scales and backgrounds. Therefore, this dataset can be employed for developing multi-scale object detection models.
• FUSAR-Ship [199]: This dataset was created using space-time matched-up Gaofen-3 SAR images and ship AIS messages. It consists of over 5,000 ship chips with corresponding ship information extracted from AIS messages, which can be used to trace back to the unique ship of any particular chip.
D. Registration/Matching

• SARptical [197], [166]: The SARptical dataset was designed for interpreting VHR spaceborne SAR images of dense urban areas. It consists of 10,108 pairs of corresponding very high resolution SAR and optical image patches whose locations are precisely coregistered in 3D. They are extracted from TerraSAR-X VHR spotlight images with a resolution better than 1 m and UltraCAM aerial optical images with 20 cm pixel spacing, respectively. Unlike low and medium resolution images, high resolution SAR and optical images in dense urban areas have very distinct geometries. Therefore, in the SARptical dataset, the center points of each image pair are matched in 3D space via sophisticated 3D reconstruction and matching algorithms. The UTM coordinates of the center pixel of each pair are also made publicly available in the dataset. This dataset contributes to applications of multi-modal data classification and SAR-optical image coregistration. However, we believe more training samples are required for learning the complicated SAR-optical image-to-image mapping.
• SEN1-2 [188]: The SEN1-2 dataset consists of 282,384 pairs of corresponding Sentinel-1 single-polarization intensity and Sentinel-2 RGB image patches, collected from across the globe and throughout all meteorological seasons. The patches have a dimension of 256 by 256 pixels, and their distribution over the four seasons is roughly even. SEN1-2 is the first large open dataset of this kind. We believe it will support further developments in the field of deep learning for remote sensing as well as multi-sensor data fusion, such as SAR image colorization and SAR-optical image matching.
E. Other Datasets

• Sample PolSAR images from ESA: https://earth.esa.int/web/polsarpro/data-sources/sample-datasets, for example, the Flevoland PolSAR dataset. Several works make use of this dataset for agricultural land use land cover classification. The authors of [200], [201], [202] have manually labeled the dataset according to different classification schemes.
• SAR Image Land Cover Datasets [203]: This dataset is not publicly available; please contact the creator.
• Airbus Ship Detection Challenge: https://www.kaggle.com/c/airbus-ship-detection.

VI. CONCLUSION AND FUTURE TRENDS

This paper reviews the current state of the art of an important and under-exploited research field: deep learning in SAR. Relevant deep learning models are introduced, and their applications in six application fields (terrain surface classification, object detection, parameter inversion, despeckling, InSAR, and SAR-optical data fusion) are analyzed in depth. Existing benchmark datasets and their limitations are discussed. In summary, despite early successes, full exploitation of deep learning in SAR is mostly limited by 1) the lack of large and representative benchmark datasets and 2) the lack of tailored deep learning models that fully account for SAR signal characteristics.

Looking forward, the years ahead will be exciting. Next-generation spaceborne SAR missions will simultaneously provide high resolution and global coverage, which will enable novel applications such as monitoring the dynamic Earth. To retrieve geo-parameters from these data, the development of new analytics methods is warranted. Deep learning is among the most promising of these methods. To fully unlock its potential in SAR/InSAR applications in this era of big SAR data, there are several promising future directions:
• Large and Representative Benchmark Datasets: As summarized in this article, there is only a handful of SAR benchmarks, in particular when excluding multi-modal ones. For instance, in SAR target detection, methods are mainly tested on a single benchmark, the MSTAR dataset, which provides only a few thousand target samples in total (a few hundred per class) for training. With respect to InSAR, due to the lack of ground truth, datasets are extremely deficient or nearly nonexistent. Large and representative expert-annotated benchmark datasets are in high demand in the SAR community and deserve more attention.
• Unsupervised Deep Learning: To bypass the deficiencies in annotated data in SAR, unsupervised deep learning is a promising direction. These algorithms derive insights directly from the data itself and work as feature learning, representation learning, or clustering, which can be further used for data-driven analytics. Autoencoders and their extensions, such as variational autoencoders (VAEs) and deep embedded clustering algorithms, are popular choices. With respect to despeckling, the high complexity of SAR images and the lack of ground truth make it infeasible to produce appropriate benchmarks from real data. Noise2Noise [204] is an elegant example of unsupervised denoising, where the authors learn to denoise without clean data. Despite the nice visual appearance of the results, preserving details is a must for SAR applications.
• Interferometric Data Processing: Since deep learning methods were initially applied to perception tasks in computer vision, many methods resort to transforming SAR images, e.g., PolSAR images, into RGB-like images in advance, or focus only on intensities. In other words, the most essential component of a SAR measurement, the phase information, is not appropriately considered. Although CV-CNNs are capable of learning phase information and show great potential for processing CV-SAR images, only a few such attempts have been made [76]. Extending CNNs to the complex domain, while preserving the precious phase information, would enable networks to learn features directly from raw data and would open up a wide range of SAR/InSAR applications (a minimal sketch of a complex-valued convolution is given after this list).
• Quantification of Uncertainties: Generally speaking, geo-parameter estimates without uncertainty measures are considered invalid in remote sensing. Appropriately trained deep learning models can achieve highly accurate predictions, yet they fail to quantify the uncertainty of these predictions. Here, giving a statement about the predictive uncertainty, while considering both aleatoric and epistemic uncertainty, is of crucial importance. The Bayesian deep learning community has developed a model-agnostic and easy-to-implement methodology to estimate both data and model uncertainty within deep learning models [205], which is awaiting exploration by the SAR community.
• Large Scale Nonlinear Optimization Problems: The development of inversion algorithms should keep pace with data growth. Fast solvers are needed for many advanced parameter inversion models, which often involve non-convex, nonlinear, and complex-valued optimization problems, such as compressive-sensing-based tomographic inversion or low-rank complex tensor decomposition for InSAR time series analysis. In some cases, the iterations of the optimization algorithms perform computations similar to layers in neural networks, that is, a linear step followed by a non-linear activation (see, for example, the iteratively reweighted least-squares approach). It is thus meaningful to replace the computationally expensive optimization algorithms with unrolled deep architectures that can be trained from simulated data [206].
• Cognitive Sensors: Radars, and SARs in particular, are very complex and versatile imaging machines. A variety of modes (stripmap, spotlight, ScanSAR, TOPS, etc.), swath widths, incidence angles, and polarizations can be programmed in near real-time.
Cognitive radars go a giant step further; they adapt their operational modes autonomously to the environment to be imaged by an intelligent interplay of transmit waveforms, adaptive signal processing on the receiver side, and learning. Cognitive SARs are still in their conceptual and experimental phase and are often justified by the stunning capabilities of the echo-location system of bats. In his early pioneering article [207], Haykin defines three ingredients of a cognitive radar: “1) intelligent signal processing, which builds on learning through interactions of the radar with the surrounding environment; 2) feedback from the receiver to the transmitter, which is a facilitator of intelligence; and 3) preservation of the information content of radar returns, which is realized by the Bayesian approach to target detection through tracking.” Such a SAR could, for example, perform low resolution yet wide swath surveillance of a coastal area and, in a first step, detect objects of interest, like ships, in real-time. Based on these detections, the transmit waveform can be modified so as to zoom into the region of interest, allow for a close-up look at the object, and possibly classify or even identify it. Reinforcement (online) learning is part of the concept, as are fast and reliable detectors or classifiers (trained offline), e.g., based on deep learning. All of this is edge computing; the learning algorithms have to run in real-time with the limited compute resources onboard the satellite or airplane.

Last but not least, technological advances in deep learning in remote sensing will only be possible if experts in remote sensing and machine learning work closely together. This is particularly true when it comes to SAR. Thus, we encourage more joint initiatives working collaboratively toward deep learning powered, explainable, and reproducible big SAR data analytics.
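As referenced in the Interferometric Data Processing item above, the following is a minimal sketch of how a complex-valued convolution can be built from two real-valued convolutions, one possible building block for CV-CNNs such as [76]. It is an illustrative implementation choice, not the exact layer used in the cited works.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex-valued convolution built from two real convolutions:
    (a + ib) * (wr + i wi) = (a*wr - b*wi) + i(a*wi + b*wr),
    so the phase information of the input is propagated through the layer."""
    def __init__(self, in_ch, out_ch, kernel_size, **kwargs):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, **kwargs)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, **kwargs)

    def forward(self, real, imag):
        out_real = self.conv_r(real) - self.conv_i(imag)
        out_imag = self.conv_i(real) + self.conv_r(imag)
        return out_real, out_imag

# Toy usage on a single-look complex patch split into real/imaginary parts.
layer = ComplexConv2d(1, 8, kernel_size=3, padding=1)
slc = torch.randn(1, 1, 64, 64) + 1j * torch.randn(1, 1, 64, 64)
re, im = layer(slc.real, slc.imag)
phase = torch.atan2(im, re)     # phase of the complex-valued feature maps
print(re.shape, phase.shape)
```

Suitable complex-valued non-linearities and losses (e.g., applied to magnitude and phase separately) would still have to be chosen carefully, which is exactly where the SAR-specific design questions raised above come in.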
REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015. [2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, 2014. [3] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, “Object detection with deep learning: A review,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212–3232, 2019. [4] Y. Guo, Y. Liu, T. Georgiou, and M. S. Lew, “A review of semantic segmentation using deep neural networks,” International Journal of Multimedia Information Retrieval, vol. 7, no. 2, pp. 87–93, 2018. [5] X. X. Zhu, D. Tuia, L. Mou, G. Xia, L. Zhang, F. Xu, and F. Fraundorfer, “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8–36, 2017. [6] H. Parikh, S. Patel, and V. Patel, “Classification of SAR and PolSAR images using deep learning: a review,” International Journal of Image and Data Fusion, vol. 11, no. 1, pp. 1–32, 2020. [7] S. Chen and H. Wang, “SAR target recognition based on deep learning,” in International Conference on Data Science and Advanced Analytics (DSAA), 2014.
[8] L. Wang, A. Scott, L. Xu, and D. Clausi, “Ice concentration estimation from dual-polarized SAR images using deep convolutional neural networks,” IEEE Transactions on Geoscience and Remote Sensing, 2014. [9] P. Wang, H. Zhang, and V. Patel, “SAR image despeckling using a convolutional neural network,” IEEE Signal Processing Letters, vol. 24, no. 12, pp. 1763–1767, 2017. [10] N. Anantrasirichai, J. Biggs, F. Albino, P. Hill, and D. Bull, “Application of machine learning to classification of volcanic deformation in routinely generated InSAR data,” Journal of Geophysical Research: Solid Earth, 2018. [11] L. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. X. Zhu, “Identifying corresponding patches in SAR and optical images with a pseudosiamese CNN,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 784–788, 2018. [12] K. Ikeuchi, T. Shakunaga, M. Wheeler, and T. Yamazaki, “Invariant histograms and deformable template matching for SAR target recognition,” in Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1996, pp. 100–105. [13] Q. Zhao and J. Principe, “Support vector machines for SAR automatic target recognition,” IEEE Transactions on Aerospace and Electronic Systems, vol. 37, no. 2, pp. 643–654, 2001. [14] M. Bryant and F. Garber, “SVM classifier applied to the MSTAR public data set,” in Algorithms for Synthetic Aperture Radar Imagery, 1999. [15] M. Ferguson, R. Ak, Y.-T. T. Lee, and K. H. Law, “Automatic localization of casting defects with convolutional neural networks,” in 2017 IEEE International Conference on Big Data (Big Data). Boston, MA: IEEE, Dec. 2017, pp. 1726–1735. [Online]. Available: http://ieeexplore.ieee.org/document/8258115/ [16] C. Bourez, “Deep learning course,” [Accessed May 27, 2020]. [Online]. Available: http://christopher5106.github.io/img/ deeplearningcourse/DL46.png [17] S. Panchal, “Cityscape image segmentation with tensorflow 2.0,” [Accessed May 27, 2020]. [Online]. Available: https://miro.medium. com/max/2000/1*3FGS0kEAS55XmqxIXkp0mQ.png [18] Wikipedia, “Long short-term memory,” [Accessed May 27, 2020]. [Online]. Available: https://upload.wikimedia.org/wikipedia/commons/ thumb/3/3b/The LSTM cell.png/1280px-The LSTM cell.png [19] W. Feng, N. Guan, Y. Li, X. Zhang, and Z. Luo, “Audio visual speech recognition with multimodal recurrent neural networks,” in 2017 International Joint Conference on Neural Networks (IJCNN). Anchorage, AK, USA: IEEE, May 2017, pp. 681–688. [Online]. Available: http://ieeexplore.ieee.org/document/7965918/ [20] “Under the hood of the variational autoencoder (in prose and code),” [Accessed May 27, 2020]. [Online]. Available: http: //fastforwardlabs.github.io/blog-images/miriam/imgs code/vae.4.png [21] T. Silva, “An intuitive introduction to generative adversarial networks (gans),” [Accessed May 26, 2020]. [Online]. Available: https://cdn-media-1.freecodecamp.org/images/ m41LtQVUf3uk5IOYlHLpPazxI3pWDwG8VEvU [22] M. Zitnik, M. Agrawal, and J. Leskovec, “Modeling polypharmacy side effects with graph convolutional networks,” Bioinformatics, vol. 34, no. 13, p. 457–466, 2018. [23] B. Huang and K. M. Carley, “Residual or Gate? Towards Deeper Graph Neural Networks for Inductive Graph Representation Learning,” arXiv:1904.08035 [cs, stat], Aug. 2019, arXiv: 1904.08035. [Online]. Available: http://arxiv.org/abs/1904.08035 [24] B. Zoph and Q. V. Le, “Neural Architecture Search with Reinforcement Learning,” arXiv:1611.01578 [cs], Feb. 2017, arXiv: 1611.01578. [Online]. 
Available: http://arxiv.org/abs/1611.01578 [25] Y. LeCun, C. Cortes, and C. Burges, “Mnist handwritten digit database,” IEEE, 2010. [26] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. [27] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012. [28] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition. Ieee, 2009, pp. 248–255. [29] T. Tieleman and G. Hinton, “Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, 2012. [30] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [32] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241. [33] G. Huang, Z. Liu, K. Weinberger, and L. Maaten, “Densely connected convolutional networks,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [34] T. Hoeser and C. Kuenzer, “Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part I: Evolution and Recent Trends,” Remote Sensing, vol. 12, no. 10, p. 1667, 2020. [35] B. A. Pearlmutter, “Learning state space trajectories in recurrent neural networks,” Neural Computation, vol. 1, no. 2, pp. 263–269, 1989. [36] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997. [37] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672– 2680. [38] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014. [39] K. Pearson, “Liii. on lines and planes of closest fit to systems of points in space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 2, no. 11, pp. 559–572, 1901. [40] D.P.KingmaandM.Welling,“Auto-encodingvariationalbayes,”arXiv preprint arXiv:1312.6114, 2013. [41] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. [42] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Resource management with deep reinforcement learning,” in Proceedings of the 15th ACM Workshop on Hot Topics in Networks, 2016, pp. 50–56. [43] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” nature, vol. 529, no. 7587, p. 484, 2016. [44] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” arXiv preprint arXiv:1611.01578, 2016. [45] T. Elsken, J. H. Metzen, and F. Hutter, “Neural architecture search: A survey,” arXiv preprint arXiv:1808.05377, 2018. [46] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016. [47] B. Huang and K. M. Carley, “Residual or gate? towards deeper graph neural networks for inductive graph representation learning,” arXiv preprint arXiv, 2019. [48] Y. Shi, Q. Li, and X. X. Zhu, “Building segmentation through a gated graph convolutional neural network with deep structured feature embedding,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 159, pp. 184–197, 2020. [49] X. Tang, L. Zhang, and X. Ding, “SAR image despeckling with a multilayer perceptron neural network,” International Journal of Digital Earth, pp. 1–21, 2018. [50] L. Mou, M. Schmitt, Y. Wang, and X. X. 
Zhu, “A CNN for the identification of corresponding patches in SAR and optical imagery of urban scenes,” in Urban Remote Sensing Event (JURSE), 2017. [51] R. Touzi, A. Lopes, and P. Bousquet, “A statistical and geometrical edge detector for SAR images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 26, no. 6, pp. 764–773, 1988. [52] G. Chierchia, D. Cozzolino, G. Poggi, and L. Verdoliva, “SAR image despeckling through convolutional neural networks,” arXiv:1704.00275, 2017. [53] Y. Shi, X. X. Zhu, and R. Bamler, “Optimized parallelization of nonlocal means filter for image noise reduction of InSAR image,” in IEEE International Conference on Information and Automation, 2015. [54] X. X. Zhu, R. Bamler, M. Lachaise, F. Adam, Y. Shi, and M. Eineder, “Improving TanDEM-X DEMs by non-local InSAR filtering,” in European Conference on Synthetic Aperture Radar (EUSAR), 2014. [55] L. Denis, C.-A. Deledalle, and F. Tupin, “From patches to deep learning: Combining self-similarity and neural networks for sar image despeckling,” in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2019, pp. 5113–5116.
[56] A. Moreira, P. Prats-Iraola, M. Younis, G. Krieger, I. Hajnsek, and K. P. Papathanassiou, “A tutorial on synthetic aperture radar,” IEEE Geoscience and Remote Sensing Magazine, vol. 1, no. 1, pp. 6–43, 2013. [57] C. He, S. Li, Z. Liao, and M. Liao, “Texture classification of PolSAR data based on sparse coding of wavelet polarization textons,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 8, pp. 4576–4590, 2013. [58] H. Xie, S. Wang, K. Liu, S. Lin, and B. Hou, “Multilayer feature learning for polarimetric synthetic radar data classification,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2014. [59] J. Geng, H. Wang, J. Fan, and X. Ma, “Deep supervised and contractive neural network for SAR image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 4, pp. 2442–2459, 2017. [60] S. Uhlmann and S. Kiranyaz, “Integrating color features in polarimetric SAR image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 4, pp. 2197–2216, 2014. [61] J. Geng, J. Fan, H. Wang, X. Ma, B. Li, and F. Chen, “High-resolution SAR image classification via deep convolutional autoencoders,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2351– 2355, 2015. [62] B. Hou, B. Ren, G. Ju, H. Li, L. Jiao, and J. Zhao, “SAR image classification via hierarchical sparse representation and multisize patch features,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 1, pp. 33–37, 2016. [63] F. Gao, T. Huang, J. Wang, J. Sun, A. Hussain, and E. Yang, “Dualbranch deep convolution neural network for polarimetric SAR image classification,” Applied Sciences, vol. 7, no. 5, p. 447, 2017. [64] B. Hou, H. Kou, and L. Jiao, “Classification of polarimetric SAR images using multilayer autoencoders and superpixels,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 7, pp. 3072–3081, 2016. [65] L. Zhang, W. Ma, and D. Zhang, “Stacked sparse autoencoder in PolSAR data classification using local spatial information,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 9, pp. 1359–1363, 2016. [66] F. Qin, J. Guo, and W. Sun, “Object-oriented ensemble classification for polarimetric SAR imagery using restricted Boltzmann machines,” Remote Sensing Letters, vol. 8, no. 3, pp. 204–213, 2017. [67] Z. Zhao, L. Jiao, J. Zhao, J. Gu, and J. Zhao, “Discriminant deep belief network for high-resolution SAR image classification,” Pattern Recognition, vol. 61, pp. 686–701, 2017. [68] Y. Zhou, H. Wang, F. Xu, and Y. Jin, “Polarimetric SAR image classification using deep convolutional neural networks,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 12, pp. 1935–1939, 2016. [69] Y. Wang, C. He, X. Liu, and M. Liao, “A hierarchical fully convolutional network integrated with sparse and low-rank subspace representations for PolSAR imagery classification,” Remote Sensing, vol. 10, no. 2, p. 342, 2018. [70] S. Chen and C. Tao, “PolSAR image classification using polarimetricfeature-driven deep convolutional neural network,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 4, pp. 627–631, 2018. [71] C. He, M. Tu, D. Xiong, and M. Liao, “Nonlinear manifold learning integrated with fully convolutional networks for PolSAR image classification,” Remote Sensing, vol. 12, no. 4, p. 655, 2020. [72] H. Dong, L. Zhang, and B. Zou, “PolSAR image classification with lightweight 3d convolutional networks,” Remote Sensing, vol. 12, no. 3, p. 396, 2020. [73] E. Ndikumana, D. 
Ho Tong Minh, N. Baghdadi, D. Courault, and L. Hossard, “Deep recurrent neural network for agricultural classification using multitemporal SAR Sentinel-1 for Camargue, France,” Remote Sensing, vol. 10, no. 8, p. 1217, 2018. [74] N. Teimouri, M. Dyrmann, and R. N. Jørgensen, “A novel spatiotemporal FCN-LSTM network for recognizing various crop types using multi-temporal radar images,” Remote Sensing, vol. 11, no. 8, p. 990, 2019. [75] H. Dong, B. Zou, L. Zhang, and S. Zhang, “Automatic design of CNNs via differentiable neural architecture search for PolSAR image classification,” IEEE Transactions on Geoscience and Remote Sensing, pp. 1–14, 2020. [76] Z. Zhang, H. Wang, F. Xu, and Y. Jin, “Complex-valued convolutional neural network and its application in polarimetric SAR image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 12, pp. 7177–7188, 2017. [77] A. G. Mullissa, C. Persello, and A. Stein, “PolSARNet: A deep fully convolutional network for polarimetric SAR image classification,”
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 12, pp. 5300–5309, 2019. [78] L. Li, L. Ma, L. Jiao, F. Liu, Q. Sun, and J. Zhao, “Complex contourletCNN for polarimetric SAR image classification,” Pattern Recognition, p. 107110, 2019. [79] W. Xie, G. Ma, F. Zhao, H. Liu, and L. Zhang, “PolSAR image classification via a novel semi-supervised recurrent complex-valued convolution neural network,” Neurocomputing, vol. 388, pp. 255–268, 2020. [80] R. Ressel, A. Frost, and S. Lehner, “A neural network-based classification for sea ice types on x-band SAR images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 7, pp. 3672–3680, 2015. [81] R. Ressel, S. Singha, and S. Lehner, “Neural network based automatic sea ice classification for CL-pol RISAT-1 imagery,” in 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE, 2016, pp. 4835–4838. [82] R. Ressel, S. Singha, S. Lehner, A. Rosel, and G. Spreen, “Investigation into different polarimetric features for sea ice classification using xband synthetic aperture radar,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 7, pp. 3131–3143, 2016. [83] S. Singha, M. Johansson, N. Hughes, S. M. Hvidegaard, and H. Skourup, “Arctic sea ice characterization using spaceborne fully polarimetric l-, c-, and x-band SAR with validation by airborne measurements,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 7, pp. 3715–3734, 2018. [84] N. Zakhvatkina, V. Smirnov, and I. Bychkova, “Satellite SAR databased sea ice classification: An overview,” Geosciences, vol. 9, no. 4, p. 152, 2019. [85] A. Mazza, F. Sica, P. Rizzoli, and G. Scarpa, “TanDEM-X Forest Mapping Using Convolutional Neural Networks,” Remote Sensing, vol. 11, no. 24, p. 2980, Jan. 2019. [Online]. Available: https://www.mdpi.com/2072-4292/11/24/2980 [86] F. Zhang, C. Hu, Q. Yin, W. Li, H. Li, and W. Hong, “SAR target recognition using the multi-aspect-aware bidirectional LSTM recurrent neural networks,” arXiv:1707.09875, 2017. [87] E. Keydel, S. Lee, and J. Moore, “MSTAR extended operating conditions: A tutorial,” in Algorithms for Synthetic Aperture Radar Imagery III, 1996. [88] S. Chen, H. Wang, F. Xu, and Y. Jin, “Target classification using the deep convolutional networks for SAR images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4806–4817, 2016. [89] D. Morgan, “Deep convolutional neural networks for ATR from SAR imagery,” in Algorithms for Synthetic Aperture Radar Imagery, 2015. [90] J.Ding,B.Chen,H.Liu,andM.Huang,“Convolutionalneuralnetwork with data augmentation for SAR target recognition,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 3, pp. 364–368, 2016. [91] K. Du, Y. Deng, R. Wang, T. Zhao, and N. Li, “SAR ATR based on displacement-and rotation-insensitive CNN,” Remote Sensing Letters, vol. 7, no. 9, pp. 895–904, 2016. [92] M. Wilmanski, C. Kreucher, and J. Lauer, “Modern approaches in deep learning for SAR ATR,” in Algorithms for Synthetic Aperture Radar Imagery, 2016. [93] S. Wagner, “SAR ATR by a combination of convolutional neural network and support vector machines,” IEEE Transactions on Aerospace and Electronic Systems, vol. 52, no. 6, pp. 2861–2872, 2016. [94] F. Gao, T. Huang, J. Sun, J. Wang, A. Hussain, and E. Yang, “A new algorithm for sar image target recognition based on an improved deep convolutional neural network,” Cognitive Computation, vol. 11, no. 6, pp. 
809–824, 2019. [95] F. Gao, T. Huang, J. Wang, J. Sun, E. Yang, and A. Hussain, “Combining deep convolutional neural network and SVM to SAR image target recognition,” in IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2017. [96] H. Furukawa, “Deep learning for end-to-end automatic target recognition from synthetic aperture radar imagery,” arXiv:1801.08558, 2018. [97] D. Cozzolino, G. Di Martino, G. Poggi, and L. Verdoliva, “A fully convolutional neural network for low-complexity single-stage ship detection in Sentinel-1 SAR images,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2017. [98] C. Schwegmann, W. Kleynhans, B. Salmon, L. Mdakane, and R. Meyer, “Very deep learning for ship discrimination in synthetic aperture radar imagery,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016.
[99] C. Bentes, A. Frost, D. Velotto, and B. Tings, “Ship-iceberg discrimination with convolutional neural networks in high resolution SAR images,” in European Conference on Synthetic Aperture Radar (EUSAR), 2016. [100] N. Ødegaard, A. Knapskog, C. Cochin, and J. Louvigne, “Classification of ships using real and simulated data in a convolutional neural network,” in IEEE Radar Conference (RadarConf), 2016. [101] Y. Liu, M. Zhang, P. Xu, and Z. Guo, “SAR ship detection using sea-land segmentation-based convolutional neural network,” in International Workshop on Remote Sensing with Intelligent Processing (RSIP), 2017. [102] R. Girshick, “Fast R-CNN,” arXiv:1504.08083, 2015. [103] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards realtime object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017. [104] J. Li, C. Qu, and J. Shao, “Ship detection in SAR images based on an improved faster R-CNN,” in SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), 2017. [105] M. Kang, K. Ji, X. Leng, and Z. Lin, “Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection,” Remote Sensing, vol. 9, no. 8, p. 860, 2017. [106] J. Jiao, Y. Zhang, H. Sun, X. Yang, X. Gao, W. Hong, K. Fu, and X. Sun, “A densely connected end-to-end neural network for multiscale and multiscene SAR ship detection,” IEEE Access, vol. 6, pp. 20881– 20892, 2018. [107] C. Dechesne, S. Lefevre, R. Vadaine, G. Hajduch, and R. Fablet, “Multi-task deep learning from sentinel-1 sar: ship detection, classification and length estimation,” in Conference on Big Data from Space, 2019. [108] A. G. Mullissa, C. Persello, and A. Stein, “Polsarnet: A deep fully convolutional network for polarimetric sar image classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019. [109] S. Kazemi, B. Yonel, and B. Yazici, “Deep learning for direct automatic target recognition from sar data,” in 2019 IEEE Radar Conference (RadarConf). IEEE, 2019, pp. 1–6. [110] M. Rostami, S. Kolouri, E. Eaton, and K. Kim, “Deep transfer learning for few-shot sar image classification,” Remote Sensing, vol. 11, no. 11, p. 1374, 2019. [111] Z. Huang, Z. Pan, and B. Lei, “What, where, and how to transfer in sar target recognition based on deep cnns,” IEEE Transactions on Geoscience and Remote Sensing, 2019. [112] M. Shahzad, M. Maurer, F. Fraundorfer, Y. Wang, and X. X. Zhu, “Buildings detection in VHR SAR images using fully convolution neural networks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 2, pp. 1100–1116, 2019. [113] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2015. [114] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr, “Conditional random fields as recurrent neural networks,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1529–1537. [115] F. RADAR and J. FALKINGHAM, “Global satellite observation requirements for floating ice.” [Online]. Available: https://www.wmo.int/pages/prog/sat/meetings/documents/ PSTG-4 Doc 08-04 GlobSatObsReq-FloatingIce.pdf [116] W. Dierking, “Sea ice monitoring by synthetic aperture radar,” Oceanography, vol. 26, no. 2, pp. 100–111, 2013. [117] L. Wang, K. Scott, L. Xu, and D. 
Clausi, “Sea ice concentration estimation during melt from dual-pol SAR scenes using deep convolutional neural networks: A case study,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4524–4533, 2016. [118] L. Wang, “Learning to estimate sea ice concentration from SAR imagery,” Ph.D. dissertation, University of Waterloo, 2016. [Online]. Available: http://hdl.handle.net/10012/10954 [119] S. Parrilli, M. Poderico, C. V. Angelino, and L. Verdoliva, “A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 2, pp. 606–616, 2012. [120] D. Cozzolino, L. Verdoliva, G. Scarpa, and G. Poggi, “Nonlocal CNN SAR image despeckling,” Remote Sensing, vol. 12, no. 6, p. 1006, 2020. [121] T. Song, L. Kuang, L. Han, Y. Wang, and Q. H. Liu, “Inversion of rough surface parameters from SAR images using simulation-trained convolutional neural networks,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 7, pp. 1130–1134, 2018. [122] J. Zhao, M. Datcu, Z. Zhang, H. Xiong, and W. Yu, “Contrastiveregulated cnn in the complex domain: A method to learn physical scattering signatures from flexible polsar images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 12, pp. 10116–10135, 2019. [123] J. Lee, “Digital image enhancement and noise filtering by use of local statistics,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-2, no. 2, pp. 165–168, 1980. [124] D. Kuan, A. Sawchuk, T. Strand, and P. Chavel, “Adaptive noise smoothing filter for images with signal-dependent noise,” IEEE transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-7, no. 2, pp. 165–177, 1985. [125] V. Frost, J. Stiles, K. Shanmugan, and J. Holtzman, “A model for radar images and its application to adaptive digital filtering of multiplicative noise,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-4, no. 2, pp. 157–166, 1982. [126] H. Xie, L. Pierce, and F. Ulaby, “SAR speckle reduction using wavelet denoising and Markov random field modeling,” IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 10, pp. 2196–2212, 2002. [127] F. Argenti and L. Alparone, “Speckle removal from SAR images in the undecimated wavelet domain,” IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 11, pp. 2363–2374, 2002. [128] A. Achim, P. Tsakalides, and A. Bezerianos, “SAR image denoising via Bayesian wavelet shrinkage based on heavy-tailed modeling,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 8, pp. 1773–1784, 2003. [129] F. Argenti, A. Lapini, T. Bianchi, and L. Alparone, “A tutorial on speckle reduction in synthetic aperture radar images,” IEEE Geoscience and Remote Sensing Magazine, vol. 1, no. 3, pp. 6–35, 2013. [130] F. Tupin, L. Denis, C.-A. Deledalle, and G. Ferraioli, “Ten years of patch-based approaches for sar imaging: A review,” in IGARSS 2019 2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2019, pp. 5105–5108. [131] C.-A. Deledalle, L. Denis, and F. Tupin, “Iterative weighted maximum likelihood denoising with probabilistic patch-based weights,” IEEE Transactions on Image Processing, vol. 18, no. 12, pp. 2661–2672, 2009. [132] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2. IEEE, 2005, pp. 60–65. [133] Xin Su, C.-A. Deledalle, F. 
Tupin, and Hong Sun, “Two-step multitemporal nonlocal means for synthetic aperture radar images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6181–6196, 2014. [134] C.-A. Deledalle, L. Denis, F. Tupin, A. Reigber, and M. Jager, “NL-SAR: A unified nonlocal framework for resolution-preserving (pol)(in)SAR denoising,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 2021–2038, 2015. [135] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017. [136] Q. Zhang, Q. Yuan, J. Li, Z. Yang, and X. Ma, “Learning a dilated residual network for SAR image despeckling,” Remote Sensing, vol. 10, no. 2, p. 196, 2018. [137] D.-X. Yue, F. Xu, and Y.-Q. Jin, “Sar despeckling neural network with logarithmic convolutional product model,” International Journal of Remote Sensing, vol. 39, no. 21, pp. 7483–7505, 2018. [138] G. Baier, W. He, and N. Yokoya, “Robust nonlocal low-rank SAR time series despeckling considering speckle correlation by total variation regularization,” IEEE Transactions on Geoscience and Remote Sensing, pp. 1–13, 2020. [139] F. Lattari, B. Gonzalez Leon, F. Asaro, A. Rucci, C. Prati, and M. Matteucci, “Deep learning for SAR image despeckling,” Remote Sensing, vol. 11, no. 13, p. 1532, 2019. [140] H. Zebker, C. Werner, P. Rosen, and S. Hensley, “Accuracy of topographic maps derived from ERS-1 interferometric radar,” IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 823–836, 1994. [141] R. Abdelfattah and J. Nicolas, “Topographic SAR interferometry formulation for high-precision DEM generation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 11, pp. 2415–2426, 2002. [142] D. Massonnet, P. Briole, and A. Arnaud, “Deflation of mount Etna monitored by spaceborne radar interferometry,” Nature, vol. 375, no. 6532, p. 567, 1995. [143] J. Ruch, J. Anderssohn, T. Walter, and M. Motagh, “Caldera-scale inflation of the Lazufre volcanic area, south America: Evidence from InSAR,” Journal of Volcanology and Geothermal Research, vol. 174, no. 4, pp. 337–344, 2008. [144] E. Trasatti, F. Casu, C. Giunchi, S. Pepe, G. Solaro, S. Tagliaventi, P. Berardino, M. Manzo, A. Pepe, G. Ricciardi, E. Sansosti, P. Tizzani, G. Zeni, and R. Lanari, “The 2004–2006 uplift episode at Campi Flegrei caldera (Italy): Constraints from SBAS-DInSAR ENVISAT data and Bayesian source inference,” Geophysical Research Letters, vol. 35, no. 7, pp. 1–6, 2008. [145] D. Massonnet, M. Rossi, C. Carmona, F. Adragna, G. Peltzer, K. Feigl, and T. Rabaute, “The displacement field of the landers earthquake mapped by radar interferometry,” Nature, vol. 364, no. 6433, p. 138, 1993. [146] G. Peltzer and P. Rosen, “Surface displacement of the 17 May 1993 Eureka valley, California, earthquake observed by SAR interferometry,” Science, vol. 268, no. 5215, pp. 1333–1336, 1995. [147] V. B. H. (Gini) Ketelaar, Satellite Radar Interferometry, ser. Remote Sensing and Digital Image Processing. Springer Netherlands, 2009, vol. 14. [148] X. X. Zhu and R. Bamler, “Let’s do the time warp: Multicomponent nonlinear motion estimation in differential SAR tomography,” IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 4, pp. 735–739, 2011. [149] S. Gernhardt and R. Bamler, “Deformation monitoring of single buildings using meter-resolution SAR data in PSI,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 73, pp. 
68–79, 2012. [150] S. Montazeri, X. X. Zhu, M. Eineder, and R. Bamler, “Threedimensional deformation monitoring of urban infrastructure by tomographic SAR using multitrack TerraSAR-x data stacks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 6868–6878, 2016. [151] K. Ichikawa and A. Hirose, “Singular unit restoration in InSAR using complex-valued neural networks in the spectral domain,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 3, pp. 1717–1723, 2017. [152] R. Yamaki and A. Hirose, “Singular unit restoration in interferograms based on complex-valued markov random field model for phase unwrapping,” IEEE Geoscience and Remote Sensing Letters, vol. 6, no. 1, pp. 18–22, 2009. [153] K. Oyama and A. Hirose, “Adaptive phase-singular-unit restoration with entire-spectrum-processing complex-valued neural networks in interferometric SAR,” Electronics Letters, vol. 54, no. 1, pp. 43–44, 2018. [154] S. Valade, A. Ley, F. Massimetti, O. D’Hondt, M. Laiolo, D. Coppola, D. Loibl, O. Hellwich, and T. R. Walter, “Towards global volcano monitoring using multisensor sentinel missions and artificial intelligence: The MOUNTS monitoring system,” Remote Sensing, vol. 11, no. 13, p. 1528, 2019. [155] G. Costante, T. Ciarfuglia, and F. Biondi, “Towards monocular digital elevation model (DEM) estimation by convolutional neural networksapplication on synthetic aperture radar images,” arXiv:1803.05387, 2018. [156] C. Schwegmann, W. Kleynhans, J. Engelbrecht, L. Mdakane, and R. Meyer, “Subsidence feature discrimination using deep convolutional neural networks in synthetic aperture radar imagery,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2017. [157] N. Anantrasirichai, F. Albino, P. Hill, D. Bull, and J. Biggs, “Detecting volcano deformation in InSAR using deep learning,” arXiv:1803.00380, 2018. [158] N. Anantrasirichai, J. Biggs, F. Albino, and D. Bull, “A deep learning approach to detecting volcano deformation from satellite imagery using synthetic datasets,” Remote Sensing of Environment, vol. 230, p. 111179, 2019. [159] ——, “The application of convolutional neural networks to detect slow, sustained deformation in InSAR time series,” Geophysical Research Letters, vol. 46, no. 21, pp. 11850–11858, 2019. [160] F. Del Frate, M. Picchiani, G. Schiavon, and S. Stramondo, “Neural networks and SAR interferometry for the characterization of seismic events,” in Proc. SPIE, C. Notarnicola, Ed., 2010, p. 78290J. [161] M. Picchiani, F. Del Frate, G. Schiavon, S. Stramondo, M. Chini, and C. Bignami, “Neural networks for automatic seismic source analysis from DInSAR data,” in Proc. SPIE, 2011, p. 81790K. [162] S. Stramondo, F. Del Frate, M. Picchiani, and G. Schiavon, “Seismic source quantitative parameters retrieval from InSAR data and neural networks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 1, pp. 96–104, 2011. [163] A. Hirose, Complex-Valued Neural Networks, ser. Studies in Computational Intelligence. Springer Berlin Heidelberg, 2012, vol. 400. [164] M. Schmitt and X. X. Zhu, “On the challenges in stereogrammetric fusion of SAR and optical imagery for urban areas,” the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 41, no. B7, pp. 719–722, 2016. [165] Y. Wang, X. X. Zhu, S. Montazeri, J. Kang, L. Mou, and M. Schmitt, “Potential of the “SARptical” system,” in FRINGE, 2017. [166] Y. Wang and X. X. 
Zhu, “The SARptical dataset for joint analysis of SAR and optical image in dense urban area,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2018. [167] S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, and L. Jiao, “A deep learning framework for remote sensing image registration,” ISPRS Journal of Photogrammetry and Remote Sensing, 2018. [168] N. Merkle, W. Luo, S. Auer, R. M¨uller, and R. Urtasun, “Exploiting deep matching and SAR data for the geo-localization accuracy improvement of optical satellite images,” Remote Sensing, vol. 9, no. 6, p. 586, 2017. [169] S. Suri and P. Reinartz, “Mutual-information-based registration of TerraSAR-X and Ikonos imagery in urban areas,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 2, pp. 939–949, 2010. [170] F. Dellinger, J. Delon, Y. Gousseau, J. Michel, and F. Tupin, “SARSIFT: A SIFT-like algorithm for SAR images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 1, pp. 453–466, 2015. [171] D. Abulkhanov, I. Konovalenko, D. Nikolaev, A. Savchik, E. Shvets, and D. Sidorchuk, “Neural network-based feature point descriptors for registration of optical and SAR images,” in International Conference on Machine Vision (ICMV), 2018. [172] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981. [173] N. Merkle, S. Auer, R. M¨uller, and P. Reinartz, “Exploring the potential of conditional adversarial networks for optical and SAR image matching,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, pp. 1–10, 2018. [174] L. H. Hughes, N. Merkle, T. Burgmann, S. Auer, and M. Schmitt, “Deep learning for SAR-optical image matching,” in IGARSS 2019 2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2019, pp. 4877–4880. [175] M. Fuentes Reyes, S. Auer, N. Merkle, C. Henry, and M. Schmitt, “SAR-to-optical image translation based on conditional generative adversarial networks—optimization, opportunities and limits,” Remote Sensing, vol. 11, no. 17, p. 2067, 2019. [176] W. Yao, D. Marmanis, and M. Datcu, “Semantic segmentation using deep neural networks for SAR and optical image pairs,” 2017. [177] N. Audebert, B. Le Saux, and S. Lefevre, “Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks,” in Computer Vision – ACCV 2016, S.-H. Lai, V. Lepetit, K. Nishino, and Y. Sato, Eds. Cham: Springer International Publishing, 2017, vol. 10111, pp. 180–196, series Title: Lecture Notes in Computer Science. [Online]. Available: http://link.springer.com/10.1007/978-3-319-54181-5 12 [178] M. Schmitt, L. Hughes, M. K¨orner, and X. X. Zhu, “Colorizing Sentinel-1 SAR images using a variational autoencoder conditioned on Sentinel-2 imagery,” International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 42, p. 2, 2018. [179] C. Bishop, “Mixture density networks,” Citeseer, Tech. Rep., 1994. [180] C. Grohnfeld, M. Schmitt, and X. X. Zhu, “A conditional generative adversarial network to fuse SAR and multispectral optical data for cloud removal from Sentinel-2 images,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2018. [181] P. Ebel, M. Schmitt, and X. 
Zhu, “Cloud removal in unpaired sentinel2 imagery using cycle-consistent gan and sar-optical data fusion,” IGARSS 2020 IEEE International Geoscience and Remote Sensing Symposium, 2020. [182] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired imageto-image translation using cycle-consistent adversarial networks,” in 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, Oct 2017, p. 2242–2251. [Online]. Available: http://ieeexplore.ieee.org/document/8237506/
[183] L. H. Hughes and M. Schmitt, “A semi-supervised approach to SAR-optical image matching,” ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. IV-2/W7, pp. 71–78, 2019. [184] J. Zhao, Z. Zhang, W. Yao, M. Datcu, H. Xiong, and W. Yu, “OpenSARUrban: A Sentinel-1 SAR image dataset for urban interpretation,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 187–203, 2020. [Online]. Available: https://ieeexplore.ieee.org/document/8952866/ [185] X. Zhu, J. Hu, C. Qiu, Y. Shi, J. Kang, L. Mou, H. Bagheri, M. Häberle, Y. Hua, R. Huang, L. D. Hughes, H. Li, Y. Sun, G. Zhang, S. Han, M. Schmitt, and Y. Wang, “So2Sat LCZ42: A benchmark dataset for global local climate zones classification,” IEEE Geoscience and Remote Sensing Magazine, in press, 2020. [186] M. Neumann, A. S. Pinto, X. Zhai, and N. Houlsby, “In-domain representation learning for remote sensing,” arXiv:1911.06721 [cs], Nov. 2019. [Online]. Available: http://arxiv.org/abs/1911.06721 [187] M. Schmitt, L. H. Hughes, C. Qiu, and X. X. Zhu, “SEN12MS - A curated dataset of georeferenced multi-spectral Sentinel-1/2 imagery for deep learning and data fusion,” ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. IV-2/W7, pp. 153–160, Sep. 2019. [Online]. Available: https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W7/153/2019/ [188] M. Schmitt, L. H. Hughes, and X. X. Zhu, “The SEN1-2 dataset for deep learning in SAR-optical data fusion,” in ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2018. [189] J. Shermeyer, D. Hogan, J. Brown, A. Van Etten, N. Weir, F. Pacifici, R. Haensch, A. Bastidas, S. Soenen, T. Bacastow et al., “SpaceNet 6: Multi-sensor all weather mapping dataset,” arXiv preprint arXiv:2004.06500, 2020. [190] X. Liu, L. Jiao, and F. Liu, “PolSF: PolSAR image dataset on San Francisco,” arXiv preprint arXiv:1912.07259, 2019. [191] Y. Cao, Y. Wu, P. Zhang, W. Liang, and M. Li, “Pixel-wise PolSAR image classification via a novel complex-valued deep fully convolutional network,” Remote Sensing, vol. 11, no. 22, p. 2653, 2019. [192] T. Ross, S. Worrell, V. Velten, J. Mossing, and M. Bryant, “Standard SAR ATR evaluation experiments using the MSTAR public release data set,” in Algorithms for Synthetic Aperture Radar Imagery, 1998. [193] F. Gao, Y. Yang, J. Wang, J. Sun, E. Yang, and H. Zhou, “A deep convolutional generative adversarial networks (DCGANs)-based semi-supervised method for object recognition in synthetic aperture radar (SAR) images,” Remote Sensing, vol. 10, no. 6, p. 846, 2018. [194] B. Li, B. Liu, L. Huang, W. Guo, Z. Zhang, and W. Yu, “OpenSARShip 2.0: A large-volume dataset for deeper interpretation of ship targets in Sentinel-1 imagery,” in 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA). Beijing: IEEE, Nov. 2017, pp. 1–5. [Online]. Available: http://ieeexplore.ieee.org/document/8124929/ [195] L. Huang, B. Liu, B. Li, W. Guo, W. Yu, Z. Zhang, and W. Yu, “OpenSARShip: A dataset dedicated to Sentinel-1 ship interpretation,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 1, pp. 195–208, Jan. 2018. [196] Y. Wang, C. Wang, H. Zhang, Y. Dong, and S. Wei, “A SAR dataset of ship detection for deep learning under complex backgrounds,” Remote Sensing, vol. 11, no. 7, p. 765, Mar. 2019.
[Online]. Available: https://www.mdpi.com/2072-4292/11/7/765 [197] Y. Wang, X. X. Zhu, B. Zeisl, and M. Pollefeys, “Fusing meter-resolution 4-D InSAR point clouds and optical images for semantic urban infrastructure monitoring,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 1, pp. 14–26, Jan. 2017. [198] I. D. Stewart and T. R. Oke, “Local climate zones for urban temperature studies,” Bulletin of the American Meteorological Society, vol. 93, no. 12, pp. 1879–1900, 2012. [Online]. Available: http://journals.ametsoc.org/doi/abs/10.1175/BAMS-D-11-00019.1 [199] H. Xiyue, A. Wei, S. Qian, L. Jian, W. Haipeng, and X. Feng, “FUSAR-Ship: A high-resolution SAR-AIS matchup dataset of Gaofen-3 for ship detection and recognition,” Science China Information Sciences, 2020. [200] P. Yu, A. Qin, and D. Clausi, “Unsupervised polarimetric SAR image segmentation and classification using region growing with edge penalty,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 4, pp. 1302–1317, 2012.
[201] D. Hoekman and M. Vissers, “A new polarimetric classification approach evaluated for agricultural crops,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 12, pp. 2881–2889, 2003. [202] W. Yang, D. Dai, J. Wu, and C. He, Weakly supervised polarimetric SAR image classification with multi-modal Markov aspect model. ISPRS, 2010. [203] C. O. Dumitru, G. Schwarz, and M. Datcu, “SAR Image Land Cover Datasets for Classification Benchmarking of Temporal Changes,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 5, pp. 1571–1592, May 2018, conference Name: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. [204] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2noise: Learning image restoration without clean data,” 2018. [205] A. Kendall and Y. Gal, “What uncertainties do we need in bayesian deep learning for computer vision?” 2017. [206] X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ista and its practical weights and thresholds,” 2018. [207] S. Haykin, “Cognitive radar: a way of the future,” IEEE Signal Processing Magazine, vol. 23, no. 1, pp. 30–40, 2006.

1 https://doi.org/10.14459/2018mp1483140
2 https://www.tensorflow.org/datasets/catalog/so2sat
3 https://doi.org/10.21227/3sz0-dp26
4 https://mediatum.ub.tum.de/1474000
5 https://spacenet.ai/sn6-challenge/
6 https://www.ietr.fr/polsarpro-bio/san-francisco/
7 https://github.com/liuxuvip/PolSF
8 https://www.sdms.afrl.af.mil/index.php?collection=mstar
9 http://opensar.sjtu.edu.cn/Data/Search
10 https://github.com/CAESAR-Radi/SAR-Ship-Dataset
11 https://www.sipeo.bgu.tum.de/downloads/SARptical_data.zip
12 https://mediatum.ub.tum.de/1436631
