TPAMI 2024: 8 Papers on Satellite Remote Sensing Imagery
On Boundary Discontinuity in Angle Regression Based Arbitrary Oriented Object Detection
Commentary: http://www.studyai.com/xueshu/paper/detail/0aac21de6b
Paper link: (10.1109/TPAMI.2024.3378777)
Abstract
With the rapid development of fields such as autonomous driving and remote sensing, oriented object detection has gradually gained prominence.
The majority of existing methods directly perform regression on the rotation angle, which we argue has fundamental limitations of boundary discontinuity (even if using Gaussian or RotatedIoU-based losses).
In this paper, a novel angle coder named phase-shifting coder (PSC) is proposed to address this issue.
Unlike another well-explored alternative, i.e., angle classification, PSC avoids boundary discontinuity in a continuous and differentiable manner and thus can work together with Gaussian or RotatedIoU-based methods to further boost their performance.
Moreover, by rethinking the boundary discontinuity of elongated and square-like objects as rotational symmetry of different cycles, a dual-frequency version (PSCD) is proposed to accurately predict the orientation of both types of objects.
Visual analysis and extensive experiments on several popular backbone detectors and datasets demonstrate the effectiveness and the potentiality of our approach.
When facing scenarios requiring high-quality bounding boxes, the proposed methods are expected to give a competitive performance…
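To make the phase-shifting idea concrete, here is a minimal sketch of how an angle can be encoded as several phase-shifted cosines and decoded with an arctangent. It is an illustration under stated assumptions, not the authors' implementation: the names `psc_encode` and `psc_decode`, the choice of three steps, and the angle range [-π/2, π/2) are all hypothetical choices made for this example.

```python
import math

def psc_encode(theta, n_steps=3):
    """Encode an angle in [-pi/2, pi/2) as n_steps phase-shifted cosines.

    Assumption: a rectangle repeats every pi, so the angle is doubled
    to map that period onto one full 2*pi cycle of the cosine.
    """
    phase = 2.0 * theta
    return [math.cos(phase + 2.0 * math.pi * n / n_steps) for n in range(n_steps)]

def psc_decode(code):
    """Recover the angle from its phase-shifted code (needs len(code) >= 3)."""
    n_steps = len(code)
    s = sum(x * math.sin(2.0 * math.pi * n / n_steps) for n, x in enumerate(code))
    c = sum(x * math.cos(2.0 * math.pi * n / n_steps) for n, x in enumerate(code))
    phase = math.atan2(-s, c)  # phase lies in (-pi, pi]
    return phase / 2.0

# Two angles on opposite sides of the boundary describe nearly the same box,
# and their codes are nearly identical, so a regression target has no jump.
low = psc_encode(-math.pi / 2 + 1e-3)
high = psc_encode(math.pi / 2 - 1e-3)
gap = max(abs(a - b) for a, b in zip(low, high))
```

In this sketch `gap` stays small even though the raw angles differ by almost π, which is the property the abstract describes as being free of boundary discontinuity.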
Box2Mask: Box-Supervised Instance Segmentation via Level-Set Evolution
Commentary: http://www.studyai.com/xueshu/paper/detail/31db0fea0b
Paper link: (10.1109/TPAMI.2024.3363054)
Abstract
In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention.
This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision.
Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations.
Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution.
By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation.
The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation.
In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods…
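The abstract does not give the form of the level-set energy. As a rough illustration only, the sketch below assumes a Chan-Vese-style region energy evaluated inside the box: pixels assigned to the foreground should be close to the foreground mean, and the rest close to the background mean. The function name `region_energy`, the soft-mask representation, and the omission of any length or regularization term are assumptions of this example.

```python
def region_energy(image, mask, box):
    """Chan-Vese-style region energy restricted to a bounding box.

    image: 2D list of floats (a single feature channel).
    mask:  2D list of soft foreground probabilities in [0, 1].
    box:   (row0, col0, row1, col1), half-open on the upper ends.
    """
    r0, c0, r1, c1 = box
    pixels = [(image[r][c], mask[r][c]) for r in range(r0, r1) for c in range(c0, c1)]
    eps = 1e-8  # avoids division by zero when one region is empty
    fg_weight = sum(m for _, m in pixels)
    bg_weight = sum(1.0 - m for _, m in pixels)
    # Weighted mean of the feature inside and outside the predicted mask.
    c_in = sum(v * m for v, m in pixels) / (fg_weight + eps)
    c_out = sum(v * (1.0 - m) for v, m in pixels) / (bg_weight + eps)
    # Low energy means each region is nearly uniform in the feature.
    return sum(m * (v - c_in) ** 2 + (1.0 - m) * (v - c_out) ** 2 for v, m in pixels)

image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
good_mask = [row[:] for row in image]            # matches the bright square
bad_mask = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]  # left half of the box
good = region_energy(image, good_mask, (0, 0, 4, 4))
bad = region_energy(image, bad_mask, (0, 0, 4, 4))
```

Minimizing such an energy with respect to the mask pushes it toward regions that are homogeneous in the chosen feature, which is why only the box is needed as supervision in this simplified picture.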
Shadow Detection in Remote Sensing Images Based on Spectral Radiance Separability Enhancement
Commentary: http://www.studyai.com/xueshu/paper/detail/3c966b2c1f
Paper link: (10.1109/TPAMI.2023.3343728)
Abstract
Shadow detection is a basic task of remote sensing image analysis, but it is often seriously disturbed by vegetation, water bodies, and black objects.
It is observed that vegetation and dark objects often appear dark in the visible bands but brighter in the near-infrared (NIR), and it is also noticed that the reflection of inland water bodies in the green band is stronger than that in the blue band.
Taking advantage of these physical properties and combining them with the bluish and dark appearance of shadows, we propose a simple but effective shadow detection method for multispectral remote sensing images.
These physical properties are used to create transformation models that suppress features such as vegetation, water bodies, etc., but at the same time enhance shadows.
Then, we transform the shadow representation into a color space to generate candidate shadows using dominant color components.
To separate shadows from the others, we propose two indexes, the normalized Color Difference Composite Index (CDCI) and Color Purity Index (CPI), and fuse them to obtain shadows and their confidence.
The experimental results indicate that the proposed method can effectively detect the shadows in multispectral images and outperforms the state-of-the-art approaches…
SpectralGPT: Spectral Remote Sensing Foundation Model
Commentary: http://www.studyai.com/xueshu/paper/detail/3cebba00bf
Paper link: (10.1109/TPAMI.2024.3362475)
Abstract
The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner.
While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications.
To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT).
Compared to existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS Big Data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; and 4) trains on one million spectral RS images, yielding models with over 600 million parameters.
Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS Big Data applications within the field of geoscience across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection.
A Fast Alpha-Tree Algorithm for Extreme Dynamic Range Pixel Dissimilarities
Commentary: http://www.studyai.com/xueshu/paper/detail/47373e5b37
Paper link: (10.1109/TPAMI.2023.3341721)
Abstract
The α-tree algorithm is a useful hierarchical representation technique which facilitates comprehension of images such as remote sensing and medical images.
Most α-tree algorithms make use of priority queues to process image edges in a correct order, but because traditional priority queues are inefficient in α-tree algorithms using extreme-dynamic-range pixel dissimilarities, they run slower compared with other related algorithms such as component tree.
In this paper, we propose a novel hierarchical heap priority queue algorithm that can process α-tree edges much more efficiently than other state-of-the-art priority queues.
Experimental results using 48-bit Sentinel-2A remotely sensed images and randomly generated images have shown that the proposed hierarchical heap priority queue improved the timings of the flooding α-tree algorithm by replacing the heap priority queue with the proposed queue: 1.68 times in 4-N and 2.41 times in 8-N on Sentinel-2A images, and 2.56 times and 4.43 times on randomly generated images…
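For readers unfamiliar with α-trees, the following sketch computes a single level of the hierarchy: the connected components obtained by merging 4-neighbor pixels whose dissimilarity is at most α. It is not the hierarchical heap priority queue proposed in the paper. A plain sort over the edges stands in for the priority queue, and the name `alpha_components`, the absolute-difference dissimilarity, and the union-find merging are assumptions of this example.

```python
def alpha_components(image, alpha):
    """Label pixels so that 4-neighbors with dissimilarity <= alpha share a label."""
    h, w = len(image), len(image[0])
    parent = list(range(h * w))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # One edge per pair of horizontally or vertically adjacent pixels (4-N).
    edges = []
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:
                edges.append((abs(image[r][c] - image[r][c + 1]), i, i + 1))
            if r + 1 < h:
                edges.append((abs(image[r][c] - image[r + 1][c]), i, i + w))

    # Process edges in increasing dissimilarity; this ordering is the job
    # that a priority queue performs in flooding-style algorithms.
    edges.sort()
    for dissimilarity, a, b in edges:
        if dissimilarity > alpha:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    return [[find(r * w + c) for c in range(w)] for r in range(h)]

labels = alpha_components([[0, 1, 10], [0, 1, 10]], alpha=1)
```

Running the same merge for increasing values of α yields nested partitions, which is the hierarchy an α-tree stores. With wide-dynamic-range dissimilarities the number of distinct edge weights grows, which is the setting where the abstract says conventional queues become the bottleneck.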
Vehicle Perception From Satellite
Commentary: http://www.studyai.com/xueshu/paper/detail/8fd4a53cdf
Paper link: (10.1109/TPAMI.2023.3335953)
Abstract
Satellites are capable of capturing high-resolution videos.
This makes vehicle perception from satellites possible.
Compared to street surveillance, dashboard cameras, or other equipment, satellite videos provide a much broader city-scale view, so that the global dynamic scene of the traffic is captured and displayed.
Traffic monitoring from satellite is a new task with great potential applications, including traffic jam prediction, path planning, vehicle dispatching, etc.
Practically, limited by the resolution and view, the captured vehicles are very tiny (a few pixels) and move slowly.
Worse still, these satellites are in Low Earth Orbit (LEO) to capture such high-resolution videos, so the background is also moving.
Under this circumstance, traffic monitoring from the satellite view is an extremely challenging task.
To attract more researchers into this field, we build a large-scale benchmark for traffic monitoring from satellite.
It supports several tasks, including tiny object detection, counting and density estimation.
The dataset is constructed based on 12 satellite videos and 14 synthetic videos recorded from GTA-V.
They are separated into 408 video clips, which contain 7,336 real satellite images and 1,960 synthetic images.
In total, 128,801 vehicles are annotated, and the number of vehicles in each image varies from 0 to 101.
Several classic and state-of-the-art approaches in traditional computer vision are evaluated on the datasets, so as to compare the performance of different approaches, analyze the challenges in this task, and discuss the future prospects…
Incorporating Season and Solar Specificity Into Renderings Made by a NeRF Architecture Using Satellite Images
Commentary: http://www.studyai.com/xueshu/paper/detail/9d4d5e99a8
Paper link: (10.1109/TPAMI.2024.3355069)
Abstract
As a result of Shadow NeRF and Sat-NeRF, it is possible to take the solar angle into account in a NeRF-based framework for rendering a scene from a novel viewpoint using satellite images for training.
Our work extends those contributions and shows how one can make the renderings season-specific.
Our main challenge was creating a Neural Radiance Field (NeRF) that could render seasonal features independently of viewing angle and solar angle while still being able to render shadows.
We teach our network to render seasonal features by introducing one more input variable — time of the year.
However, the small training datasets typical of satellite imagery can introduce ambiguities in cases where shadows are present in the same location for every image of a particular season.
We add additional terms to the loss function to discourage the network from using seasonal features for accounting for shadows.
We show the performance of our network on eight Areas of Interest containing images captured by the Maxar WorldView-3 satellite.
This evaluation includes tests measuring the ability of our framework to accurately render novel views, generate height maps, predict shadows, and specify seasonal features independently from shadows.
Our ablation studies justify the choices made for network design parameters…
Deep Lossy Plus Residual Coding for Lossless and Near-Lossless Image Compression
Commentary: http://www.studyai.com/xueshu/paper/detail/c272af1112
Paper link: (10.1109/TPAMI.2023.3348486)
Abstract
Lossless and near-lossless image compression is of paramount importance to professional users in many technical fields, such as medicine, remote sensing, precision engineering and scientific research.
But despite rapidly growing research interests in learning-based image compression, no published method offers both lossless and near-lossless modes.
In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression.
In the lossless mode, the DLPR coding system first performs lossy compression and then lossless coding of residuals.
We solve the joint lossy and residual compression problem in the approach of VAEs, and add autoregressive context modeling of the residuals to enhance lossless compression performance.
In the near-lossless mode, we quantize the original residuals to satisfy a given ℓ∞ error bound, and propose a scalable near-lossless compression scheme that works for variable ℓ∞ bounds instead of training multiple networks.
To expedite the DLPR coding, we increase the degree of algorithm parallelization by a novel design of coding context, and accelerate the entropy coding with adaptive residual interval.
Experimental results demonstrate that the DLPR coding system achieves both the state-of-the-art lossless and near-lossless image compression performance with competitive coding speed…
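The near-lossless mode hinges on quantizing residuals so that every reconstructed pixel stays within a given ℓ∞ bound. The abstract does not state the quantizer, so the sketch below assumes the uniform quantizer with bin width 2τ + 1 that is commonly used for near-lossless coding of integer residuals. The name `quantize_residual` and the symbol `tau` for the error bound are hypothetical.

```python
def quantize_residual(r, tau):
    """Quantize an integer residual so the error never exceeds tau.

    Assumption: uniform bins of width 2*tau + 1 centered on multiples
    of that width; tau = 0 leaves the residual unchanged (lossless).
    """
    step = 2 * tau + 1
    sign = 1 if r >= 0 else -1
    return sign * step * ((abs(r) + tau) // step)

# Worst-case reconstruction error over a range of residuals for tau = 2.
worst = max(abs(r - quantize_residual(r, 2)) for r in range(-255, 256))
```

Because larger values of `tau` map more residuals to the same bin, the quantized residuals have lower entropy and cost fewer bits, at the price of a bounded per-pixel error.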