IJCAI-2024: 8 Papers on Satellite Remote Sensing Imagery
Fusion from a Distributional Perspective: A Unified Symbiotic Diffusion Framework for Any Multisource Remote Sensing Data Classification
Article analysis: http://www.studyai.com/xueshu/paper/detail/2016f258e4
Paper link (DOI): 10.24963/ijcai.2024/174
Abstract
The joint classification of multisource remote sensing data is a prominent research field.
However, most of the existing works are tailored for two specific data sources, which fail to effectively address the diverse combinations of data sources in practical applications.
The importance of designing a unified network with broad applicability has thus been overlooked.
In this paper, we propose a unified and self-supervised Symbiotic Diffusion framework (named SymDiffuser), which achieves the joint classification of any pair of different remote sensing data sources in a single model.
SymDiffuser captures the inter-modal relationship by establishing reciprocal conditional distributions across diverse sources step by step.
The fusion process of multisource data is consistently represented within the framework from a data distribution perspective.
Subsequently, the features under the current conditional distribution at each time step are integrated during the downstream phase to accomplish the classification task.
Such joint classification methodology transcends source-specific considerations, rendering it applicable to remote sensing data from any diverse sources.
The experimental results showcase the framework’s potential in achieving state-of-the-art performance in multimodal fusion classification task…
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Article analysis: http://www.studyai.com/xueshu/paper/detail/292059a9c7
Paper link (DOI): 10.24963/ijcai.2024/103
Abstract
Due to the spatial redundancy in remote sensing images, sparse tokens containing rich information are usually used in self-attention (SA) to reduce the overall number of tokens in the computation, avoiding the high computational cost of Vision Transformers.
However, such methods usually obtain sparse tokens through hand-crafted or parallel-unfriendly designs, posing a challenge to reaching a better balance between efficiency and performance.
Different from them, this paper proposes to use learnable meta tokens to formulate sparse tokens, which effectively learn key information while improving inference speed.
Technically, the meta tokens are first initialized from image tokens via cross-attention.
Then, we propose Dual Cross-Attention (DCA) to promote information exchange between image tokens and meta tokens, where they serve as query and key (value) tokens alternately in a dual-branch structure, significantly reducing the computational complexity compared to self-attention.
By employing DCA in the early stages with dense visual tokens, we obtain the hierarchical architecture LeMeViT with various sizes.
Experimental results on classification and dense prediction tasks show that LeMeViT achieves a significant 1.7× speedup, fewer parameters, and competitive performance compared to the baseline models, striking a better trade-off between efficiency and performance.
The code is released at https://github.com/ViTAE-Transformer/LeMeViT…
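The dual-branch exchange described above can be sketched as follows. This is not the released implementation (see the GitHub link); it is a minimal single-head NumPy sketch with projection matrices and normalization omitted, showing why attending N image tokens to M meta tokens costs O(N·M) rather than the O(N²) of self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv, d):
    # scaled dot-product attention: rows of q attend to rows of kv
    scores = q @ kv.T / np.sqrt(d)          # (len(q), len(kv))
    return softmax(scores, axis=-1) @ kv    # (len(q), d)

def dual_cross_attention(image_tokens, meta_tokens):
    d = image_tokens.shape[-1]
    # branch 1: image tokens query the few meta tokens -> O(N*M), not O(N^2)
    new_image = cross_attention(image_tokens, meta_tokens, d)
    # branch 2: meta tokens query the image tokens, aggregating key information
    new_meta = cross_attention(meta_tokens, image_tokens, d)
    return new_image, new_meta

# toy shapes: N=196 image tokens, M=16 meta tokens, embedding dim 64
rng = np.random.default_rng(0)
x = rng.standard_normal((196, 64))
m = rng.standard_normal((16, 64))
xi, mi = dual_cross_attention(x, m)
```

Both branches preserve their token counts, so the two streams can be interleaved across stages as in the hierarchical LeMeViT architecture.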
DeepLight: Reconstructing High-Resolution Observations of Nighttime Light With Multi-Modal Remote Sensing Data
Article analysis: http://www.studyai.com/xueshu/paper/detail/6aafc4298b
Paper link (DOI): 10.24963/ijcai.2024/837
Abstract
Nighttime light (NTL) remote sensing observation serves as a unique proxy for quantitatively assessing progress toward a series of Sustainable Development Goals (SDGs), such as poverty estimation, sustainable urban development, and carbon emissions.
However, existing NTL observations often suffer from pervasive degradation and inconsistency, limiting their utility for computing the indicators defined by the SDGs.
In this study, we propose a novel approach to reconstruct high-resolution NTL images using multimodal remote sensing data.
To support this research endeavor, we introduce DeepLightMD, a comprehensive dataset comprising data from five heterogeneous sensors, offering fine spatial resolution and rich spectral information at a national scale.
Additionally, we present DeepLightSR, a calibration-aware method for multi-modality super-resolution.
DeepLightSR integrates calibration-aware alignment, an auxiliary-to-main multi-modality fusion, and an auxiliary-embedded refinement to effectively address spatial heterogeneity, fuse diversely representative features, and enhance performance in ×8 super-resolution tasks.
Extensive experiments demonstrate the superiority of DeepLightSR over 8 competing methods, as evidenced by improvements in PSNR (2.01–13.25 dB) and PIQE (0.49–9.32).
Our findings underscore the practical significance of the proposed dataset and model in reconstructing high-resolution NTL data, supporting efficient and quantitative assessment of SDG progress.
The code and data will be available at https://github.com/xian1234/DeepLight…
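The PSNR gains cited above can be made concrete. Below is a minimal sketch of the metric in its standard definition; it is not tied to the DeepLight codebase, and the toy images are illustrative.

```python
import numpy as np

def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio (dB): higher means the reconstruction
    is closer to the high-resolution reference image."""
    mse = np.mean((reference - reconstruction) ** 2)
    return float(10 * np.log10(data_range ** 2 / mse))

# toy example: a uniform +0.01 error gives MSE = 1e-4, i.e. PSNR = 40 dB
ref = np.linspace(0, 1, 64).reshape(8, 8)
noisy = ref + 0.01
score = psnr(ref, noisy)
```

A 2 dB PSNR improvement corresponds to roughly a 37% reduction in mean squared error, which is why even single-digit dB gains are substantial in ×8 super-resolution.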
MuseCL: Predicting Urban Socioeconomic Indicators via Multi-Semantic Contrastive Learning
Article analysis: http://www.studyai.com/xueshu/paper/detail/8a452032a7
Paper link (DOI): 10.24963/ijcai.2024/834
Abstract
Predicting socioeconomic indicators within urban regions is crucial for fostering inclusivity, resilience, and sustainability in cities and human settlements.
While pioneering studies have attempted to leverage multi-modal data for socioeconomic prediction, jointly exploring their underlying semantics remains a significant challenge.
To address the gap, this paper introduces a Multi-Semantic Contrastive Learning (MuseCL) framework for fine-grained urban region profiling and socioeconomic prediction.
Within this framework, we initiate the process by constructing contrastive sample pairs for street view and remote sensing images, capitalizing on the similarities in human mobility and Point of Interest (POI) distribution to derive semantic features from the visual modality.
Additionally, we extract semantic insights from POI texts embedded within these regions, employing a pre-trained text encoder.
To merge the acquired visual and textual features, we devise an innovative cross-modality-based attentional fusion module, which leverages a contrastive mechanism for integration.
Experimental results across multiple cities and indicators consistently highlight the superiority of MuseCL, demonstrating an average improvement of 10% in R2 compared to various competitive baseline models.
The code of this work is publicly available at https://github.com/XixianYong/MuseCL…
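The contrastive mechanism underlying MuseCL's visual feature learning is, in generic form, an InfoNCE-style objective over constructed sample pairs. The NumPy sketch below is illustrative only: the actual pairing rule (mobility/POI similarity) and the encoders are the paper's contribution, and the single-direction loss shown here is an assumption, not the released implementation.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: row i of `anchors` (e.g. a street-view embedding) should
    match row i of `positives` (e.g. the remote sensing embedding of a
    mobility-similar region); all other rows act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature            # (B, B) similarity matrix
    idx = np.arange(len(a))                   # matching pairs on the diagonal
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[idx, idx].mean())

rng = np.random.default_rng(1)
z1 = rng.standard_normal((8, 32))
loss_random = info_nce(z1, rng.standard_normal((8, 32)))
loss_aligned = info_nce(z1, z1)  # identical views -> much lower loss
```

Minimizing this objective pulls paired regions together in embedding space, which is what lets the learned features transfer to downstream socioeconomic indicator regression.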
From Pixels to Progress: Generating Road Network from Satellite Imagery for Socioeconomic Insights in Impoverished Areas
Article analysis: http://www.studyai.com/xueshu/paper/detail/91890f326a
Paper link (DOI): 10.24963/ijcai.2024/831
Abstract
The Sustainable Development Goals (SDGs) aim to resolve societal challenges, such as eradicating poverty and improving the lives of vulnerable populations in impoverished areas.
These areas rely on road infrastructure construction to promote accessibility and economic development.
Although publicly available data such as OpenStreetMap can be used to monitor road status, data completeness in impoverished areas is limited.
Meanwhile, the development of deep learning techniques and satellite imagery shows excellent potential for earth monitoring.
To tackle the challenge of road network assessment in impoverished areas, we develop a systematic road extraction framework combining an encoder-decoder architecture and morphological operations on satellite imagery, offering an integrated workflow for interdisciplinary researchers.
Extensive road network extraction experiments on real-world data from impoverished regions achieve a 42.7% improvement in F1-score over baseline methods and reconstruct about 80% of the actual roads.
We also propose a comprehensive road network dataset covering an area of approximately 794,178 km² and 17.048 million people across 382 impoverished counties in China.
The generated dataset is further utilized to conduct socioeconomic analysis in impoverished counties, showing that road network construction positively impacts regional economic development.
The technical appendix, code, and generated dataset can be found at https://github.com/tsinghua-fib-lab/Road_network_extraction_impoverished_counties…
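The morphological step in the workflow above can be illustrated with a minimal example. The abstract does not specify which operations the framework uses; the sketch below shows binary closing (dilation followed by erosion), a common choice for bridging small gaps in a predicted road mask, implemented in plain NumPy.

```python
import numpy as np

def dilate(mask, k=3):
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant")
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask, k=3):
    """Binary erosion; the border is padded with 1 so edge roads survive."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=1)
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def close_gaps(mask, k=3):
    """Morphological closing: bridges small breaks in thin road segments."""
    return erode(dilate(mask, k), k)

# a one-pixel-wide horizontal road with a single-pixel break
road = np.zeros((5, 9), dtype=np.uint8)
road[2, :] = 1
road[2, 4] = 0                     # break in the road
fixed = close_gaps(road)           # the break is bridged, width preserved
```

In practice such post-processing repairs discontinuities left by the segmentation network, which matters when the extracted mask is later vectorized into a connected road graph for accessibility analysis.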
Long-term Detection and Monitory of Chinese Urban Village Using Satellite Imagery
Article analysis: http://www.studyai.com/xueshu/paper/detail/9485b6e327
Paper link (DOI): 10.24963/ijcai.2024/813
Abstract
Urban villages are areas filled with rural-like improvised structures in Chinese cities, usually housing the most vulnerable groups.
Under the guidance of the Sustainable Development Goals (SDGs), the Chinese government initiated renewal and redevelopment projects, underscoring the meticulous mapping and segmentation of urban villages.
Satellite imagery is an advanced and efficient means of identifying urban villages and monitoring changes, but traditional methods neglect the morphological diversity of urban villages in season, shape, size, spacing, and layout, which is unsatisfactory for long-term, wide-range data.
Here, we design a targeted approach based on Tobler’s First Law of Geography, using curriculum labeling to address morphological diversity and semi-automatically generate segmentation of urban village boundaries.
Specifically, we use manually labeled data as seeds for pre-trained SegFormer models and incrementally fine-tune the model based on geographical proximity.
Rigorous experimentation across five diverse cities substantiates the efficacy of our methodology: the IoU metric shows a noteworthy improvement of over 119% relative to the baseline.
Our final results cover 265,050 urban villages across 433 cities in China over the past 10 years, and the analysis reveals the uneven redevelopment by geography and city scale.
We further examine the within-city distribution and verify the urban scaling law associated with several socio-economic factors.
Our method can be applied nationwide to decide redevelopment priority and resource allocation, contributing to SDG 11.1 on affordable housing and upgrading slums.
The code and dataset are available at https://github.com/tsinghua-fib-lab/LtCUV…
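The geographical-proximity curriculum can be sketched abstractly. The function below is a hypothetical illustration, not taken from the released code: it orders candidate regions by distance to the nearest labeled seed, which is the order in which the pre-trained SegFormer would be incrementally fine-tuned under Tobler's First Law (near things are more related than distant things).

```python
import math

def proximity_curriculum(seed_coords, candidate_coords):
    """Order candidate regions by distance to the nearest labeled seed.
    Nearby regions, being morphologically most similar to the seeds,
    are fine-tuned on first; their pseudo-labels then grow the seed set."""
    def nearest_seed_dist(c):
        return min(math.dist(c, s) for s in seed_coords)
    return sorted(candidate_coords, key=nearest_seed_dist)

seeds = [(0.0, 0.0)]                              # manually labeled city
candidates = [(5.0, 5.0), (1.0, 0.0), (3.0, 0.0)]  # unlabeled cities
order = proximity_curriculum(seeds, candidates)    # nearest first
```

Iterating this ordering (fine-tune, pseudo-label, extend the seed set) is one way the semi-automatic labeling loop could scale from a few labeled cities to hundreds.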
EvaNet: Elevation-Guided Flood Extent Mapping on Earth Imagery
Article analysis: http://www.studyai.com/xueshu/paper/detail/bbfe493a84
Paper link (DOI): 10.24963/ijcai.2024/133
Abstract
Accurate and timely mapping of flood extent from high-resolution satellite imagery plays a crucial role in disaster management, such as damage assessment and relief activities.
However, current state-of-the-art solutions are based on U-Net, which cannot segment flood pixels accurately due to ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgement from spectral features alone.
Thanks to digital elevation model (DEM) data readily available from sources such as the United States Geological Survey (USGS), this work explores the use of an elevation map to improve flood extent mapping.
We propose EvaNet, an elevation-guided segmentation model based on the encoder-decoder architecture with two novel techniques: (1) a loss function encoding the physical law of gravity: if a location is flooded (resp. dry), then its adjacent locations with a lower (resp. higher) elevation must also be flooded (resp. dry); (2) a new (de)convolution operation that integrates the elevation map through a location-sensitive gating mechanism to regulate how much of the spectral features flow through adjacent layers.
Extensive experiments show that EvaNet significantly outperforms the U-Net baselines, and works as a perfect drop-in replacement for U-Net in existing solutions to flood extent mapping.
EvaNet is open-sourced at https://github.com/MTSami/EvaNet…
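The gravity-law constraint admits a compact sketch. Assuming per-pixel flood probabilities and an elevation map, a penalty over horizontally adjacent pixel pairs (vertical pairs are analogous and omitted for brevity) could look like the following; this illustrates the constraint only and is not EvaNet's actual loss function.

```python
import numpy as np

def gravity_penalty(prob, elev):
    """Penalize predictions that violate the gravity law: if a pixel is
    flooded, any adjacent pixel at a LOWER elevation must also be flooded
    (equivalently, a dry pixel forbids flooded lower neighbors)."""
    p_left, p_right = prob[:, :-1], prob[:, 1:]
    e_left, e_right = elev[:, :-1], elev[:, 1:]
    # right pixel flooded while its lower left neighbor is dry -> violation
    v1 = np.where(e_left < e_right, p_right * (1 - p_left), 0.0)
    # left pixel flooded while its lower right neighbor is dry -> violation
    v2 = np.where(e_right < e_left, p_left * (1 - p_right), 0.0)
    return float((v1 + v2).mean())

elev = np.array([[3.0, 2.0, 1.0]])        # slope descending to the right
consistent = np.array([[0.0, 0.9, 1.0]])  # water collects downhill: no penalty
violating  = np.array([[1.0, 0.1, 0.0]])  # flooded hilltop, dry valley
```

Added to the usual segmentation loss, such a term pushes the network to resolve ambiguous pixels (canopies, clouds) in the physically plausible direction rather than from spectra alone.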
Remote Sensing for Water Quality: A Multi-Task, Metadata-Driven Hypernetwork Approach
Article analysis: http://www.studyai.com/xueshu/paper/detail/bfc9079aab
Paper link (DOI): 10.24963/ijcai.2024/806
Abstract
Inland water quality monitoring is vital for clean water access and aquatic ecosystem management.
Remote sensing machine learning models enable large-scale observations, but are difficult to train due to data scarcity and variability across many lakes.
Multi-task learning approaches capture differences between lakes by learning multiple lake-specific functions simultaneously.
However, they suffer from a trade-off between parameter efficiency and the flexibility to model task differences, and struggle to model many diverse lakes with few samples per task.
We propose Multi-Task Hypernetworks, a novel multi-task learning architecture which circumvents this trade-off using a shared hypernetwork to generate different network weights for each task from small task-specific embeddings.
Our approach stands out from existing works by providing the added capacity to leverage task-level metadata, such as lake depth and temperature, explicitly.
We show empirically that Multi-Task Hypernetworks outperform existing multi-task learning architectures for water quality remote sensing and other tabular data problems, and leverage metadata more effectively than existing methods…
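The core idea, a shared hypernetwork emitting per-task weights from a small task embedding plus task metadata, can be sketched as follows. The shapes, the linear target network, and the specific metadata fields are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

class MultiTaskHypernetwork:
    """A single shared hypernetwork maps [task embedding ; task metadata]
    (metadata e.g. lake depth and temperature) to the weights of a small
    per-task model, so tasks share parameters yet get distinct functions."""
    def __init__(self, emb_dim, meta_dim, in_dim, out_dim):
        self.in_dim, self.out_dim = in_dim, out_dim
        n_weights = in_dim * out_dim + out_dim          # flattened W and b
        self.H = rng.standard_normal((emb_dim + meta_dim, n_weights)) * 0.1

    def task_weights(self, task_emb, task_meta):
        flat = np.concatenate([task_emb, task_meta]) @ self.H
        W = flat[: self.in_dim * self.out_dim].reshape(self.in_dim, self.out_dim)
        b = flat[self.in_dim * self.out_dim:]
        return W, b

    def predict(self, x, task_emb, task_meta):
        # generate this task's weights, then apply the per-task linear model
        W, b = self.task_weights(task_emb, task_meta)
        return x @ W + b

hyper = MultiTaskHypernetwork(emb_dim=4, meta_dim=2, in_dim=8, out_dim=1)
x = rng.standard_normal((5, 8))     # 5 spectral samples for one lake
emb = rng.standard_normal(4)        # small learned per-lake embedding
meta = np.array([12.0, 7.5])        # hypothetical depth (m), temperature (C)
y = hyper.predict(x, emb, meta)
```

Because only the embeddings are task-specific while the hypernetwork is shared, capacity per lake stays small, which is the mechanism that sidesteps the parameter-efficiency versus flexibility trade-off described above.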