With the rapid advancement of remote sensing technology, image data have become increasingly abundant. Deep-learning-based extraction of regions of interest (ROIs) has emerged as a crucial research direction in remote sensing image processing, providing vital technical support for disaster monitoring, urban planning, military reconnaissance, and other fields. The "deep learning + massive remote sensing data + strong annotation" paradigm has achieved remarkable success in ROI extraction from remote sensing images. However, its heavy reliance on precise pixel-level annotation raises two prominent issues. First, strong annotation requires every pixel to be labeled precisely by hand, which is time-consuming, labor-intensive, and costly. Second, multi-source high-resolution remote sensing images pose additional challenges: their scenes are complex and variable, precise annotation demands extensive domain expertise, and the difficulty and workload of pixel-level annotation increase significantly. In recognition of these issues, some scholars have shifted their focus toward weak annotation.
Currently, the most commonly used form of weak annotation is image-level annotation, which establishes a correspondence between target categories and image pixels by dynamically inferring pixel-level category labels, using pseudo-labels as a bridge. Compared with strong annotation, weak annotation significantly reduces annotation cost and improves annotation efficiency. However, because research in this direction started relatively late, numerous issues remain to be addressed.
1) From the perspective of sample annotation, weak annotations provide limited guidance and supervisory information and are prone to mislabeling and missed labels, which weakens the decision-making capability of ROI extraction algorithms.
2) From the perspective of input data, the quality of remote sensing images is susceptible to weather conditions: image contrast decreases, the boundaries of target regions blur, and the accuracy of ROI extraction under weak annotation is severely constrained.
3) From the perspective of algorithm performance, deep learning models trained under weak annotation often lack expressive power and usually require cumbersome post-processing, resulting in low testing and output efficiency.
Saliency analysis is a theoretical method inspired by the visual attention mechanism of the human eye. It quickly selects a few salient regions in a scene for priority interpretation, avoiding time-consuming analysis of the entire image, and is therefore an effective means of rapidly extracting important targets. Combining saliency analysis with deep learning can effectively enhance the learning and expressive capability of a network, helping to achieve accurate extraction of regions of interest in remote sensing images under weak annotation. This is of great significance for alleviating the contradiction between the high speed at which remote sensing images are acquired and the low speed at which they are interpreted.
To address the aforementioned issues, this paper starts from gradually improving label quality and combines saliency analysis with semantic feature perception. Staged processing models are designed for three dataset conditions: inexact annotation, inaccurate annotation, and incomplete annotation. Accurate and efficient ROI extraction from remote sensing images under weak annotation is thereby achieved, effectively improving testing and output efficiency. The main research work of this paper includes:
1) To address the low output efficiency of algorithms under inexact annotation, a method for extracting regions of interest from remote sensing images based on cross-perception of salient features is proposed. The method consists of two parts: generation of an initial pixel-level pseudo-label saliency map, and final semantic collaborative segmentation. In the first stage, a binary classification network is constructed, and a saliency map computation method based on multi-layer category feature attention is designed to generate pixel-level pseudo-labels, achieving the leap from image-level to pixel-level annotation and laying the foundation for subsequent ROI extraction. In the second stage, an ROI extraction network based on a double-branch nested U-Net is constructed. The network combines multiple residual modules, extracts local features in the input convolutional layer, reduces detail loss through a symmetric encoder-decoder structure, and fuses information at different scales through a multi-scale feature fusion module. In addition, a salient-feature cross-perception module is embedded in the network so that the model can fully perceive cross-feature information between images, thereby achieving collaborative ROI segmentation of image pairs. The method improves the accuracy of semantic segmentation in remote sensing images, reduces detail loss, and enhances the feature extraction capability of the network.
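The multi-layer category feature attention step in the first stage follows the spirit of class activation mapping: each channel of a feature map is weighted by the classifier weight of the target class, the weighted maps are fused across layers, and the result is thresholded into pixel-level pseudo-labels. A minimal NumPy sketch under those assumptions (the function names, thresholds, and the simple averaging fusion are illustrative, not the thesis's exact formulation):

```python
import numpy as np

def class_attention_map(features, weights):
    """CAM-style map: weight each channel by its classifier weight and sum.

    features: (C, H, W) feature maps; weights: (C,) classifier weights
    for the target class.
    """
    cam = np.tensordot(weights, features, axes=([0], [0]))  # -> (H, W)
    cam = np.maximum(cam, 0)                # keep only positive evidence
    return cam / (cam.max() + 1e-8)         # normalize to [0, 1]

def fuse_multilayer(cams):
    """Fuse attention maps from several layers by simple averaging."""
    return np.mean(np.stack(cams), axis=0)

def pseudo_label(cam, fg_thresh=0.6, bg_thresh=0.2):
    """Map saliency to {1: foreground, 0: background, 255: ignore}."""
    label = np.full(cam.shape, 255, dtype=np.uint8)  # uncertain pixels ignored
    label[cam >= fg_thresh] = 1
    label[cam <= bg_thresh] = 0
    return label
```

Pixels between the two thresholds are marked as "ignore" so that only confident pseudo-labels supervise the downstream segmentation network.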
2) To address the problem that input data are susceptible to weather conditions and degrade in quality, a method for extracting regions of interest from fog-covered remote sensing images based on comparative analysis of salient features is proposed. The method consists of three parts: generating initial pseudo-labels, correcting the pseudo-labels through comparative analysis of salient features, and extracting regions of interest from fog-covered images through domain adaptation. In the first part, a classification network is constructed, and class feature perception with multi-layer feature fusion is used to obtain initial pixel-level pseudo-labels. In the second part, a salient-feature-comparison similarity sorting algorithm is designed to compare local features between images, reduce the influence of haze noise, and correct the initial pixel-level pseudo-labels. In the third part, an unsupervised domain adaptation method is proposed that adjusts the entropy distribution of the target domain to resemble that of the source domain, indirectly achieving entropy minimization; the network thus adapts to both clear and fog-covered images, improving the generalization ability and robustness of the model.
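The entropy alignment in the third part rests on the per-pixel prediction entropy: predictions on clear source-domain images are confident (low entropy), while fog-degraded target-domain predictions are diffuse (high entropy), and the adaptation pushes the latter distribution toward the former. A minimal sketch of the entropy-map computation (the shapes and function names are illustrative assumptions):

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entropy_map(logits):
    """Per-pixel Shannon entropy of the predicted class distribution.

    logits: (num_classes, H, W). High entropy marks uncertain pixels,
    typical of fog-degraded target-domain predictions.
    """
    p = softmax(logits, axis=0)
    return -(p * np.log(p + 1e-12)).sum(axis=0)
```

In adversarial formulations of such alignment, a discriminator is trained to distinguish source from target entropy maps while the segmentation network learns to fool it, which indirectly minimizes target-domain entropy without target labels.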
3) To address the problem that inaccurate labels weaken the decision-making ability of the remote sensing image ROI extraction model, a method based on label noise cleaning and salient-feature uncertainty analysis is proposed. The method consists of two parts: label noise cleaning with pixel-level pseudo-label generation, and region-of-interest extraction based on salient-feature uncertainty analysis. In the first part, initial pseudo-labels are generated through iterative learning and noise cleaning, a multi-layer category feature attention saliency map is then generated from these initial pseudo-labels, and superpixel segmentation is used to preserve image edge details. In the second part, a complementary salient-feature uncertainty perception algorithm is proposed to handle the complexity and diversity of remote sensing image features. The method accounts for the complexity of features among different ground objects, integrates the uncertainty of model predictions with the complementarity of salient features across datasets, and introduces an uncertainty perception strategy to improve feature utilization and enhance the final extraction results.
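The noise-cleaning idea in the first part can be illustrated with the widely used small-loss criterion: samples whose current training loss is small are more likely to carry clean labels and are retained for the next iteration, while high-loss (likely mislabeled) samples are set aside. The uncertainty side can likewise be sketched as the variance of stochastic forward passes. A minimal NumPy sketch under those assumptions (the keep ratio and the variance-based score are illustrative choices, not the thesis's exact algorithm):

```python
import numpy as np

def small_loss_clean(losses, keep_ratio=0.7):
    """Mark the keep_ratio fraction of samples with the smallest loss as clean."""
    k = max(1, int(len(losses) * keep_ratio))
    clean = np.zeros(len(losses), dtype=bool)
    clean[np.argsort(losses)[:k]] = True    # smallest-loss samples survive
    return clean

def predictive_uncertainty(prob_stack):
    """Per-pixel uncertainty as variance across stochastic forward passes.

    prob_stack: (num_passes, H, W) foreground probabilities, e.g. from
    Monte Carlo dropout. High variance flags pixels whose pseudo-labels
    should be down-weighted when training the extraction network.
    """
    return prob_stack.var(axis=0)
```

Iterating the clean-sample selection as the model improves progressively tightens the pseudo-label set, which is the mechanism behind "iterative learning and noise cleaning."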
4) To address incomplete and low-precision ROI annotation for remote sensing images, a semi-supervised region-of-interest extraction method based on multi-cue fusion of salient features is proposed. The method consists of two parts. The first is multi-cue saliency analysis of remote sensing images, which locates target regions in a subset of images using visual saliency algorithms and generates pixel-level pseudo-labels. The second is semi-supervised region-of-interest extraction based on adaptive consistency, which combines adaptive learning with semi-supervised learning to train the extraction model under incomplete annotation. The method improves the efficiency and accuracy of region-of-interest extraction, especially for incompletely annotated datasets, effectively alleviating the scarcity of annotated data.
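The adaptive-consistency idea in the second part can be sketched in the style of confidence-thresholded consistency regularization: confident predictions on a weakly augmented view of an unlabeled image supply hard pseudo-labels, and the model is penalized when its prediction on a strongly augmented view disagrees. A minimal per-image NumPy sketch (the threshold value and function names are illustrative assumptions, not the thesis's exact loss):

```python
import numpy as np

def consistency_loss(p_weak, p_strong, conf_thresh=0.95):
    """Masked cross-entropy between the strong view and confident weak-view pseudo-labels.

    p_weak, p_strong: (num_classes, H, W) softmax probabilities of the same
    unlabeled image under weak and strong augmentation. Only pixels whose
    weak-view confidence exceeds conf_thresh contribute to the loss.
    """
    conf = p_weak.max(axis=0)                 # per-pixel confidence
    hard = p_weak.argmax(axis=0)              # per-pixel pseudo-label
    mask = conf >= conf_thresh                # keep only confident pixels
    picked = np.take_along_axis(p_strong, hard[None], axis=0)[0]
    ce = -np.log(picked + 1e-12)              # pixel-wise cross-entropy
    return float((ce * mask).sum() / max(mask.sum(), 1))
```

Making the confidence threshold adaptive, e.g. per class or per training stage, is one way the "adaptive" component can temper the loss when pseudo-labels are still unreliable.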
The above work is an important attempt to combine saliency analysis with new theories and methods of deep learning, and it provides a new line of research for low-level computer vision based on deep learning. The results have important application value for information analysis and extraction in remote sensing, medicine, astronomy, and other fields.