Thesis information

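Chinese title:	基于神经网络与注意力机制的农作物遥感分类研究——以友谊县主要农作物为例
Name:	曲腾飞 (Qu Tengfei)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	081602
Discipline:	Photogrammetry and Remote Sensing
Student type:	Master's
Degree:	Master of Engineering
Degree type:	Academic degree
Degree year:	2024
Campus:	Beijing campus
School:	Faculty of Geographical Science
Research direction:	Agricultural remote sensing
First supervisor:	王宏 (Wang Hong)
First supervisor affiliation:	Faculty of Geographical Science
Submission date:	2024-05-29
Defense date:	2024-05-26
Foreign-language title:	Research on Crop Remote Sensing Classification Based on Neural Network and Attention Mechanisms: A Case Study of the Main Crops of Youyi County
Chinese keywords:	农作物分类 ; Sentinel-2 ; 卷积神经网络 ; 混合神经网络 ; 注意力机制 ; 特征优选
Foreign-language keywords:	Crop classification ; Sentinel-2 ; Convolutional neural network ; Hybrid neural network ; Attention mechanism ; Feature selection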
Chinese abstract:	中国作为全球重要的农业大国，在农业生产领域扮演着关键角色。在这一背景下，准确掌握农作物的播种面积和产量信息显得尤为重要，这些信息不仅是确保国家粮食安全的基石，也是推动农业现代化进程中不可或缺的一环。在获取农作物种植面积和产量信息过程中，农作物分类与识别发挥了至关重要的作用。由于传统农作物遥感分类高度依赖先验知识和人工干预，成本和复杂度较大，深度学习开始被应用于农作物分类。然而，当前应用中能够深层次挖掘空间-光谱-时相特征的模型较少，且注意力机制在模型中的优化不够完善。为解决以上不足，本研究以黑龙江省双鸭山市友谊县为研究区，以2022年5月9日（T1时相）、8月17日（T2时相）、9月9日（T3时相）和9月29日（T4时相）四个时相的Sentinel-2遥感影像数据及地面调查数据为基础，首先采用随机森林-SHAP算法对计算得到的168个辅助特征进行了特征优选，并制作了光谱波段（数据集1）、光谱波段+辅助特征（数据集2）、光谱波段+优选辅助特征（数据集3）三套分类数据集，其次面向单时相和多时相遥感影像数据开发了两种联合注意力机制的神经网络模型CANet（Convolutional Attention Network）和CTANet（Convolutional Temporal Attention Network），充分挖掘了研究区内水稻、玉米和大豆的深层次遥感特征，最后实现了主要农作物的精准识别以及空间分布制图。在面向单时相影像农作物分类的神经网络模型构建研究中，提出了一种结合瓶颈残差模块、残差卷积模块、多分辨率融合模块、链式残差池化模块和注意力模块的卷积注意力网络CANet。通过与当前主流语义分割模型（UNet、DeepLabv3+、RefineNet和FCN）在不同时相的分类结果对比发现，五个分类模型均在T1时相保持较低精度，而在T3时相保持较高的精度，其中CANet对三种作物的识别精度最高，F1分数和IoU分别达到了92%以上和85%以上，性能发挥稳定。所有模型在T2时相与T4时相整体性能表现相近，均较T1时相有所提升，CANet在T2时相对玉米的识别中表现最佳，在T4时相对水稻、玉米和背景的分类上表现最佳。在面向多时相影像农作物分类的神经网络模型构建研究中，提出了一种适用于多时相农作物遥感分类且能够深层次挖掘农作物的空间、光谱和时相三维度特征的混合神经网络CTANet。通过与CANet以及当前主流语义分割模型（ConvLSTM、UNet、DeepLabv3+和RefineNet）在不同分类数据集上的分类结果对比发现，六个分类模型均在数据集2中表现较差，在数据集3中发挥了最高性能，其中，CTANet表现最为出色，总体精度和MIoU高达93.91%和87.49%，在水稻、玉米和大豆的识别上，F1分数分别达到了95.64%、95.68%和94.69%，IoU分别达到了91.64%、91.72%和89.91%，相比于其他模型对于水稻和背景（其他）的识别最为准确。在CTANet的消融实验中，当综合使用空间、通道和时相注意力机制时，模型提升的性能超越了基准模型以及仅结合任意单个机制或任意两种机制的效果。这一现象在三套分类数据集中均得到了验证，有力证明了在作物分类任务中，充分挖掘和利用空间、光谱和时相三维度特征对于提高分类精度具有重要意义。除此之外，研究还发现当采用相同分类数据集时，CANet能够有效兼顾学习效率和泛化能力，且在模型训练初期精度提升显著，然而训练后期受特征饱和等因素的影响，精度提升受限。CTANet通过卷积注意力架构和时相注意力架构，能够深层次挖掘农作物的空间、光谱和时相三维度特征，在处理多时相分类任务具有高稳定性和优异性能。然而，CTANet的高性能伴随着更多的训练参数和较长的训练时间，这对硬件提出了更高的要求。
Foreign-language abstract:	China is one of the world's major agricultural producers. Accurate information on crop planting area and yield underpins national food security and agricultural modernization, and obtaining it depends on crop classification and identification. Traditional remote sensing crop classification relies heavily on prior knowledge and manual intervention, which makes it costly and complex, so deep learning has begun to be applied to the task. However, few current models can deeply mine joint spatial-spectral-temporal features, and the use of attention mechanisms within these models is not yet well optimized. To address these gaps, this study takes Youyi County, Shuangyashan City, Heilongjiang Province as the study area and uses Sentinel-2 imagery from four time phases in 2022, May 9 (T1), August 17 (T2), September 9 (T3), and September 29 (T4), together with ground survey data. First, a random forest-SHAP algorithm was used to select among 168 computed auxiliary features, and three classification datasets were built: spectral bands (Dataset 1), spectral bands + auxiliary features (Dataset 2), and spectral bands + selected auxiliary features (Dataset 3). Second, two neural network models with joint attention mechanisms were developed, CANet (Convolutional Attention Network) for single-temporal imagery and CTANet (Convolutional Temporal Attention Network) for multi-temporal imagery, to extract deep remote sensing features of rice, maize, and soybean in the study area. Finally, the main crops were identified and their spatial distribution was mapped.
For single-temporal crop classification, a convolutional attention network, CANet, was proposed that combines bottleneck residual modules, residual convolution modules, multi-resolution fusion modules, chained residual pooling modules, and attention modules. Comparison with mainstream semantic segmentation models (UNet, DeepLabv3+, RefineNet, and FCN) at each time phase showed that all five models had relatively low accuracy at T1 and relatively high accuracy at T3. Among them, CANet achieved the highest accuracy for the three crops, with F1 scores above 92% and IoU above 85%, and performed stably. All models performed similarly at T2 and T4, in both cases better than at T1. CANet performed best on maize at T2 and on rice, maize, and background at T4.
For multi-temporal crop classification, a hybrid neural network, CTANet, was proposed that deeply mines the spatial, spectral, and temporal features of crops. Comparison with CANet and mainstream semantic segmentation models (ConvLSTM, UNet, DeepLabv3+, and RefineNet) on the three classification datasets showed that all six models performed relatively poorly on Dataset 2 and best on Dataset 3. CTANet performed best, with an overall accuracy of 93.91% and an MIoU of 87.49%. Its F1 scores for rice, maize, and soybean were 95.64%, 95.68%, and 94.69%, and the corresponding IoU values were 91.64%, 91.72%, and 89.91%. Compared with the other models, it was the most accurate at identifying rice and background (other). In CTANet's ablation experiments, using spatial, channel, and temporal attention together outperformed the baseline model as well as variants using any single mechanism or any two mechanisms. This result held on all three classification datasets, which indicates that fully mining spatial, spectral, and temporal features matters for improving crop classification accuracy.
The study also found that, on the same classification dataset, CANet balanced learning efficiency and generalization ability, with accuracy rising markedly early in training; later in training, gains were limited by feature saturation and other factors. CTANet, through its convolutional attention and temporal attention architectures, deeply mined the spatial, spectral, and temporal features of crops and showed high stability and strong performance on multi-temporal classification tasks. However, this performance came with more training parameters and longer training time, which places higher demands on hardware.
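As a reading aid (not part of the thesis record), the per-class F1 and IoU figures reported above for CTANet on Dataset 3 can be checked against each other. For a single class, F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN), so IoU = F1 / (2 - F1). The sketch below assumes the thesis uses these standard per-class definitions; the function and variable names are illustrative only.

```python
def f1_and_iou(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Per-class F1 and IoU from true positives, false positives, false negatives."""
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


def f1_to_iou(f1: float) -> float:
    """Convert a per-class F1 score (fraction in [0, 1]) to the IoU it implies."""
    return f1 / (2.0 - f1)


# Percentages quoted in the abstract for CTANet on Dataset 3.
reported_f1 = {"rice": 95.64, "maize": 95.68, "soybean": 94.69}
reported_iou = {"rice": 91.64, "maize": 91.72, "soybean": 89.91}

for crop, f1_pct in reported_f1.items():
    implied_iou_pct = 100 * f1_to_iou(f1_pct / 100)
    gap = abs(implied_iou_pct - reported_iou[crop])
    print(f"{crop}: implied IoU {implied_iou_pct:.2f}%, "
          f"reported {reported_iou[crop]:.2f}%, gap {gap:.3f}")
```

Under that assumption, the implied IoU values agree with the reported ones to within rounding, which suggests the two sets of figures come from the same confusion matrix.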
Total references:	128
Call number:	硕081602/24013
Release date:	2025-05-30