Thesis information

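Chinese title:

基于不同方法的多变量高频气温预测比较研究 (A Comparative Study of Multivariate High-Frequency Temperature Forecasts Based on Different Methods)

Author:

Liu Kelan (刘柯兰)

Confidentiality level:

Public

Thesis language:

Chinese (chi)

Discipline code:

025200

Major:

Applied Statistics

Student type:

Master's

Degree:

Master of Applied Statistics

Degree type:

Professional degree

Degree year:

2024

Campus:

Zhuhai campus

School:

School of Statistics

Research direction:

Data Science and Management

First supervisor:

Shi Junyi (石峻驿)

First supervisor's affiliation:

School of Statistics

Submission date:

2024-06-14

Defense date:

2024-05-25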

English title:

A COMPARATIVE STUDY OF MULTIVARIATE HIGH-FREQUENCY TEMPERATURE FORECASTS BASED ON DIFFERENT METHODS

Chinese keywords:

Temperature prediction; time-series data; high frequency; Prophet model; LSTM model

English keywords:

Temperature prediction; time-series data; high frequency; Prophet model; LSTM model

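Chinese abstract:

Against the backdrop of global warming and increasingly frequent extreme weather, accurate forecasting of meteorological change can greatly benefit daily life. Temperature is a key meteorological variable, so high-frequency temperature forecasting has direct practical value. For example, it can support real-time monitoring and prediction of extreme heat events, supplying key information for disaster warning and response, and it can give farmers real-time planting advice, such as choosing suitable sowing times and adjusting crop varieties, to meet the challenges of climate change. Temperature changes affect virtually every sector.
However, because temperature is influenced by many factors, such as the atmospheric general circulation, ocean activity, and terrain, high-frequency temperature forecasting remains a complex and challenging task. Comparing how different methods perform in high-frequency temperature forecasting clarifies their strengths and weaknesses and provides more reliable theoretical support for practical forecasting.
This thesis therefore forecasts multivariate high-frequency temperature data with different methods. By mining the information contained in the meteorological data and comparing models, it aims to identify a model that suits high-frequency temperature forecasting and applies well in practice. The main work is as follows:
(1) Processing and analysis of the Jena (Germany) meteorological dataset, including data preprocessing, visualization, and the selection and construction of feature variables. The Jena data are first preprocessed by handling duplicates, missing values, and outliers and by converting formats. Trends in the multivariate meteorological data and correlations among variables are then analyzed visually with line charts, heat maps, violin plots, and other plots to uncover latent features of the variables. For feature selection and construction, meteorological features are selected using XGBoost's feature-importance ranking; in addition, the temperature series is decomposed with STL, and AR and Prophet are fitted to the different components to construct trend and seasonal time-series features. Finally, all features are merged.
(2) Forecasting high-frequency temperature on the processed Jena dataset with different models and analyzing the results. The models are Prophet, XGBoost, univariate and multivariate LSTM, and a combined Prophet_LSTM model; XGBoost is run both with and without the newly constructed features. Forecasting performance is assessed by comparing the MAE, MSE, RMSE, and other goodness-of-fit measures of each model's predictions. The results show that the univariate LSTM and the XGBoost model with the newly constructed features forecast high-frequency temperature in Jena better than the other models.
(3) Further testing of the univariate LSTM and multivariate XGBoost models on another high-frequency meteorological dataset: 2023 hourly meteorological data for Daxing District, Beijing, collected from ECMWF. After preprocessing and analysis, the experiments show that both models still forecast Beijing temperature well, but the LSTM model forecasts hourly temperature in Beijing more accurately and is more broadly applicable, offering a reference for forecasting and analysis of other high-frequency datasets in the future.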
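The model comparison in (2) rests on three standard error metrics: MAE is the mean of |y − ŷ|, MSE is the mean of (y − ŷ)², and RMSE is the square root of MSE. The following is a minimal pure-Python sketch of how they are computed; the function name `error_metrics` and the toy temperatures are illustrative assumptions, not data or code from the thesis.

```python
from __future__ import annotations

import math
from typing import Sequence


def error_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Return MAE, MSE and RMSE for two equal-length sequences of values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]  # observed minus forecast
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical hourly temperatures in degrees Celsius (made-up values).
observed = [10.0, 11.0, 12.0, 13.0]
forecast = [10.5, 10.0, 12.0, 14.0]
print(error_metrics(observed, forecast))
# {'MAE': 0.625, 'MSE': 0.5625, 'RMSE': 0.75}
```

Because MSE and RMSE square the errors, a single large miss (here the 1.0 °C errors) weighs more heavily than in MAE, which is why the three are usually reported together.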

English abstract:

Against the backdrop of global warming and increasingly frequent extreme weather, accurate forecasting of meteorological change can greatly benefit daily life. Temperature is a key meteorological variable, so forecasting it at high frequency has direct practical value. For example, high-frequency temperature forecasts can support real-time monitoring and prediction of extreme heat events, supplying key information for disaster warning and response; they can also give farmers real-time planting advice, such as choosing suitable sowing times and adjusting crop varieties, to meet the challenges of climate change. Temperature changes affect virtually every sector.

However, high-frequency temperature forecasting remains a complex and challenging task because temperature is influenced by many factors, such as the atmospheric general circulation, ocean activity, and terrain. Comparing how different methods perform in high-frequency temperature forecasting clarifies their strengths and weaknesses and provides more reliable theoretical support for practical forecasting.

This thesis therefore forecasts multivariate high-frequency temperature data with different methods. By mining the information contained in the meteorological data and comparing models, it aims to identify a model that suits high-frequency temperature forecasting and applies well in practice. The main work is as follows:

(1) Processing and analysis of the Jena (Germany) meteorological dataset, covering data preprocessing, visualization, and the selection and construction of feature variables. The dataset is first cleaned by handling duplicates, missing values, and outliers and by converting data formats. Trends in the multivariate meteorological data and correlations among the variables are then explored with line charts, heat maps, violin plots, and other visualizations to uncover latent features of the variables. For feature selection and construction, meteorological features are selected by ranking their importance with XGBoost; in addition, the temperature series is decomposed with STL, AR and Prophet are fitted to the different components to construct trend and seasonal time-series features, and finally all features are combined.

(2) Forecasting high-frequency temperature on the processed Jena dataset with different models and analyzing the results. The models used are Prophet, XGBoost, univariate and multivariate LSTM, and a combined Prophet_LSTM model; for XGBoost, runs with and without the newly constructed features are distinguished. Forecasting performance is compared using the MAE, MSE, and RMSE of each model's predictions. The results show that the univariate LSTM and the XGBoost model with the newly constructed features forecast high-frequency temperature in Jena better than the other models.

(3) Testing the univariate LSTM and multivariate XGBoost models on another high-frequency meteorological dataset: hourly meteorological data for Daxing District, Beijing, for 2023, collected from ECMWF. After preprocessing and analysis, the experiments show that both models still forecast Beijing temperature well, but the LSTM model forecasts hourly temperature in Beijing more accurately and is more broadly applicable, offering a reference for forecasting with other high-frequency datasets in the future.
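Both the univariate LSTM and the XGBoost models described in (2) and (3) need the temperature series recast as supervised (input, target) samples. The abstract does not say how this was done; a common approach, sketched here purely as an assumption, is a sliding window of past observations. The function `make_windows`, the window length, and the toy series are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Split a series into (input window, target) pairs.

    Each sample uses `window` consecutive past values as inputs and the
    value `horizon` steps after the end of that window as the target.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    X: List[List[float]] = []
    y: List[float] = []
    for start in range(len(series) - window - horizon + 1):
        end = start + window                 # first index after the window
        X.append(list(series[start:end]))    # past observations
        y.append(series[end + horizon - 1])  # value to forecast
    return X, y


# Hypothetical toy series standing in for consecutive temperature readings.
temps = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_windows(temps, window=3)
# X == [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
# y == [4.0, 5.0, 6.0]
```

For a multivariate model each element of `series` would instead be a feature vector, with the target taken from the temperature column; samples should then be split chronologically (not shuffled) into training and test sets so that no future values leak into training.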

Total references:

43

Library location:

Main Library, B301

Call number:

硕025200/24032Z

Open access date:

2025-06-14
