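Title (Chinese):	基于互联网数据爬取技术的全国主要城市PM2.5时序分析与预测
Author:	邓隽宏
Confidentiality level:	Public
Thesis language:	chi (Chinese)
Discipline code:	070504
Major:	Geographic Information Science
Student type:	Undergraduate
Degree:	Bachelor of Science
Degree year:	2023
Campus:	Zhuhai Campus
College:	Zhixing College (知行书院)
First supervisor:	赵文智
First supervisor's unit:	Faculty of Geographical Science (地理科学学部)
Submission date:	2023-05-18
Defense date:	2023-05-16
Title (English):	Time Series Analysis and Prediction of PM2.5 in Major Cities across China Based on Internet Data Crawling Technology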
Keywords:	Machine learning; Web crawler; Long short-term memory (LSTM) network; PM2.5 forecast; Python
Abstract:	With global Internet information technology iterating rapidly, web data crawling has become an efficient means of collecting legitimately accessible data from the Internet. In recent years the state has paid close attention to atmospheric pollutants, and the public is increasingly concerned about air quality. Collecting air-quality data and using a machine-learning long short-term memory (LSTM) network to analyse and forecast PM2.5 time series is therefore an active research topic. This thesis focuses on web data crawling: a program collects an air-pollutant dataset covering 2020 to 2023, from which PM2.5 time series are extracted. Machine-learning models such as the recurrent neural network (RNN) and the LSTM network are reviewed. An LSTM network is then trained and tested on the dataset: the data are normalised and used for time-series analysis and prediction, and the predictions obtained under different model parameters are compared and analysed. The conclusions are threefold. First, the number of training iterations was found not to affect prediction accuracy, so an iteration count suited to the dataset can be chosen to improve model efficiency. Second, because time-series data are autocorrelated, the LSTM predictions lag behind the observed values. Third, PM2.5 is positively correlated with temperature, air pressure and atmospheric structure, and negatively correlated with rainfall and wind speed; the correlation coefficients of these factors vary significantly by season.
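The preprocessing and the lag finding described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code. It assumes min-max scaling as the normalisation (the abstract only says the data are "normalised"), a sliding window of past values as the LSTM input for one-step-ahead prediction, and a simple shift search to quantify how far predictions trail observations. All function names, the window length, and the sample values are hypothetical and chosen only for illustration; a naive persistence forecast stands in for real LSTM output.

```python
from typing import List, Sequence, Tuple


def min_max_scale(series: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1]; also return (min, max) so results can be inverted."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        # Constant series: map everything to 0 to avoid division by zero.
        return [0.0 for _ in series], lo, hi
    return [(x - lo) / span for x in series], lo, hi


def inverse_scale(scaled: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map [0, 1] values back to the original PM2.5 units."""
    return [x * (hi - lo) + lo for x in scaled]


def make_windows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets) pairs for one-step-ahead prediction.

    Each input is `window` consecutive values; its target is the next value.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    xs: List[List[float]] = []
    ys: List[float] = []
    for start in range(len(series) - window):
        xs.append(list(series[start:start + window]))
        ys.append(series[start + window])
    return xs, ys


def best_lag(observed: Sequence[float], predicted: Sequence[float], max_lag: int) -> int:
    """Return the shift k in 0..max_lag minimising the mean absolute error
    between predicted[t] and observed[t - k]; k > 0 means predictions trail."""
    best_k, best_err = 0, float("inf")
    for k in range(max_lag + 1):
        pairs = list(zip(observed[: len(observed) - k], predicted[k:]))
        if not pairs:
            break
        err = sum(abs(o - p) for o, p in pairs) / len(pairs)
        if err < best_err:
            best_k, best_err = k, err
    return best_k


if __name__ == "__main__":
    # Made-up daily PM2.5 values (ug/m3), for illustration only.
    pm25 = [35.0, 42.0, 58.0, 61.0, 47.0, 39.0, 52.0]
    scaled, lo, hi = min_max_scale(pm25)
    X, y = make_windows(scaled, window=3)
    # A persistence forecast repeats yesterday's value: the extreme case of lag.
    persistence = [pm25[0]] + pm25[:-1]
    print(best_lag(pm25, persistence, max_lag=2))  # prints 1
```

Keeping the (min, max) pair returned by `min_max_scale` lets predictions be mapped back to concentration units with `inverse_scale` before computing errors. A `best_lag` result of 1 on a model's test-set output would be consistent with the lagging behaviour the abstract attributes to autocorrelation, since a one-step shift is exactly what a persistence forecast produces.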
Number of references:	17
Call number:	本070504/23022Z
Open access date:	2024-05-22

附件下载