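Chinese title: | 基于TRMF-LSTM组合模型的城市供水预测分析 |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 025200 |
Discipline and major: | |
Student type: | Master's |
Degree: | Master of Applied Statistics |
Degree type: | |
Degree year: | 2023 |
Campus: | |
School: | |
Research direction: | Economic and Financial Statistics |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2023-06-19 |
Defense date: | 2023-05-26 |
Foreign-language title: | Prediction and Analysis of Urban Water Supply Based on TRMF-LSTM Combination Model |
Chinese keywords: | |
Foreign-language keywords: | Secondary water supply ; Combination model ; TRMF ; LSTM ; Water use prediction |
Chinese abstract: |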
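In recent years, as the number of high-rise households has grown, requirements on water supply pressure have risen and secondary water supply has gradually become part of daily life. Accurate prediction of residential water use can reduce the water backlog and source contamination that oversupply may cause in secondary water supply, as well as the shortages that undersupply may cause. This thesis therefore takes four months of water use data from 20 residential buildings in a community in Shenzhen as an example and aims to build a suitable model for predicting residential water consumption over a given future time span. For water use prediction, the existing literature mainly builds models on time series models such as ARIMA, machine learning algorithms such as decision trees, and deep learning algorithms such as BP neural networks. Time series models suit linear prediction, while machine learning and deep learning mostly suit nonlinear prediction. Real data, however, contain missing values and outliers, are mostly high-dimensional, and span long periods, so simple time series models are not suitable; single prediction models also suffer from low accuracy and a limited prediction range. Some studies have therefore proposed combined models, for example combining ARIMA and LSTM, fitting the linear and nonlinear parts separately and then merging them, with satisfactory results on one-dimensional time series data. Because simple time series models no longer apply to high-dimensional data, this thesis uses the TRMF model in place of ARIMA for linear prediction of high-dimensional data and uses LSTM to fit the residuals of the linear prediction, filling in the nonlinear part and predicting multiple variables simultaneously. TRMF combines the VAR algorithm with matrix factorization (MF) to reduce the dimensionality of high-dimensional data, fill in missing values, and predict the future: MF splits the high-dimensional, sparse original data matrix into low-dimensional, dense temporal matrices, and VAR predicts the temporal matrix to obtain predictions of the original variables. Because the prerequisites of VAR are strict, while LSTM has simpler requirements and long-term memory, this thesis uses LSTM in place of VAR to predict the temporal matrix and thereby obtain the TRMF linear prediction. Model performance is evaluated with the Score value; the closer Score is to 1, the better the prediction. TRMF is used for the linear fit, with the LSTM batch size set to 168 for learning the temporal matrix, that is, one week of data is learned to predict the next time step. Over the short term (48 hours), medium term (120 hours), and long term (336 hours), TRMF obtains Score values of 0.815, 0.807, and 0.782; the proportions of insufficient water supply caused by underprediction are 12.3%, 13.6%, and 14.4%, and the proportions of water backlog caused by overprediction are 13.7%, 15.2%, and 16.7%. LSTM is then used to learn the residuals. With the batch size set to 168, residual predictions are obtained for the three horizons, and the TRMF linear prediction is combined with the LSTM residual prediction to give the final output, which obtains Score values of 0.823, 0.811, and 0.787; the proportions of insufficient water supply caused by underprediction are 9.1%, 10.7%, and 13.1%, and the proportions of water backlog caused by overprediction are 10.9%, 12.5%, and 14.6%. Adding the LSTM model reduces insufficient supply and water backlog by about 3 percentage points. Regarding the batch size, a comparison with a combined model using a batch size of 336 shows that the combined model in this thesis scores slightly lower in short-term prediction but has some advantage in medium-term and long-term prediction, so 168 is the more reasonable setting. Removing the weather data and the epidemic data in turn shows that the epidemic data have very little effect, while the weather data contribute positively to overall prediction. Compared with the single models TRMF, SARIMA, and LightGBM over the three horizons, the combined model outperforms all three in short-term, medium-term, and long-term prediction, shows good predictive accuracy, and can serve as a reference for predicting actual residential water consumption. |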
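The factorize-then-forecast idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a plain masked alternating-least-squares factorization and, for brevity, forecasts the latent temporal matrix with a least-squares VAR(1) rather than the LSTM the thesis substitutes. The function names `masked_als` and `forecast_latent_var1`, the rank, the regularization weight, and the synthetic data are all hypothetical.

```python
import numpy as np

def masked_als(Y, mask, rank=2, lam=0.1, iters=50, seed=0):
    """Factor Y (series x time) as W @ X using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n, t = Y.shape
    W = rng.normal(size=(n, rank))
    X = rng.normal(size=(rank, t))
    reg = lam * np.eye(rank)  # ridge term keeps every solve well-posed
    for _ in range(iters):
        # Update each series' loadings from the hours at which it was observed.
        for i in range(n):
            obs = mask[i]
            Xo = X[:, obs]
            W[i] = np.linalg.solve(Xo @ Xo.T + reg, Xo @ Y[i, obs])
        # Update each hour's latent vector from the series observed at that hour.
        for j in range(t):
            obs = mask[:, j]
            Wo = W[obs]
            X[:, j] = np.linalg.solve(Wo.T @ Wo + reg, Wo.T @ Y[obs, j])
    return W, X

def forecast_latent_var1(X, steps):
    """Assumed stand-in for the latent forecaster: least-squares VAR(1), x_t ~ A @ x_{t-1}."""
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    x, out = X[:, -1], []
    for _ in range(steps):
        x = A @ x
        out.append(x)
    return np.stack(out, axis=1)

# Synthetic stand-in for hourly readings: 20 series x 240 hours with a daily cycle.
rng = np.random.default_rng(1)
hours = np.arange(240)
latent = np.vstack([np.sin(2 * np.pi * hours / 24), np.cos(2 * np.pi * hours / 24)])
loadings = rng.uniform(0.5, 2.0, size=(20, 2))
Y_full = loadings @ latent + 0.01 * rng.normal(size=(20, 240))
mask = rng.random(Y_full.shape) > 0.1          # True where a reading is observed
Y_obs = np.where(mask, Y_full, np.nan)         # missing readings stored as NaN

W, X = masked_als(Y_obs, mask)
Y_filled = W @ X                               # imputes the missing readings
Y_future = W @ forecast_latent_var1(X, steps=48)  # 48-hour-ahead forecast for all series
rel_err = np.linalg.norm((Y_filled - Y_full)[~mask]) / np.linalg.norm(Y_full[~mask])
print(f"relative error on held-out entries: {rel_err:.4f}")
```

Forecasting in the low-dimensional latent space is what lets one model cover all 20 series at once; swapping `forecast_latent_var1` for a recurrent model would follow the substitution the abstract describes.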
Foreign-language abstract: |
In recent years, as the number of high-rise residents has grown, requirements on water supply pressure have risen and secondary water supply has gradually entered people's lives. Accurate prediction of residential water use can reduce the water backlog and source pollution that oversupply may cause in secondary water supply, as well as the shortages caused by undersupply. This thesis therefore takes four months of water use data from 20 residential buildings in a community in Shenzhen as an example and aims to establish an appropriate model for predicting residential water consumption over a given future time span. For water use prediction, the existing literature mainly uses time series models such as ARIMA, machine learning algorithms such as decision trees, and deep learning algorithms such as BP neural networks. Time series models are suitable for linear prediction, while machine learning and deep learning are mostly suitable for nonlinear prediction. However, real data contain missing values and outliers, are mostly high-dimensional, and cover long time spans, so simple time series models are not suitable. Single prediction models also suffer from low accuracy and a limited prediction range. Some studies have therefore proposed combined models, for example combining ARIMA and LSTM, fitting the linear and nonlinear parts separately and then merging them, with satisfactory results on one-dimensional time series data. Simple time series models are no longer applicable to high-dimensional data, so this thesis uses the TRMF model instead of ARIMA for linear prediction of high-dimensional data.
For the residuals of the linear prediction, LSTM is used to fill in the nonlinear part, achieving simultaneous prediction of multiple variables. The TRMF model combines VAR with matrix factorization (MF) to achieve dimensionality reduction, missing value imputation, and future prediction for high-dimensional data. MF splits the high-dimensional, sparse original data matrix into low-dimensional, dense time series matrices, and the original variables are predicted by forecasting the time series matrix with VAR. However, because VAR has strict prerequisites while LSTM has simpler requirements and long-term memory, this thesis uses LSTM in place of VAR to predict the time series matrix and obtain the linear prediction of TRMF. Predictive performance is evaluated with the Score value; the closer the Score is to 1, the better the prediction. TRMF is used for the linear fit, with the LSTM batch size set to 168 for learning the time series matrix, that is, one week of data is learned to predict the next time point. TRMF obtains Score values of 0.815, 0.807, and 0.782 for the short term (48 hours), medium term (120 hours), and long term (336 hours), respectively. The proportions of insufficient water supply caused by underprediction are 12.3%, 13.6%, and 14.4%, and the proportions of water supply backlog caused by overprediction are 13.7%, 15.2%, and 16.7%. LSTM is then used to learn the residuals. With the batch size set to 168, residual predictions are obtained for the three horizons, and the TRMF linear prediction is combined with the LSTM residual prediction to give the final output. The combined model obtains Score values of 0.823, 0.811, and 0.787 for the three horizons. The proportions of insufficient water supply caused by underprediction are 9.1%, 10.7%, and 13.1%, and the proportions of water supply backlog caused by overprediction are 10.9%, 12.5%, and 14.6%.
Adding the LSTM model reduces insufficient water supply and water supply backlog by about 3 percentage points. Regarding the batch size, a comparison with a combined model using a batch size of 336 shows that the combined model in this thesis scores slightly lower in short-term prediction but has some advantage in medium-term and long-term prediction, so setting the batch size to 168 is more reasonable. Removing the weather data and the epidemic data in turn shows that the epidemic data have minimal impact, while the weather data contribute positively to the model's overall prediction. Compared with the single models TRMF, SARIMA, and LightGBM over the three horizons, the combined model performs better than all three in short-term, medium-term, and long-term prediction, shows good predictive accuracy, and can provide a reference for predicting actual residential water consumption. |
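The combination and evaluation steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis's code: the final forecast is taken as the linear forecast plus the predicted residual, and the 168-hour setting is read as a one-week input window that predicts the next hour. The abstract defines neither the Score metric nor exactly how the shortage and backlog proportions are computed, so Score is omitted and `shortfall_and_backlog_share` uses a hypothetical volume-based definition. All function names and the toy numbers are made up for the example.

```python
import numpy as np

def make_windows(series, window=168):
    """Slice a 1-D hourly series into (one week of inputs -> next value) pairs."""
    series = np.asarray(series, dtype=float)
    inputs = np.stack([series[i:i + window] for i in range(len(series) - window)])
    targets = series[window:]
    return inputs, targets

def combine(linear_forecast, residual_forecast):
    """Hybrid output: linear part plus the separately predicted residual."""
    return np.asarray(linear_forecast, dtype=float) + np.asarray(residual_forecast, dtype=float)

def shortfall_and_backlog_share(actual, predicted):
    """Assumed definition: under- and over-predicted volume as a share of actual volume."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    total = actual.sum()
    shortfall = np.clip(actual - predicted, 0, None).sum() / total
    backlog = np.clip(predicted - actual, 0, None).sum() / total
    return shortfall, backlog

# Toy numbers (not from the thesis): a linear forecast and a residual correction.
actual = np.array([10.0, 12.0, 8.0, 10.0])
linear = np.array([9.0, 12.0, 10.0, 10.0])
residual_hat = np.array([1.0, 0.0, -1.0, 0.0])   # what a residual model might output

combined = combine(linear, residual_hat)
before = shortfall_and_backlog_share(actual, linear)
after = shortfall_and_backlog_share(actual, combined)
print("linear only:", before, "combined:", after)

# Training pairs for a residual model over a 200-hour history.
inputs, targets = make_windows(np.arange(200), window=168)
print(inputs.shape, targets.shape)
```

Reporting shortage and backlog separately, rather than a single symmetric error, matches the abstract's framing that underprediction and overprediction have different practical costs.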
Total references: | 30 |
Holding location: | Main Library B301 |
Call number: | 硕025200/23041Z |
Open access date: | 2024-06-19 |