Thesis information

Chinese title:

 Commodity Sales Prediction Based on the S Online Trading Platform (基于S线上交易平台的商品销量预测)

Name:

 周光传    

Confidentiality level:

 Public

Thesis language:

 Chinese

Discipline code:

 025200    

Field of study:

 Applied Statistics

Student type:

 Master's

Degree:

 Master of Applied Statistics

Degree type:

 Professional degree

Degree year:

 2019    

Campus:

 Beijing campus

School:

 School of Statistics / Institute of National Accounts

First advisor's name:

 周光传    

First advisor's affiliation:

 School of Statistics, Beijing Normal University

Submission date:

 2019-06-19    

Defense date:

 2019-05-11    

Foreign-language title:

 PREDICTION OF COMMODITY SALES BASED ON S-ONLINE TRADING PLATFORM    

Chinese keywords:

 Prediction ; Feature engineering ; Decision tree ; Random forest ; LightGBM

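Chinese abstract (translated):
In the era of big data, the amount of data produced each day is growing explosively, and this mass of data gives enterprises a large amount of information. As enterprises develop and data accumulate, however, extracting the most useful information from it becomes a challenge. This thesis studies the merchants on the S online trading platform, analyzing and mining their marketing data in depth in order to bring profit to the platform. R is the main data analysis tool. Based on the characteristics of the source data and an exploration of the variables, data on the store, geographical, and commodity dimensions were selected; several machine learning prediction models were applied, their parameters were tuned, and different features were constructed from the data to obtain good predictions. The thesis consists of two modules. The first selects the head merchants, that is, the merchants users like best, and defines them comprehensively from several angles using a range of the merchants' business data. The second uses machine learning algorithms to predict sales for the selected merchants. Missing data and outliers in the source data are preprocessed first; the variables are then visualized to reveal how the dependent variable changes with the explanatory variables; feature engineering is then applied to extract features across multiple dimensions. For modeling, the thesis uses a decision tree model, a random forest regression model, a LightGBM model, and a support vector model, and tunes the parameters of each to optimize its predictive performance. The experiments show that the LightGBM model gives comparatively good predictions on this problem.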
Foreign-language abstract:
In the era of big data, the volume of data generated each day is growing explosively, and this mass of data provides enterprises with a great deal of information. However, as enterprises develop and data accumulate, extracting the most useful information from these data is a challenge that enterprises face. This thesis analyzes and mines the marketing data of the merchants on the S online trading platform, with the aim of bringing profit to the platform. R is used as the main data analysis tool. Based on the characteristics of the source data and an exploration of the variables, we select data on the store, geographical, and commodity dimensions. Using a variety of machine learning prediction models, we tune the model parameters and construct different features from the data to achieve better prediction results. The thesis is divided into two modules. The first module selects the head merchants, that is, the users' favorite merchants; using a series of the merchants' business data, the head merchant is defined comprehensively from several angles. The second module uses machine learning algorithms to predict the sales volume of the selected merchants. First, the missing data and outliers in the source data are preprocessed. Second, the variables are visualized to find the relationship between the dependent variable and the explanatory variables in the source data. Third, feature engineering is applied to the data to extract features along multiple dimensions. For model construction, a decision tree prediction model, a random forest regression prediction model, a LightGBM prediction model, and a support vector prediction model are each used, and the parameters of each model are tuned so that its predictive performance is as good as possible. The experimental results show that the LightGBM prediction model gives comparatively good predictions on this problem.
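The workflow summarized in the abstract (preprocess missing values and outliers, fit a tree-based model, compare predictions) can be illustrated with a minimal sketch. The record does not state which imputation rule, outlier rule, or error metric the thesis used, and the thesis itself worked in R, so everything below is an assumption made for illustration: median imputation, IQR-based clipping, a depth-1 regression tree standing in for the decision tree model, RMSE as the metric, and synthetic toy data. The function names are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values lying outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, lo), hi) for v in values]


def fit_stump(x, y):
    """Fit a depth-1 regression tree on one feature.

    Tries every split between distinct feature values and keeps the one
    with the smallest sum of squared errors around the two leaf means.
    """
    best = None
    for t in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi < t else mr


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


# Synthetic toy data (not from the thesis): one feature, sales with a gap and a spike.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [10, 11, 10, 12, 20, None, 500, 21]

# For brevity the cleaning uses the full series; a real pipeline would
# derive the median and fences from the training part only.
clean = clip_iqr(impute_median(sales))

train_x, train_y = feature[:6], clean[:6]
test_x, test_y = feature[6:], clean[6:]
model = fit_stump(train_x, train_y)
print("hold-out RMSE:", rmse(test_y, [model(v) for v in test_x]))
```

The same hold-out comparison could be repeated for each candidate model (random forest, LightGBM, support vector regression) to rank them, which is the kind of comparison the abstract reports.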
Total number of references:

 23    

Call number:

 硕025200/19020    

Public release date:

 2020-07-09    
