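Chinese title: | 大数据抽样回归分析方法 |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 071201 |
Major: | |
Student type: | Bachelor's |
Degree: | Bachelor of Science |
Degree year: | 2020 |
University: | Beijing Normal University |
Campus: | |
School: | |
Primary supervisor name: | |
Primary supervisor affiliation: | |
Submission date: | 2020-06-11 |
Defense date: | 2020-05-11 |
Foreign-language title: | Subsampling methods for big data regression |
Chinese keywords: | |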
Foreign-language keywords: | Subsampling methods ; Regression models ; D-optimal criterion ; A-optimal criterion ; L-optimal criterion |
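Chinese abstract: |
This thesis systematically summarizes and analyzes methods from the existing literature for applying subsampling to various regression models on big data. First, for linear models, it summarizes four classical subsampling methods based on leverage-score sampling probabilities, together with the IBOSS method, and gives a framework for computing the MSE and evaluating each method from a statistical perspective. Results in the literature show that, among the leverage-based sampling methods, the one using unweighted least squares (LEVUNW) has the best properties, while IBOSS, a sampling method based on the D-optimal criterion, outperforms the four classical sampling methods under certain conditions. Second, we summarize subsampling methods for logistic regression and quantile regression based on the A-optimal and L-optimal criteria.
|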
Foreign-language abstract: |
In this thesis we systematically review subsampling methods from the existing literature for several types of regression models. First, we present four leverage-based subsampling algorithms for linear models and introduce the IBOSS method. We then provide a framework for evaluating their statistical properties. The unweighted least-squares estimator (LEVUNW) shows improved statistical properties, and IBOSS, which is based on the D-optimal criterion, outperforms the four methods above under certain conditions. Next, we extend subsampling methods to logistic and quantile regression models. For logistic models, we present asymptotic properties and give optimal subsampling methods based on the A-optimal and L-optimal criteria. We then introduce more efficient logistic subsampling methods and a Poisson sampling procedure. Lastly, we turn to quantile regression models, for which we also present asymptotic properties and introduce optimal subsampling methods. The key advance is an iterative algorithm that we propose for statistical inference. |
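Total references: | 10 |
About the author: | 余涵, undergraduate student, School of Statistics, Beijing Normal University |
Total figures: | 10 |
Total tables: | 0 |
Call number: | 本071201/20056 |
Release date: | 2021-06-11 |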