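Chinese title: | 大规模DAG模型的FDR控制和功效研究 |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 071400 |
Discipline: | |
Student type: | Master's |
Degree: | Master of Science |
Degree type: | |
Degree year: | 2024 |
Campus: | |
School: | |
Research area: | Multiple hypothesis testing |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2024-06-11 |
Defense date: | 2024-05-11 |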
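Foreign-language title: | FDR CONTROL AND POWER STUDY OF LARGE-SCALE DAG MODELS |
Chinese keywords: | Multiple hypothesis testing ; Directed acyclic graph ; False discovery rate ; Tree model ; Positive regression dependence on a subset |
Foreign-language keywords: | Multiple hypothesis testing ; Directed Acyclic Graph ; False Discovery Rate ; Tree model ; Positive Regression Dependence on a Subset |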
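Chinese abstract: |
Multiple hypothesis testing is an effective approach to large-scale statistical inference. With the rapid development of big-data technology, it is increasingly used in medicine, for example to study disease-related differences in gene expression and drug efficacy. At the same time, hypotheses with graph structure are increasingly common, and traditional multiple testing methods and theory do not fully apply to them. Multiple testing theory based on directed acyclic graph (DAG) models and tree models has attracted attention and has, to some extent, addressed testing on DAG models. Although false discovery rate (FDR) control for large-scale DAG models has been studied extensively, many methods ignore the redundancy in the rejection set that this special structure induces, and the redundancy problem itself has received little study. Studying rejection-set redundancy and FDR control in multiple testing on large-scale DAG models is therefore of both theoretical and practical value. This thesis studies FDR control for large-scale DAG models. Inspired by the Focused-BH algorithm, it proposes two practical algorithms, one for DAG models and one for tree models. Two filters are introduced, the Outer Node Filter (ONF) for tree models and the Soft Outer Node Filter (SONF) for general DAG models, to remove redundant rejections. Smoothed p-values are also introduced to increase power. The proposed algorithms are proven to control the FDR when the p-values are independent or satisfy positive regression dependence on a subset (PRDS). The proposed methods are evaluated in numerical simulations on structures with varying signal strength and varying degrees of signal clustering. In every setting, the methods control the FDR while being more stable and more powerful than the other multiple testing methods compared. An abundance analysis of microorganisms across environments then identifies the smallest microbial groups with a significant habitat preference, which can serve as reference indicators of environmental change. A differential analysis of gene expression data from patients with two types of leukemia identifies the associated biological processes. These two real-data analyses support the effectiveness of the algorithms. |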
Foreign-language abstract: |
Multiple hypothesis testing is an effective method for large-scale statistical inference problems. With the rapid development of big data technology, multiple hypothesis testing is increasingly used in medicine, commonly to study disease-related differences in gene expression, drug efficacy, and similar questions. At the same time, there is a growing demand for hypothesis testing on graph structures, to which traditional multiple hypothesis testing methods and theory do not fully apply. Multiple hypothesis testing theory based on Directed Acyclic Graph (DAG) models and tree models has attracted the attention of scholars and has, to some extent, solved the testing problem for DAG models. Although there have been many studies on False Discovery Rate (FDR) control in large-scale DAG models, many methods ignore the redundancy of the rejection set caused by the special structure, and there is little research on the redundancy problem. Therefore, studying rejection-set redundancy and FDR control in large-scale DAG models is of theoretical significance and practical value. This thesis primarily investigates the FDR control problem in large-scale DAG models. Inspired by the Focused-BH algorithm, it presents two practical algorithms, one for DAG models and one for tree models. Two filtering methods are introduced, the Outer Node Filter (ONF) for tree models and the Soft Outer Node Filter (SONF) for general DAG models, to filter out redundant rejections. The concept of smoothed p-values is also introduced to increase the power of the algorithms. It is proven that the proposed algorithms control the FDR when the p-values are independent or satisfy the Positive Regression Dependence on a Subset (PRDS) condition. The proposed methods were evaluated in numerical simulations on structures with different signal strengths and different degrees of signal clustering. The simulation results show that, in all settings, the methods control the FDR while being more stable and more powerful than the other multiple hypothesis testing methods compared. In addition, an analysis of microbial abundance in different environments identified the smallest microbial communities with a significant preference for particular habitats, providing reference indicators of environmental change. Furthermore, a differential analysis of gene expression data from patients with two types of leukemia was conducted to identify the associated biological processes. These two empirical data analyses support the effectiveness of the algorithms. |
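To make the redundancy issue in the abstract concrete, the following is a minimal Python sketch, not the thesis's ONF or SONF algorithm. It assumes a Focused-BH-style rule as commonly described: among thresholds t taken from the observed p-values, choose the largest t with m·t ≤ q·|F(R(t))|, where R(t) is the set of nodes with p-value at most t and F keeps only rejected nodes that have no rejected descendant. The names `outer_nodes`, `focused_bh_sketch`, `PARENT`, and `PVALS` are hypothetical, the tree and p-values are made up for illustration, and smoothed p-values are not included.

```python
from typing import Dict, Optional, Set


def outer_nodes(rejected: Set[int], parent: Dict[int, Optional[int]]) -> Set[int]:
    """Keep only rejected nodes that have no rejected descendant (assumed filter)."""
    has_rejected_descendant: Set[int] = set()
    for node in rejected:
        ancestor = parent.get(node)
        while ancestor is not None:  # walk up to the root
            has_rejected_descendant.add(ancestor)
            ancestor = parent.get(ancestor)
    return rejected - has_rejected_descendant


def focused_bh_sketch(
    pvals: Dict[int, float], parent: Dict[int, Optional[int]], q: float
) -> Set[int]:
    """Return the filtered rejections at the largest threshold t (an observed
    p-value) for which m * t <= q * |filtered rejections at t|."""
    m = len(pvals)
    best: Set[int] = set()
    for t in sorted(set(pvals.values())):  # ascending, so the last hit is the largest t
        rejected = {i for i, p in pvals.items() if p <= t}
        kept = outer_nodes(rejected, parent)
        if m * t <= q * len(kept):
            best = kept
    return best


# Hypothetical binary tree: node 0 is the root, 1 and 2 its children, 3..6 the leaves.
PARENT: Dict[int, Optional[int]] = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
# Made-up p-values: the signal sits on the path 0 -> 1 -> 3.
PVALS = {0: 0.001, 1: 0.002, 2: 0.6, 3: 0.003, 4: 0.7, 5: 0.8, 6: 0.9}

if __name__ == "__main__":
    print(focused_bh_sketch(PVALS, PARENT, q=0.1))
```

In this toy example the unfiltered rejection set at t = 0.003 is {0, 1, 3}, but nodes 0 and 1 are ancestors of node 3, so the filter reports only the most specific node. Counting the filtered set in the denominator, instead of the full rejection set, is what makes the threshold account for redundancy.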
Total number of references: | 54 |
Call number: | 硕071400/24003 |
Open-access date: | 2025-06-11 |