Thesis information

Chinese title:

 Heterogeneity Test for Genetic Pleiotropy Based on a Multivariate Logistic Model

Name:

 文心怡    

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 025200    

Discipline:

 Applied Statistics

Student type:

 Master's

Degree:

 Master of Applied Statistics

Degree type:

 Professional degree

Degree year:

 2024    

Campus:

 Zhuhai Campus

School:

 School of Statistics

Research direction:

 Data Science and Management

First supervisor's name:

 蒋庆    

First supervisor's affiliation:

 School of Statistics

Submission date:

 2024-06-18    

Defense date:

 2024-05-18    

Foreign-language title:

 HETEROGENEITY TESTING OF GENETIC PLEIOTROPY BASED ON MULTINOMIAL LOGISTIC    

Chinese keywords:

 Heterogeneous genetic pleiotropy ; logistic model ; EM algorithm ; likelihood ratio test

Foreign-language keywords:

 Heterogeneity Of Genetic Pleiotropy ; Logistic Model ; EM Algorithm ; Likelihood Ratio Test

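Chinese abstract:

Genetic pleiotropy refers to the phenomenon in which different variants of a gene affect two or more distinct phenotypes or traits. Studying pleiotropy gives a better understanding of gene function and regulatory mechanisms, shows how a gene manifests across different traits, and reveals associations between genes and complex traits. This study investigates statistical methods for detecting heterogeneous genetic pleiotropy and proposes a pleiotropy test suited to binary traits.

This thesis uses a multivariate logistic model to describe the relationship between a gene and the traits, assuming that the model errors follow a normal distribution. Here X denotes the minor-allele dosage and Y denotes a binary trait taking the value 0 or 1. The EM algorithm is then used for estimation, covering both the model parameters and the latent variables. When computing the conditional expectations, Monte Carlo simulation is used to approximate integrals that are otherwise difficult to evaluate. Finally, a likelihood ratio test is applied to the model parameters to determine whether the gene exhibits pleiotropy and to assess its significance. Numerical simulations indicate that the method performs well: the empirical size is mostly below 0.05 and the power exceeds 0.75. In the real-data analysis, data from 725 patients with depression are used, and the results show that gene rs11635365 exhibits pleiotropy. Using 0.05 as the threshold for stopping the sequential testing, the analysis finds that, among the five selected traits (suicide, shallow sleep, systemic symptoms, mental anxiety, and early awakening), systemic symptoms and mental anxiety are the main drivers of the statistical association, which provides important clues for understanding the pathogenesis of depression.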

Foreign-language abstract:

Gene pleiotropy refers to the phenomenon where different variants of a gene can affect two or more distinct phenotypes or traits. By studying gene pleiotropy, we can gain a better understanding of gene function and regulatory mechanisms, as well as uncover the relationships between genes and complex traits. The aim of this study is to explore statistical detection methods for heterogeneity in gene pleiotropy and propose an approach suitable for binary traits.

This study employs a multivariate logistic model to describe the relationship between genes and traits, assuming that the model's errors follow a normal distribution. Here, X represents the dosage of the minor allele, and Y represents a binary trait with values of 0 or 1. Parameter estimation, including estimation of the model parameters and the latent variables, is then carried out with the EM algorithm. When calculating the conditional expectations, Monte Carlo simulation is used to approximate integrals that are difficult to evaluate directly. Finally, a likelihood ratio test is applied to the model parameters to determine whether the gene exhibits pleiotropy and to assess its significance. Numerical simulations indicate that the proposed method performs well, with the empirical size mostly below 0.05 and the power exceeding 0.75. In the case study, data from 725 patients with depression are used, and the results show that gene rs11635365 exhibits pleiotropy. Using 0.05 as the threshold for stopping the sequential tests, the analysis finds that, among the five selected traits (suicide, shallow sleep, systemic symptoms, mental anxiety, early awakening), systemic symptoms and mental anxiety are the main drivers of the statistical association, providing important clues for understanding the pathogenesis of depression.

Total number of references:

 28    

Holding location:

 Main Library B301

Call number:

 硕025200/24056Z    

Release date:

 2025-06-18    
