Additive Model and Variable Selection for Uncertain Genotypes
Abstract: Genome-wide association studies (GWAS) are an effective approach for identifying loci associated with complex diseases. When genotypes are uncertain, the traditional approach first estimates genotype probabilities by genotype imputation and then carries out the association analysis on those estimates. We model large-sample genetic data with a nonparametric additive model in which the number of additive components is large but the number of nonzero components is small. Each additive component is approximated by a linear combination of B-spline basis functions, which captures the effect of a genotype probability on the trait. To select the nonzero components, a group Lasso penalty is used to obtain an initial estimator. Finally, Monte Carlo simulations show that the group Lasso method for the additive model performs well on gene expression samples.
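The pipeline in the abstract can be sketched as follows. Each covariate is expanded in a centered B-spline basis, and the resulting coefficient blocks are penalized with a group Lasso. This is a minimal illustration under several assumptions, not the authors' implementation:

- The covariates are genotype probabilities on [0, 1]. Here they are simulated as uniform draws, which real imputed probabilities are not.
- The basis is cubic, with 3 equally spaced interior knots.
- The group Lasso is solved by plain proximal gradient (ISTA), with λ = 0.05 chosen by hand rather than by cross-validation or an information criterion.
- Only the initial group Lasso estimator is computed, with no second-stage refinement.
- The function names `bspline_basis`, `additive_design` and `group_lasso` and the three test functions are hypothetical.

```python
import numpy as np


def bspline_basis(x, n_interior=3, degree=3):
    """B-spline basis on [0, 1] with equally spaced interior knots (Cox-de Boor).

    Returns an array of shape (len(x), n_interior + degree + 1); each row sums to 1.
    """
    # Clip just below 1 so the half-open knot intervals also cover x = 1.
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0 - 1e-12)
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    # Clamped knot vector: each boundary knot repeated degree + 1 times.
    t = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    # Degree 0: indicator of the knot interval [t_i, t_{i+1}) containing x.
    B = ((t[:-1] <= x[:, None]) & (x[:, None] < t[1:])).astype(float)
    for d in range(1, degree + 1):
        nxt = np.zeros((x.size, len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                nxt[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                nxt[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = nxt
    return B


def additive_design(X, n_interior=3, degree=3):
    """Centered spline design matrix with one column group per covariate."""
    blocks = []
    for j in range(X.shape[1]):
        # The basis sums to 1, so after centering one column is redundant; drop it.
        Bj = bspline_basis(X[:, j], n_interior, degree)[:, :-1]
        blocks.append(Bj - Bj.mean(axis=0))
    return np.hstack(blocks), blocks[0].shape[1]


def group_lasso(Z, y, group_size, lam, n_iter=2000):
    """Proximal gradient (ISTA) for the group Lasso objective
    (1 / 2n) * ||y - Z b||^2 + lam * sqrt(group_size) * sum_j ||b_j||_2.
    """
    n, p = Z.shape
    step = n / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of the gradient
    thresh = step * lam * np.sqrt(group_size)
    b = np.zeros(p)
    for _ in range(n_iter):
        v = b - step * (Z.T @ (Z @ b - y)) / n
        for g in range(p // group_size):
            s = slice(g * group_size, (g + 1) * group_size)
            norm = np.linalg.norm(v[s])
            # Block soft-thresholding: shrink the whole group toward 0 or zero it out.
            v[s] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * v[s]
        b = v
    return b.reshape(-1, group_size)


# Hypothetical simulation: 20 covariates standing in for genotype probabilities;
# only the first three have a (nonlinear) effect on the trait.
rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.uniform(0.0, 1.0, size=(n, p))
y = (5 * X[:, 0]
     + 3 * (2 * X[:, 1] - 1) ** 2
     + 2 * np.sin(2 * np.pi * X[:, 2])
     + 0.5 * rng.standard_normal(n))

Z, k = additive_design(X)
coef = group_lasso(Z, y - y.mean(), k, lam=0.05)
selected = np.flatnonzero(np.linalg.norm(coef, axis=1) > 0)
print("selected components:", selected)
```

All spline coefficients of one covariate share a single penalty term, so a component is either dropped entirely or kept with its whole spline fit. This is what lets the group Lasso select additive components rather than individual basis coefficients. On this simulated data one would expect only the three signal components to survive.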
Citation: Zhong, S.M. and Xu, P. (2021) Additive Model and Variable Selection for Uncertain Genotypes. Statistics and Application, 10(2), 293-299. https://doi.org/10.12677/SA.2021.102029
