Analysis of the Prediction Model of R² Value in Mendelian Randomization Study Based on Machine Learning
Abstract:
Mendelian randomization is an important tool in modern genetics. It uses naturally occurring genetic variants as instruments to investigate causal relationships between genetic variation and biological traits, thereby overcoming the confounding that can affect traditional observational studies and supporting the study of the mechanisms underlying those traits. However, R² values are difficult to obtain from phenotype-related research data and are often missing from public databases in China and abroad. This paper therefore uses genetic data on flowering time in oilseed rape (Brassica napus) from the China National Center for Bioinformation (CNCB) database as training material and applies several machine learning algorithms to compare how well different models predict R² values.
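The abstract does not name the algorithms, features, or evaluation metric used, so the following is only a minimal sketch of how a comparison of candidate models for predicting R² could be set up. Everything in it is an assumption made for illustration: the synthetic data, the two stand-in features `x1` and `x2`, the two toy models (a mean baseline and a k-nearest-neighbours regressor), and the choice of cross-validated RMSE as the score. It uses only the Python standard library and is not the authors' pipeline.

```python
import math
import random


def make_synthetic(n=200, seed=0):
    """Build hypothetical (features, target) rows; not real CNCB data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0.05, 0.5)   # made-up feature 1
        x2 = rng.gauss(0.0, 0.3)      # made-up feature 2
        # Arbitrary nonlinear target plus noise, clipped to the valid R^2 range.
        y = 2 * x1 * (1 - x1) * x2 ** 2 + rng.gauss(0.0, 0.005)
        rows.append(((x1, x2), min(max(y, 0.0), 1.0)))
    return rows


def fit_mean(train):
    """Baseline model: always predict the mean target of the training rows."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_knn(train, k=5):
    """k-nearest-neighbours regressor using Euclidean distance."""
    def predict(x):
        nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def rmse(predict, test):
    """Root-mean-square error of a prediction function on held-out rows."""
    return math.sqrt(sum((predict(x) - y) ** 2 for x, y in test) / len(test))


def cross_validate(fit, rows, folds=5):
    """Average held-out RMSE over interleaved folds (every folds-th row)."""
    scores = []
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        scores.append(rmse(fit(train), test))
    return sum(scores) / folds


rows = make_synthetic()
results = {
    "mean baseline": cross_validate(fit_mean, rows),
    "5-nearest neighbours": cross_validate(fit_knn, rows),
}
# Lower RMSE means the model predicts held-out targets more closely.
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: cross-validated RMSE = {score:.4f}")
```

Scoring every candidate on the same held-out folds is what makes the models comparable. A real study would replace the synthetic rows with the curated variant data and the toy models with the algorithms actually under comparison.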