Restoration of Gene Regulatory Network Based on Regression Theory
DOI: 10.12677/AAM.2023.125222. Supported by the National Natural Science Foundation of China.
Authors: Xue Zhang (张雪), Chuankui Yan* (严传魁): College of Mathematics and Physics, Wenzhou University, Wenzhou, Zhejiang
Keywords: Gene Regulatory Network, Least Squares Estimation, Hypothesis Testing
Abstract: Regulatory relationships between genes are implicit in gene expression data, which must be analyzed to reveal the topology of the gene regulatory network. Because static gene expression data contain few samples, this paper proposes a method that expands the sample size based on distance correlation. It then proposes a method for restoring the network topology: a linear regression model of the gene regulatory network is built on the distance sample data according to regression theory, and least squares estimation and hypothesis testing are applied to the model to determine whether a regulatory relationship exists between genes. In addition, a method for controlling false positives is proposed, in which statistical tests control the false discovery rate to improve the accuracy of the model's predictions. Finally, the feasibility of the method is verified on the DREAM3 dataset.
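The abstract does not say which procedure is used to control the false discovery rate. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure and assuming each candidate regulatory edge already has a p-value from the hypothesis test on its regression coefficient, the selection step might look like the following. The function name, the gene labels, and the p-values are all hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at FDR level alpha.

    Assumption: BH is used here only for illustration; the paper's own
    FDR-control procedure is not specified in the abstract.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for candidate edges (regulator, target).
edge_p_values = {
    ("G1", "G2"): 0.001,
    ("G1", "G3"): 0.008,
    ("G2", "G3"): 0.039,
    ("G3", "G1"): 0.041,
    ("G2", "G1"): 0.600,
}
edges = list(edge_p_values)
keep = benjamini_hochberg([edge_p_values[e] for e in edges], alpha=0.05)
predicted_edges = [e for e, k in zip(edges, keep) if k]
```

In this sketch an edge is kept in the predicted network only if its null hypothesis (zero regression coefficient) is rejected after the correction, which is one way to limit false-positive edges when many gene pairs are tested at once.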
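Cite this article: Zhang, X. and Yan, C.K. (2023) Restoration of Gene Regulatory Network Based on Regression Theory. Advances in Applied Mathematics, 12(5), 2177-2186. https://doi.org/10.12677/AAM.2023.125222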
