#### 期刊菜单

Research on Regularization Model in Complex Network Structure
DOI: 10.12677/AAM.2019.89178, PDF, 下载: 889  浏览: 1,268

Abstract: Based on the complex network structure under the Guassian model, the penalty term is considered as the seamless-L0 penalty, and a new regularization model is proposed. The Minorize-Maximization algorithm was used to construct the Minorization function of the proposed regularization model, and then the Maxdet algorithm was used to solve the model. The model simultaneously performs model selection and parameter estimation. And the result is sparse. Numerical simulation studies and examples of gene regulatory networks show that the model has good parameter estimation and model selection effects.

1. 引言

2. 正则化模型

$X=\left({X}^{\left(1\right)},\cdots ,{X}^{\left(p\right)}\right)$ 服从p维正态分布 ${N}_{p}\left(\mu ,\Sigma \right)$$\mu$ 为均值，  为协方差矩阵。设 $A={\left({a}_{ij}\right)}_{1\le i,j\le p}={\Sigma }^{-1}$ 为协方差矩阵的逆。基于随机变量 $X$ 样本观测值 ${X}_{1},{X}_{2},\cdots ,{X}_{n}$ 的对数似然函数是

$\frac{n}{2}\mathrm{log}|A|-\frac{1}{2}\underset{i=1}{\overset{n}{\sum }}{\left({X}_{i}-\mu \right)}^{\prime }A\left({X}_{i}-\mu \right)$ (1)

$C=\frac{1}{n}\underset{i=1}{\overset{n}{\sum }}{\left({X}_{i}-\stackrel{¯}{X}\right)}^{\prime }\left({X}_{i}-\stackrel{¯}{X}\right)$

${p}_{\text{SEL0}}\left({\beta }_{j}\right)=\frac{\lambda }{\mathrm{log}\left(2\right)}\left(\frac{|{\beta }_{j}|}{|{\beta }_{j}|+\tau }+1\right)$ (2)

$\mathrm{log}|A|-\frac{1}{n}\underset{i=1}{\overset{n}{\sum }}{\left({X}_{i}-\mu \right)}^{\prime }A\left({X}_{i}-\mu \right)$ st $\underset{i\ne j}{\sum }\frac{1}{\mathrm{log}\left(2\right)}\mathrm{log}\left(\frac{|{a}_{ij}|}{|{a}_{ij}|+\gamma }+1\right)\le \lambda$ (3)

$\mathrm{log}|A|-\frac{1}{n}\underset{i=1}{\overset{n}{\sum }}{{X}^{\prime }}_{i}A{X}_{i}$ st $\underset{i\ne j}{\sum }\frac{1}{\mathrm{log}\left(2\right)}\mathrm{log}\left(\frac{|{a}_{ij}|}{|{a}_{ij}|+\gamma }+1\right)\le \lambda$ (4)

$\mathrm{log}|A|-tr\left(A\stackrel{¯}{C}\right)$ st $\underset{i\ne j}{\sum }\frac{1}{\mathrm{log}\left(2\right)}\mathrm{log}\left(\frac{|{a}_{ij}|}{|{a}_{ij}|+\gamma }+1\right)\le \lambda$ (5)

$L\left(A\right)=\mathrm{log}|A|-tr\left(A\stackrel{¯}{C}\right)-\lambda \underset{i\ne j}{\sum }\frac{1}{\mathrm{log}\left(2\right)}\mathrm{log}\left(\frac{|{a}_{ij}|}{|{a}_{ij}|+\gamma }+1\right)$ (6)

3. Maxdet算法与参数选择

${c}_{1}.\text{\hspace{0.17em}}\text{\hspace{0.17em}}L\left(A\right)\ge H\left(a|{a}^{n}\right)$

${c}_{2}.\text{\hspace{0.17em}}\text{\hspace{0.17em}}L\left({A}^{n}\right)=H\left(a|{a}^{n}\right)$

$\frac{\lambda }{\mathrm{log}\left(2\right)}\underset{i\ne j}{\sum }\mathrm{log}\left(\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\underset{i\ne j}{\sum }\mathrm{log}\left(\frac{|{a}_{ij}|}{|{a}_{ij}|+\gamma }+1\right)\ge \frac{\lambda }{\mathrm{log}\left(2\right)}\underset{i\ne j}{\sum }\left(\frac{\frac{|{a}_{ij}|}{|{a}_{ij}|+\gamma }+1}{\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1}\right)$

$\mathrm{log}|A|-tr\left(A\stackrel{¯}{C}\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\underset{i\ne j}{\sum }\mathrm{log}\left(\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1\right)$

$L\left(A\right)\ge \mathrm{log}|A|-tr\left(A\stackrel{¯}{C}\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\left(\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\underset{i\ne j}{\sum }\left(\frac{\frac{|{a}_{ij}|}{|{a}_{ij}|+\gamma }+1}{\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1}\right)$

$H\left(a|{a}^{n}\right)=\mathrm{log}|A|-tr\left(A\stackrel{¯}{C}\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\left(\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\underset{i\ne j}{\sum }\left(\frac{\frac{|{a}_{ij}|}{|{a}_{ij}|+\gamma }+1}{\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1}\right)$ (7)

$L\left(A\right)\ge H\left(a|{a}^{n}\right)$ ，条件 ${c}_{1}$ 满足。

$H\left({a}^{n}|{a}^{n}\right)=\mathrm{log}|A|-tr\left(A\stackrel{¯}{C}\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\left(\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\underset{i\ne j}{\sum }\left(\frac{\frac{|{a}_{ij}{}^{n}|}{|{a}_{ij}{}^{n}|+\gamma }+1}{\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1}\right)$

$H\left({a}^{n}|{a}^{n}\right)=\mathrm{log}|A|-tr\left(A\stackrel{¯}{C}\right)-\frac{\lambda }{\mathrm{log}\left(2\right)}\left(\frac{|{a}_{ij}^{n}|}{|{a}_{ij}^{n}|+\gamma }+1\right)$

$H\left({a}^{n}|{a}^{n}\right)=L\left(A\right)$ ，条件 ${c}_{2}$ 满足。

${\mathrm{min}}_{x\in {R}^{m}}{b}^{\prime }x-\mathrm{log}|G\left(x\right)|$

$\mathrm{min}2\underset{i

${\lambda }^{\prime }-2\underset{i (8)

$\text{BIC}\left(\lambda \right)=-\mathrm{log}|A\left(\lambda \right)|+tr\left\{A\left(\lambda \right)\stackrel{¯}{C}\right\}+\frac{\mathrm{log}n}{n}\underset{i\ne j}{\sum }{\stackrel{^}{e}}_{ij}\left(\lambda \right)$ 其中，如果 ${a}_{ij}=0$ ，则 ${e}_{ij}=0$ ；其余情 ${\stackrel{^}{e}}_{ij}=1$

4. 数值模拟

$\text{KL}=-\mathrm{log}|\stackrel{^}{A}|+tr\left(\stackrel{^}{A}\stackrel{^}{\Sigma }\right)-\left(-\mathrm{log}|{\Sigma }^{-1}|+p\right)$

FPR和TPR为：

$\text{FPR}=\text{FP}/\left(\text{FP}+\text{TN}\right)$$\text{TPR}=\text{TP}/\left(\text{TP}+\text{FN}\right)$ 。其中，TPR越大，FPR越小说明模拟效果越好。

Table 1. p = 15

Table 2. p = 30

Table 3. p = 50

Table 4. p = 100

5. 实例分析

1) 对邻接矩阵中基因之间的实际调控关系的个数进行统计，所得结果如下：

## IGRAPH UN-153 209

## attr: name (v/c)

2) 利用第一个数据对象中40行，153列的矩阵，根据本文提出的正则化模型，统计出在该模型下得到的生物学文献中已发现的实际调控关系及生物学文献中未被发现的调控关系的总量。所得结果如下：

Figure 1. Actual regulatory relationships among 153 genes

IGRAPH 68d9b0b U-153 554--

3) 最后统计在步骤2中利用本文提出的正则化模型得到的生物学文献中已发现的实际调控关系的个数。所得结果如下：

## IGRAPH UN-153 192--

6. 结论

 [1] Kolaczyk, E.D. (2009) Statistical Analysis of Network Data. Springer Science + Business Media, Berlin. https://doi.org/10.1007/978-0-387-88146-1 [2] Hand, D.J. (2010) Statistical Analysis of Network Data: Methods and Models by Eric D. Kolaczyk. International Statistical Review, 78, 134-159. https://doi.org/10.1111/j.1751-5823.2010.00109_2.x [3] Dempster, A.P. (1972) Covariance Selection. Bio-metrika, 32, 95-108. [4] Koller, D. and Friedman, N. (2012) Probabilistic Graphical Models: Principles and Tech-niques. The MIT Press, Cambridge. [5] 孙建举. 高斯图模型的估计[D]: [硕士学位论文]. 上海: 上海交通大学, 2014. [6] Drton, M. and Perlman, M.D. (2004) Model Selection for Gaussian Concentration Graphs. Biometrika, 91, 591-602. https://doi.org/10.1093/biomet/91.3.591 [7] Meinshausen, N. and Bühlmann, P. (2006) High-Dimensional Graphs and Variable Selection with the Lasso. The Annals of Statistics, 34, 1436-1462. https://doi.org/10.1214/009053606000000281 [8] Breiman, L. (1996) Heuristics of Instability and Stabilization in Model Selection. The Annals of Statistics, 24, 2350-2383. https://doi.org/10.1214/aos/1032181158 [9] Dicker, L., Huang, B. and Lin, X. (2013) Variable Selection and Estimation with the Seamles-L0 Penalty. Statistica Sinica, 23, 929-962. https://doi.org/10.5705/ss.2011.074 [10] Foster, D.P. and George, E.I. (1994) The Risk Inflation Crite-rion for Multiple Regression. The Annals of Statistics, 22, 1947-1975. https://doi.org/10.1214/aos/1176325766 [11] Yuan, M. and Lin, Y. (2007) Model Selection and Estimation in the Gaussian Graphical Model. Biometrika, 94, 19-35. https://doi.org/10.1093/biomet/asm018 [12] Lange, K., Hunter, D.R. and Yang, I. (2000) Optimization Transfer Using Surrogate Objective Functions. Journal of Computational and Graphical Statistics, 9, 60-77. https://doi.org/10.2307/1390605 [13] Friedman, J., Hastie, T. and Tibshirani, R. (2008) Sparse Inverse Covariance Estimation with the Graphical Lasso. Biostatistics, 9, 432-441. https://doi.org/10.1093/biostatistics/kxm045 [14] Ravikumar, P., Raskutti, G., Wainwright, M.J., et al. (2011) High-Dimensional Covariance Estimation by Minimizing L1-Penalized Log-Determinant. Electronic Journal of Statistics, 5, 935-980. https://doi.org/10.1214/11-EJS631 [15] Faith, J.J., Hayete, B., Thaden, J.T., et al. (2007) Large-Scale Mapping and Validation of Escherichia coil Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biology, 5, e8. https://doi.org/10.1371/journal.pbio.0050008