# Identifying Critical Transition with PageRank Algorithm in a Biological Process

Abstract: On the premise that high-throughput datasets contain all the information we need, we face a problem of information retrieval. Because the PageRank algorithm has been highly successful at this kind of problem on the Internet, we adapt it to high-throughput datasets in combination with the theory of dynamical network biomarkers (DNB) and use it to identify critical transitions in biological processes. The adapted PageRank algorithm successfully identifies the designated critical points in simulated data, and on experimental datasets it reproduces the results of earlier work.

1. Introduction

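PageRank is a widely studied algorithm that takes different concrete forms in different application scenarios [5]. In its basic form, the PageRank algorithm finds the fixed point of the following iteration: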

$\pi \leftarrow \alpha {H}_{n}^{\text{T}}\pi +\alpha \left({d}^{\text{T}}\pi \right){v}_{n}+\left(1-\alpha \right){v}_{n},$ (1)
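As an illustration, the iteration in Eq. (1) can be sketched in a few lines of NumPy. The symbol definitions are not given alongside the equation here, so the sketch assumes the conventional reading: $H_n$ is a row-normalized transition matrix whose all-zero rows mark dangling nodes, $d$ is the 0/1 indicator of those dangling nodes, $v_n$ is a personalization vector summing to 1, and $\alpha$ is the damping factor. The function name, tolerance, and default $\alpha = 0.85$ are illustrative choices, not values taken from this paper.

```python
import numpy as np

def pagerank(H, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate Eq. (1) until the L1 change between iterates drops below tol.

    Assumptions (not stated in the text): H[i, j] is the transition
    probability from node i to node j, each non-dangling row sums to 1,
    and v is a personalization vector that sums to 1.
    """
    H = np.asarray(H, dtype=float)
    v = np.asarray(v, dtype=float)
    d = (H.sum(axis=1) == 0).astype(float)  # 1 for dangling nodes, else 0
    pi = v.copy()
    for _ in range(max_iter):
        # alpha*H^T pi: mass passed along edges
        # alpha*(d^T pi) v: mass of dangling nodes redistributed via v
        # (1-alpha) v: teleportation term
        new = alpha * (H.T @ pi) + alpha * (d @ pi) * v + (1.0 - alpha) * v
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

# Toy chain 0 -> 1 -> 2, where node 2 is dangling.
H_toy = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
pi_toy = pagerank(H_toy, [1 / 3, 1 / 3, 1 / 3])
```

Under these assumptions the total mass of $\pi$ stays at 1 in every step, because the dangling-node term returns exactly the mass that the all-zero rows of $H_n$ would otherwise lose.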

2. Methods

2.1. Extracting the Adjacency Matrix from the PCC Matrix

$\sqrt{{r}^{2}\left(n-2\right)/\left(1-{r}^{2}\right)}\sim {t}_{n-2}$ (2)
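Eq. (2) can be evaluated directly and also inverted to turn a critical $t$ value into a critical correlation magnitude. The sketch below assumes that $r$ is a sample Pearson correlation coefficient and $n$ is the number of samples; the helper names are hypothetical, and the critical $t$ value must be supplied by the caller (for example from a standard $t$ table).

```python
import math

def pcc_t_statistic(r, n):
    """Left-hand side of Eq. (2) for a sample correlation r over n samples."""
    if n <= 2 or not -1.0 < r < 1.0:
        raise ValueError("need n > 2 and |r| < 1")
    return math.sqrt(r * r * (n - 2) / (1.0 - r * r))

def critical_r(t_crit, n):
    """Invert Eq. (2): the |r| whose statistic equals t_crit."""
    if n <= 2:
        raise ValueError("need n > 2")
    return t_crit / math.sqrt(t_crit * t_crit + n - 2)

# With n = 12 samples (10 degrees of freedom), the two-sided 5% critical
# value from a t table is 2.228, giving a critical |r| of about 0.576.
r_c = critical_r(2.228, 12)
```

Whether the threshold used later in this section is obtained in exactly this way is an assumption of the sketch.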

${H}_{r}\left[i,j\right]=\left(|R\left[i,j\right]|-{T}_{c}\right)/\left(1-{T}_{c}\right)\ast {|R\left[i,j\right]|}^{n}$ (3)
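A minimal sketch of Eq. (3) follows, assuming that $R$ is the PCC matrix, $T_c$ is a correlation threshold in $[0, 1)$, and $n$ is the exponent applied to $|R[i,j]|$. Two further choices are assumptions of the sketch rather than statements from the text: entries with $|R[i,j]| \le T_c$ are set to 0 (no edge), and the diagonal is zeroed to exclude self-loops. The function name is hypothetical.

```python
import numpy as np

def adjacency_from_pcc(R, T_c, n=1):
    """Edge weights of Eq. (3) computed from a PCC matrix R."""
    A = np.abs(np.asarray(R, dtype=float))
    H_r = (A - T_c) / (1.0 - T_c) * A ** n
    H_r[A <= T_c] = 0.0         # assumption: sub-threshold pairs get no edge
    np.fill_diagonal(H_r, 0.0)  # assumption: no self-loops
    return H_r

R_toy = np.array([[1.0, 0.9, 0.2],
                  [0.9, 1.0, -0.6],
                  [0.2, -0.6, 1.0]])
H_toy = adjacency_from_pcc(R_toy, T_c=0.5, n=2)
```

The first factor rescales the super-threshold range $(T_c, 1]$ to $(0, 1]$, and the power of $|R[i,j]|$ further down-weights weaker correlations, so every weight lies in $[0, 1]$.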

2.2. Suppressing Background Structure in the Network

$\begin{array}{l}{H}_{m}\left[i,j\right]={H}_{r}\left[i,j\right]\left(1-S\left[i,j\right]\right)/\left(1+{\sum }_{i}S\left[i,j\right]\right)\\ {\pi }_{m}\left[i\right]={\pi }_{o}\left[i\right]\left(1+{\sum }_{i}S\left[i,j\right]\right)\end{array}$ (4)

2.3. Bridging the Discontinuity of Dangling Nodes

2.4. PageRank Values in the DNB Subnetwork

$E=\left({P}_{G}+\left({P}_{I}-{P}_{o}\right)-\left({P}_{d}-{P}_{t}\right)\right)/\left(1-\alpha \right)$ (5)

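${P}_{G}$ is the PageRank value generated by the personalization vector; ${P}_{I}$ and ${P}_{o}$ are the PageRank values flowing in and out through edges, respectively; ${P}_{d}$ is the PageRank value that escapes from the subnetwork to the whole network through its dangling nodes, and ${P}_{t}$ is the PageRank value it receives from all dangling nodes. For the DNB subnetwork, the nodes are tightly correlated internally, so it contains no dangling nodes and ${P}_{d}$ is 0; because the DNB is relatively isolated, ${P}_{I}$ and ${P}_{o}$ are both small, and ideally 0. In practice ${P}_{t}$ is also found to be small, although ${P}_{t}$ increases $E$, which works in favor of our algorithm. Hence $E\approx {P}_{G}/\left(1-\alpha \right)={\sum }_{i}^{\text{DNB}}{v}_{n}\left[i\right]$, where the sum runs over all nodes in the DNB.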
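The approximation can be written out step by step. Setting ${P}_{d}=0$ and ${P}_{I}={P}_{o}=0$ in Eq. (5), and assuming that ${P}_{G}$ is the mass contributed by the $\left(1-\alpha \right){v}_{n}$ term of Eq. (1), so that ${P}_{G}=\left(1-\alpha \right){\sum }_{i\in \text{DNB}}{v}_{n}\left[i\right]$, gives

$E=\frac{{P}_{G}+{P}_{t}}{1-\alpha }\ge \frac{{P}_{G}}{1-\alpha }=\frac{\left(1-\alpha \right){\sum }_{i\in \text{DNB}}{v}_{n}\left[i\right]}{1-\alpha }={\sum }_{i\in \text{DNB}}{v}_{n}\left[i\right]$

with equality when ${P}_{t}=0$. The inequality uses only ${P}_{t}\ge 0$, which is why a nonzero ${P}_{t}$ can only raise $E$.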

2.5. Progressive PageRank Method

2.6. DNB Evaluation Index

$I\stackrel{\text{def}}{=}{\sum }_{i\ne j}^{\text{DNB}}{H}_{r}\left[i,j\right]/\left(n\left(n-1\right)\right)$ (6)
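Read literally, Eq. (6) is the mean off-diagonal edge weight inside the candidate DNB. The sketch below assumes that $n$ in Eq. (6) is the number of nodes in the DNB and that ${H}_{r}$ is the weight matrix of Eq. (3); the function and argument names are hypothetical.

```python
import numpy as np

def dnb_index(H_r, members):
    """Mean of H_r[i, j] over ordered pairs i != j of DNB members, Eq. (6)."""
    idx = np.asarray(members, dtype=int)
    n = idx.size
    if n < 2:
        raise ValueError("a DNB needs at least two nodes")
    sub = np.asarray(H_r, dtype=float)[np.ix_(idx, idx)]
    off_diagonal_sum = sub.sum() - np.trace(sub)
    return off_diagonal_sum / (n * (n - 1))

H_toy = np.array([[0.0, 0.6, 0.1],
                  [0.4, 0.0, 0.2],
                  [0.1, 0.2, 0.0]])
index_pair = dnb_index(H_toy, [0, 1])  # averages H_toy[0, 1] and H_toy[1, 0]
```

If all weights lie in $[0, 1]$, as they do under Eq. (3), the index also lies in $[0, 1]$ and reaches 1 only when every pair of DNB members is maximally correlated.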

3. Results

3.1. Numerical Simulation

$x\leftarrow \text{exp}\left(TD\left(\beta \right){T}^{-1}\right)x+\xi$ (7)

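$x$ is the iteration variable, initialized to all zeros; $\xi$ is a random vector whose components are independent and Gaussian-distributed, which introduces randomness into the iteration. On the simulated datasets, our progressive PageRank algorithm successfully located the DNB and produced an early-warning signal of the critical transition. Figure 1 shows how the DNB evaluation index varies with the dominant eigenvalue $\lambda$; in the vast majority of the simulated datasets, the index approaches 1 as $\lambda$ tends to 0, successfully indicating the critical point.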
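To make the behavior of Eq. (7) concrete, the following is a small sketch and not the simulation used in this paper. It assumes that $D\left(\beta \right)$ is diagonal, so that $\text{exp}\left(TD{T}^{-1}\right)=T\,\text{exp}\left(D\right){T}^{-1}$, and it parameterizes $D$ directly by its dominant eigenvalue `lam` instead of by $\beta$. The 3-variable matrix `T`, the other eigenvalues, the noise scale, and the number of steps are all hypothetical. The first column of `T` is the eigenvector of the dominant eigenvalue, so variables 0 and 1 play the role of the DNB and variable 2 is a bystander.

```python
import numpy as np

def simulate(lam, n_steps=20000, noise=0.1, seed=0):
    """Iterate x <- exp(T D T^-1) x + xi from x = 0 and return the trajectory."""
    rng = np.random.default_rng(seed)
    T = np.array([[1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    eigenvalues = np.array([lam, -2.0, -2.0])  # diagonal of D; lam is dominant
    # For diagonal D the matrix exponential reduces to exponentiating the diagonal.
    M = T @ np.diag(np.exp(eigenvalues)) @ np.linalg.inv(T)
    x = np.zeros(3)
    trajectory = np.empty((n_steps, 3))
    for k in range(n_steps):
        x = M @ x + noise * rng.standard_normal(3)
        trajectory[k] = x
    return trajectory

near = simulate(lam=-0.01)  # close to the critical point
far = simulate(lam=-2.0)    # far from the critical point
corr_near = np.corrcoef(near[:, 0], near[:, 1])[0, 1]
corr_far = np.corrcoef(far[:, 0], far[:, 1])[0, 1]
```

In this toy system, as `lam` approaches 0 the slow mode dominates variables 0 and 1, so their fluctuations grow and their correlation approaches 1, while variable 2 is unaffected. This is the kind of signature that the DNB evaluation index is designed to pick up.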

Figure 1. DNB indication for simulations (left: a synthesis result, right: specific curves)

3.2. Experimental Data

Figure 2. DNB indication for the experiment of mouse lung injury

Figure 3. DNB indication for the experiment of HRG induced differentiation on MCF-7 human breast cancer cells

4. Discussion

