Distributed Feature Screening via Inverse Regression

doi:10.12677/aam.2025.141034



Authors: Yan Zhang (张妍), Junying Zhang (张俊英)^*, School of Mathematics, Taiyuan University of Technology, Taiyuan, Shanxi
Keywords: Ultrahigh Dimension; Gini Correlation Coefficient; Variable Screening; Feature Ranking

Abstract: In this paper, we propose a distributed screening framework for big-data settings based on an inverse regression estimator. In the spirit of divide-and-conquer, the framework expresses the dependence between each feature and the response through an inverse regression model whose inverse conditional variance can be estimated distributively. Aggregating the component estimates yields a final inverse conditional variance estimator that can be used directly for feature screening. The framework supports distributed storage and parallel computing and is therefore computationally attractive. Because the component parameters are estimated without bias in a distributed manner, the final aggregated estimate achieves high accuracy that is insensitive to the number of data segments m. Under mild conditions, we show that the aggregated estimator is as efficient as the centralized estimator in terms of the probability convergence bound and the mean squared error rate, and that the corresponding screening procedure enjoys the sure screening property for a wide range of correlation measures.

Cite as: Zhang Yan, Zhang Junying. Distributed Feature Screening via Inverse Regression [J]. Advances in Applied Mathematics, 2025, 14(1): 344-351. https://doi.org/10.12677/aam.2025.141034

1. Introduction

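With the rapid development of data generation and acquisition technologies, massive data sets with a large number of features are routinely encountered in many scientific fields. For classical statistical methods, high dimensionality simultaneously challenges computational cost, statistical accuracy, and algorithmic stability [1] [2]. To simplify computation, a natural strategy is to screen out the least relevant features before any detailed analysis; this process is called feature screening. Once the dimension is reduced from high to low, the analysis becomes much easier. A large body of work has addressed this problem, and correlation-based screening methods in particular have attracted wide attention. These methods screen features according to some measure of association between each feature and the response: features with weak association are regarded as irrelevant and removed. Such methods can be implemented conveniently without strong model assumptions (or even without any model), and they are therefore commonly used to analyze high-dimensional data with complex structure. For example, Fan and Lv (2008) proposed sure independence screening (SIS) based on the Pearson correlation [3], and SIS was subsequently extended to generalized linear models and nonparametric additive models [4] [5]. Zhu et al. (2011) proposed sure independent ranking and screening (SIRS), based on a utility measure related to the entire conditional distribution of the response given a predictor [6]. Li, Peng, Zhang and Zhu (2012) proposed robust rank correlation screening (RRCS) based on Kendall's rank correlation [8]. Li, Zhong and Zhu (2012) developed a model-free sure independence screening procedure based on distance correlation (DC-SIS) [7]. Wu and Yin (2015) proposed distribution function sure independence screening (DF-SIS), which uses a measure for testing the independence of two variables [9]. Zhou et al. (2019) proposed a robust correlation measure for screening features that contain extreme values [10].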

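Feature screening [11] has proven to be an attractive strategy in many applications. Most existing methods were developed for settings in which the number of features p is large but the sample size N is moderate. In modern scientific research, however, analysts increasingly have to handle big data sets in which both p and N are huge. For example, in modern genome-wide genetic studies [12], millions of SNPs are genotyped for hundreds of thousands of participants, and in Internet research, antivirus software may scan tens of thousands of keywords across millions of URLs every minute. For data with large p and large N, directly applying classical screening methods is numerically inefficient because of storage bottlenecks and algorithmic feasibility. For instance, for a data set with N = p = 10,000, the well-known DC-SIS takes about 60 hours to complete screening on a computer with a 3.2 GHz CPU and 32 GB of memory. It is therefore desirable in practice to develop computationally convenient screening methods for big data [13]-[15].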

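When a data set is too large to be processed on a single computer, it is natural to consider a divide-and-conquer strategy: a large problem is first decomposed into smaller, manageable subproblems, and the corresponding sub-outputs are then combined into a final output. In this spirit, many machine learning and statistical methods have been redesigned to handle big data (e.g., Chen and Xie, 2014 [16]; Xu et al., 2016 [17]; Jordan et al., 2019 [18]; Banerjee et al., 2019 [19]). These encouraging works motivate us to explore the feasibility of using this promising strategy for feature screening in big data.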

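In this paper, we propose a distributed feature screening framework based on aggregated correlation measures, which we call aggregated correlation screening (ACS). In ACS, a correlation measure is expressed as a function of several component parameters, each of which can be estimated distributively using natural U-statistics computed from the data segments. Combining the unbiased component estimates yields an aggregated correlation estimate that can be used directly for feature screening. In the proposed framework, a massive data set is divided into m manageable data segments, which may be stored on multiple computers, and the corresponding local estimates are computed in parallel. The framework therefore offers a computationally attractive route to feature screening for data with large p and large N. It also applies to settings in which the data are naturally stored in different locations (e.g., hospital-level medical data). U-statistic estimation of the component parameters, an effective and convenient order-reduction technique, guarantees the high accuracy of the aggregated correlation estimator and the reliability of the corresponding screening procedure. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the classical centralized estimator in terms of the probability convergence bound and the mean squared error (MSE) rate. This full efficiency is insensitive to the choice of m, which may be dictated by the problem itself or chosen by the user. For a wide range of correlation measures, we further show that ACS enjoys the sure screening property without specifying a parametric model (i.e., it is model-free).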

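The proposed ACS is rooted in component estimation. In the literature, this idea has been used to recover, in a distributed way, centralized estimators defined by smooth estimating equations that are separable across data segments (e.g., Chen et al., 2008 [20]; Lin and Xi, 2011 [21]). Unfortunately, these works do not apply directly to correlation-based screening, because the centralized estimators of many commonly used correlation measures are usually not defined by estimating equations and are not separable across data segments (e.g., those of SIRS and DC). The proposed ACS arises from a natural combination of centralized correlation estimators; rather than seeking to recover the centralized estimator exactly, it provides an effective and computationally affordable alternative. Our results provide theoretical support for using this natural strategy in distributed feature screening.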

2. Screening Methods Based on Kernel Estimation

2.1. Feature Screening for Big Data

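Let $D = \{(Y_i, X_i)\}_{i=1}^{N}$ be N independent and identically distributed samples of $\{Y, X\}$, where Y is a response variable with support $\phi_Y$ and $X = (X_1, \dots, X_p)^T$ is a p-dimensional covariate vector. We are interested in the case where both p and N are large. When a data set is massive and high-dimensional, it is usually reasonable to assume that only a few covariates (features) are related to the response. Let $F(x_j \mid Y)$ be the conditional distribution function of $X_j$ given Y. A feature $X_j$ is considered relevant if $F(x_j \mid Y)$ depends on Y. We use M to denote the index set of the relevant features and define $M^c = \{1, \dots, p\} \setminus M$. The goal of feature screening is to remove most of the irrelevant features $X_j$, $j \in M^c$, before detailed analysis.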

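Specifically, let $w_j \geq 0$ be a measure of the strength of association between Y and $X_j$, and let $\hat{w}_j$ be the centralized estimate of $w_j$ based on D. Given a prespecified threshold $r_n > 0$, we retain the features in

$\hat{M} = \{j : \hat{w}_j > r_n, j = 1, 2, \dots, p\}$

and remove the others. This classical approach is effective when the sample size N is moderate. When both p and N are large, however, computing $\{\hat{w}_j\}_{j=1}^{p}$ from the full data set D can be numerically expensive.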
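Once the scores are available, the thresholding rule that defines $\hat{M}$ is a one-line operation. A minimal Python sketch, assuming the estimates $\hat{w}_1, \dots, \hat{w}_p$ are already stored in a NumPy array; the function name `screen_by_threshold` and the numbers are hypothetical, and indices are 0-based whereas the text counts features from 1:

```python
import numpy as np

def screen_by_threshold(w_hat, r_n):
    """Return the retained index set {j : w_hat[j] > r_n} as 0-based indices."""
    w_hat = np.asarray(w_hat, dtype=float)
    return np.flatnonzero(w_hat > r_n)

# Hypothetical scores for p = 6 features and a hypothetical threshold r_n = 0.1.
w_hat = np.array([0.50, 0.02, 0.30, 0.00, 0.11, 0.09])
print(screen_by_threshold(w_hat, r_n=0.1))  # -> [0 2 4]
```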

2.2. Aggregated Screening via Inverse Regression

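Inspired by recent work on distributed learning, we adopt the divide-and-conquer idea [22] to address feature screening for big data. Without loss of generality, assume that the full data set D is partitioned evenly into m manageable segments $\{D_l\}_{l=1}^{m}$, each containing $n = N/m$ observations. Depending on the computing environment, these segments can be stored and processed on multiple computers in a distributed manner, or processed sequentially on a single computer.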
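The equal split into segments can be sketched as follows. This is a minimal in-memory illustration, assuming that N is divisible by m and that the rows are already in random order (consistent with the i.i.d. sampling assumption); in a real deployment each segment would live on a different machine. The helper name `split_into_segments` is hypothetical.

```python
import numpy as np

def split_into_segments(Y, X, m):
    """Partition (Y, X) into m consecutive segments D_1, ..., D_m of n = N / m rows each."""
    N = len(Y)
    if N % m != 0:
        raise ValueError("N must be divisible by m for an equal split")
    n = N // m
    # Each segment is a (Y_l, X_l) pair with shapes (n,) and (n, p).
    return [(Y[l * n:(l + 1) * n], X[l * n:(l + 1) * n]) for l in range(m)]

segments = split_into_segments(np.arange(12.0), np.zeros((12, 3)), m=4)
print(len(segments), [len(Y_l) for Y_l, _ in segments])
```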

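We perform feature screening through inverse regression (Li, 1991) [23], based on the kernel matrix $\mathrm{cov}(E(X \mid Y))$. Without loss of generality, assume $E(X) = 0$ and $E(Y) = 0$. Let $G_j(Y) = E(X_j \mid Y)$ and $R = \mathrm{cov}(E(X \mid Y)) = (r_{i,j})_{p \times p}$, where $r_{i,j} = \mathrm{cov}[E(X_i \mid Y), E(X_j \mid Y)]$.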

Let $w_j^l = r_{j,j}^l = \mathrm{var}(E(X_j \mid Y \in D_l))$ be the counterpart of $r_{j,j}$ on segment $D_l$, and let $w_j = \frac{1}{m}\sum_{l=1}^{m} w_j^l$. For $j = 1, 2, \dots, p$ we have $w_j^l \geq 0$, and if $X_j$ is independent of Y, then $w_j^l = 0$ for $l = 1, 2, \dots, m$. The active set can therefore be rewritten as $M = \{j : w_j > 0, j = 1, 2, \dots, p\}$.
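As a worked illustration under a hypothetical model that is not taken from the source, suppose $X_j = \beta_j Y + \varepsilon_j$ with $\varepsilon_j$ independent of Y and $E(\varepsilon_j) = 0$. Then $E(X_j \mid Y) = \beta_j Y$ and

$w_j = \mathrm{var}(E(X_j \mid Y)) = \mathrm{var}(\beta_j Y) = \beta_j^2\, \mathrm{var}(Y),$

so, when $\mathrm{var}(Y) > 0$, $w_j > 0$ exactly when $\beta_j \neq 0$, and in this model the active set is $M = \{j : \beta_j \neq 0\}$.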

We now estimate $w_j$ by the kernel method. Let $g(y)$ denote the density function of Y, let $X_i = (X_{i1}, X_{i2}, \dots, X_{ip})^T$ be a copy of $X$, and define

$G(Y) \triangleq (G_1(Y), \dots, G_p(Y))^T = (E(X_1 \mid Y), \dots, E(X_p \mid Y))^T = E(X \mid Y),$

$h(Y) = (h_1(Y), \dots, h_p(Y))^T = (G_1(Y) g(Y), \dots, G_p(Y) g(Y))^T.$

We estimate $w_j$ by kernel smoothing [1] as follows:

$\hat{h}_j^l(Y) = \frac{m}{Nh} \sum_{(X_{ij}, Y_i) \in D_l} X_{ij} K((Y - Y_i)/h),$

$\hat{g}^l(Y) = \frac{m}{Nh} \sum_{Y_i \in D_l} K((Y - Y_i)/h),$

$\hat{G}_j^l(Y) = \hat{h}_j^l(Y) / \hat{g}^l(Y), \quad \hat{R}^l = \frac{m}{N} \sum_{Y_k \in D_l} \hat{G}^l(Y_k) \hat{G}^l(Y_k)^T,$

$\hat{w}_j^l \triangleq \hat{R}_{j,j}^l = \frac{m}{N} \sum_{Y_k \in D_l} (\hat{G}_j^l(Y_k))^2,$

and

$\hat{w}_j = \frac{1}{m} \sum_{l=1}^{m} \hat{w}_j^l,$ (1)

where h denotes the bandwidth and $K(\cdot)$ is the kernel function; the bandwidth is selected by the plug-in method (Ruppert et al., 1995 [24]).
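The estimators above translate directly into array code. The sketch below is a minimal NumPy version, assuming a Gaussian kernel and a fixed, user-supplied bandwidth h; the source selects h with the plug-in method of Ruppert et al., which is not reproduced here, and the function names are hypothetical. Because $m/N = 1/n$, the factor $m/(Nh)$ equals $1/(nh)$ and the weighted sum over $D_l$ is a plain mean. Each segment builds an $n \times n$ kernel matrix, so memory grows quadratically in the segment size n.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def segment_scores(Y_l, X_l, h):
    """Component estimates hat{w}_j^l for all j on one segment D_l.

    Y_l has shape (n,), X_l has shape (n, p); returns an array of shape (p,).
    """
    n = len(Y_l)
    # Kmat[a, i] = K((Y_a - Y_i) / h) for evaluation points Y_a and observations Y_i in D_l.
    Kmat = gaussian_kernel((Y_l[:, None] - Y_l[None, :]) / h)
    g_hat = Kmat.sum(axis=1) / (n * h)    # hat{g}^l(Y_a); positive because K(0) > 0
    h_hat = Kmat @ X_l / (n * h)          # hat{h}_j^l(Y_a), shape (n, p)
    G_hat = h_hat / g_hat[:, None]        # hat{G}_j^l(Y_a) = hat{h}_j^l / hat{g}^l
    return np.mean(G_hat ** 2, axis=0)    # hat{w}_j^l = (m/N) * sum over D_l of hat{G}^2

def aggregated_scores(segments, h):
    """Equation (1): average the component estimates over the m segments."""
    return np.mean([segment_scores(Y_l, X_l, h) for Y_l, X_l in segments], axis=0)

# Hypothetical simulated data: only feature 0 depends on Y.
rng = np.random.default_rng(0)
N, p, m = 400, 5, 4
Y = rng.standard_normal(N)
X = rng.standard_normal((N, p))
X[:, 0] += 2.0 * Y
n = N // m
segments = [(Y[l * n:(l + 1) * n], X[l * n:(l + 1) * n]) for l in range(m)]
w_hat = aggregated_scores(segments, h=0.3)
print(np.round(w_hat, 3))
```

Each call to `segment_scores` touches only one segment and returns p numbers, so the segments can be processed on separate machines and only these p-vectors need to be communicated before averaging.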

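The proposed distributed screening combines the inverse regression method with kernel estimators. We first compute the component estimates $\hat{w}_j^l$ on each segment and then combine them into an aggregated estimate that can be used directly for feature screening. We call this method AIK (aggregated inverse regression with kernel estimation).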

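2.3. Theoretical Analysis

We now provide theoretical justification for the AIK method. Clearly, the screening performance of AIK depends on the accuracy of the aggregated estimate $\hat{w}_j$. We show that $\hat{w}_j$ is an effective estimator of $w_j$; this is the theoretical foundation of AIK. Our theoretical study is based on the following technical conditions: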

C1. The density function $g(y)$ has continuous second derivatives on $D_l$, $l = 1, 2, \dots, m$, and $g'(y)$ and $g''(y)$ are bounded, i.e., $|g'(y)| \leq M_1$ and $|g''(y)| \leq M_2$ for some positive constants $M_1, M_2$ [25] [26].
C2. The kernel function $K(\cdot)$ is uniformly bounded and $\int t^2 K(t)\,dt < \infty$.
C3. For positive constants $\kappa$, $c_3$ and $\gamma > \kappa$, $\max_{Y \in \phi_Y} [E(X_j \mid Y)]^2 < c_3 N^{-\kappa}$.
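Condition C2 is easy to check numerically for a candidate kernel. The sketch below approximates $\int K$, $\int tK$, $\int t^2 K$ and $\sup|K|$ by a Riemann sum on a fine grid; the Gaussian and Epanechnikov kernels are illustrative choices, since the source does not fix $K(\cdot)$. Both are bounded, integrate to one, have finite second moments (1 and 0.2, respectively) and, being symmetric, have $\int tK(t)\,dt = 0$.

```python
import numpy as np

def kernel_moments(K, lo=-10.0, hi=10.0, num=200001):
    """Riemann-sum approximations of int K, int t K, int t^2 K and sup |K| on [lo, hi]."""
    t = np.linspace(lo, hi, num)
    dt = t[1] - t[0]
    k = K(t)
    return {
        "mass": float(np.sum(k) * dt),
        "first": float(np.sum(t * k) * dt),
        "second": float(np.sum(t ** 2 * k) * dt),
        "sup": float(np.max(np.abs(k))),
    }

def gaussian(t):
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)

def epanechnikov(t):
    # 0.75 * (1 - t^2) on [-1, 1], zero outside.
    return 0.75 * np.clip(1.0 - t ** 2, 0.0, None)

for name, K in [("gaussian", gaussian), ("epanechnikov", epanechnikov)]:
    print(name, kernel_moments(K))
```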

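Based on the above conditions, we derive a unified screening framework for the proposed AIK method.

Theorem 2.1. Under conditions C1 to C3, suppose the bandwidth $h = O(N^{-\gamma})$, where $\gamma$ is defined in condition C3. Then, for some positive constants $C_2$ and $c_5$,

$P(\max_{1 \leq j \leq p} |\hat{w}_j - w_j| \geq c_3 N^{-\kappa}) \leq C_2 \exp(-c_5 N^{\gamma - \kappa}),$

and if $r_n = c_3 N^{-\kappa}$, then

$P(M \subseteq \hat{M}) \geq 1 - C_2 s_n N \exp(-c_5 N^{\gamma - \kappa}),$

where $s_n = |M|$ is the number of relevant features.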

3. Proof of the Theorem

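Lemma 1. For any random variable X, the following two statements are equivalent:

A. There exists $H > 0$ such that $E e^{tX} < \infty$ for all $|t| < H$.
B. There exist $r > 0$ and $H > 0$ such that $E e^{s(X - EX)} \leq e^{rs^2}$ for all $|s| < H$.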

Lemma 2. Suppose X is a random variable with $E(e^{a|X|}) < \infty$ for some $a > 0$. Then there exist positive constants b and c such that, for any $M > 0$,

$P(|X| \geq M) \leq b e^{-cM}.$
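Lemma 2 follows from Markov's inequality applied to the nonnegative variable $e^{a|X|}$, which gives explicit constants $b = E(e^{a|X|})$ and $c = a$:

$P(|X| \geq M) = P(e^{a|X|} \geq e^{aM}) \leq \frac{E(e^{a|X|})}{e^{aM}} = b\, e^{-cM}, \quad b = E(e^{a|X|}),\ c = a.$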

Lemma 3 (Hoeffding's inequality [27] [28]). Let $X_i$, $i = 1, \dots, n$, be independent random variables satisfying $P(X_i \in [a_i, b_i]) = 1$ for some $a_i, b_i$, $i = 1, \dots, n$. Then for any $\varepsilon > 0$,

$P(|\bar{X} - E(\bar{X})| \geq \varepsilon) \leq 2 \exp\left(-\frac{2 \varepsilon^2 n^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right),$ (2)

where $\bar{X} = (X_1 + \dots + X_n)/n$.
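A quick Monte Carlo check of (2), assuming i.i.d. Uniform(0, 1) variables so that $a_i = 0$ and $b_i = 1$ (an illustrative choice; the function names are hypothetical). With $n = 50$ and $\varepsilon = 0.1$ the right-hand side equals $2e^{-1} \approx 0.736$, while the simulated tail probability is much smaller, which shows that the bound holds but can be conservative.

```python
import numpy as np

def hoeffding_bound(eps, a, b):
    """Right-hand side of (2): 2 exp(-2 eps^2 n^2 / sum_i (b_i - a_i)^2)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    return 2.0 * np.exp(-2.0 * eps ** 2 * n ** 2 / np.sum((b - a) ** 2))

def empirical_tail(eps, n, reps=20000, seed=0):
    """Monte Carlo estimate of P(|Xbar - E Xbar| >= eps) for i.i.d. Uniform(0, 1) variables."""
    rng = np.random.default_rng(seed)
    xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    return float(np.mean(np.abs(xbar - 0.5) >= eps))

n, eps = 50, 0.1
bound = hoeffding_bound(eps, np.zeros(n), np.ones(n))  # equals 2 * exp(-1), about 0.7358
print("empirical:", empirical_tail(eps, n), "bound:", bound)
```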

Lemma 4. Let $a(u)$ and $b(u)$ be two uniformly bounded functions of u, i.e., there exist $M_5 > 0$ and $M_6 > 0$ such that $\sup_{u \in U} |a(u)| \leq M_5$ and $\sup_{u \in U} |b(u)| \leq M_6$. For a given $u \in U$, let $\hat{A}(u)$ and $\hat{B}(u)$ be estimates of $a(u)$ and $b(u)$ based on a sample of size n. Suppose that for any small $\varepsilon \in (0, 1)$ there exist positive constants $c_1, c_2$ and s such that [29]

$\sup_{u \in U} P(|\hat{A}(u) - a(u)| \geq \varepsilon) \leq c_1 \left(1 - \frac{\varepsilon s}{c_1}\right)^n,$ (3)

$\sup_{u \in U} P(|\hat{B}(u) - b(u)| \geq \varepsilon) \leq c_2 \left(1 - \frac{\varepsilon s}{c_2}\right)^n.$ (4)

Then

$\sup_{u \in U} P(|(\hat{A}(u))^2 - (a(u))^2| \geq \varepsilon) \leq C_7 \exp\left(-\frac{\varepsilon}{C_7 h}\right),$ (5)

$\sup_{u \in U} P(|\hat{A}(u)/\hat{B}(u) - a(u)/b(u)| \geq \varepsilon) \leq C_9 \exp\left(-\frac{\varepsilon}{C_9 h}\right).$ (6)

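Proof of Theorem 2.1. The proof proceeds in the following steps.

Step 1: We show that for any $\varepsilon \in (0, 1)$ and $1 \leq j \leq p$ there exists a positive constant C such that

$P(|\hat{h}_j^l(Y) - h g^l(Y) m_j^l(Y)| \geq \varepsilon) \leq C \exp\left(-\frac{\varepsilon}{h}\right).$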

Note that

$\begin{aligned} P(|\hat{h}_j^l(Y) - h g^l(Y) m_j^l(Y)| \geq \varepsilon) &= P(|\hat{h}_j^l(Y) - E[\hat{h}_j^l(Y)] + E[\hat{h}_j^l(Y)] - h g^l(Y) m_j^l(Y)| \geq \varepsilon) \\ &\leq P(|\hat{h}_j^l(Y) - E[\hat{h}_j^l(Y)]| + |E[\hat{h}_j^l(Y)] - h g^l(Y) m_j^l(Y)| \geq \varepsilon), \end{aligned}$ (7)

$\begin{aligned} E[\hat{h}_j^l(Y)] - h g^l(Y) m_j^l(Y) &= E\left(X_{ij} K\left(\frac{Y - Y_i}{h}\right)\right) - h g^l(Y) m_j^l(Y) \\ &= E\left(m_j^l(Y_1) K\left(\frac{Y - Y_1}{h}\right)\right) - h g^l(Y) m_j^l(Y) \triangleq \Delta(Y, h). \end{aligned}$

Here $m_j^l(Y_i) = E(X_{ij} \mid Y_i)$; the second equality follows by conditioning on $Y_i$ and using that the observations in $D_l$ are identically distributed, so that $Y_i$ may be replaced by $Y_1$.

Moreover,

$h^{-1} \Delta(Y, h) = \int \{m_j^l(Y - th) g^l(Y - th) - m_j^l(Y) g^l(Y)\} K(t)\,dt.$

Because $K(t)$ is symmetric, $\int t K(t)\,dt = 0$, so the first-order term of the expansion below integrates to zero. In addition, by a second-order Taylor expansion,

$\lim_{h \to 0} h^{-2} \{m_j^l(Y - th) g^l(Y - th) - m_j^l(Y) g^l(Y) + [m_j^{l\prime}(Y) g^l(Y) + m_j^l(Y) g^{l\prime}(Y)] th\} = \frac{1}{2}\{m_j^{l\prime\prime}(Y) g^l(Y) + 2 m_j^{l\prime}(Y) g^{l\prime}(Y) + m_j^l(Y) g^{l\prime\prime}(Y)\} t^2.$ [26] (8)

Since $m_j^{l\prime\prime}(Y) g^l(Y) + 2 m_j^{l\prime}(Y) g^{l\prime}(Y) + m_j^l(Y) g^{l\prime\prime}(Y)$ is uniformly bounded under conditions C1 and C2, the dominated convergence theorem shows that $h^{-3}\Delta(Y, h)$ is uniformly bounded for $Y \in \phi_Y$. That is, for some constant $C_0$,

$|E[\hat{h}_j^l(Y)] - h g^l(Y) m_j^l(Y)| \leq C_0 h^3 \quad \text{as } h \to 0.$

In particular, once h is small enough that $C_0 h^3 \leq \varepsilon/2$, we have $|E[\hat{h}_j^l(Y)] - h g^l(Y) m_j^l(Y)| \leq \varepsilon/2$. Therefore (7) becomes

$\begin{aligned} P(|\hat{h}_j^l(Y) - h g^l(Y) m_j^l(Y)| \geq \varepsilon) &\leq P(|\hat{h}_j^l(Y) - E[\hat{h}_j^l(Y)]| \geq \varepsilon/2) \\ &\leq P(|\hat{h}_j^l(Y) - E[\hat{h}_j^l(Y)]| \geq \varepsilon/2, \max_i |X_{ij}| \leq T) + P(\max_i |X_{ij}| > T) \\ &= S_1 + S_2, \end{aligned}$ (9)

where $S_1 = P(|\hat{h}_j^l(Y) - E[\hat{h}_j^l(Y)]| \geq \varepsilon/2, \max_i |X_{ij}| \leq T)$ and $S_2 = P(\max_i |X_{ij}| > T)$, with the maxima taken over the observations in $D_l$.

By Lemma 3, applied on the event $\{\max_i |X_{ij}| \leq T\}$ on which the summands of $\hat{h}_j^l(Y)$ are bounded, $S_1 \leq 2\exp\left(-\frac{2\varepsilon^2 N}{m T^2}\right)$; by Lemma 2 and a union bound over the $N/m$ observations in $D_l$, $S_2 \leq C_1 (N/m) \exp(-c_1 T)$. Then

$\begin{aligned} P(|\hat{h}_j^l(Y) - h g^l(Y) m_j^l(Y)| \geq \varepsilon) &\leq 2\exp\left(-\frac{2\varepsilon^2 N}{m T^2}\right) + C_1 \frac{N}{m} \exp(-c_1 T) \\ &= 2\exp\left(-\frac{\varepsilon}{h} \cdot \frac{2\varepsilon h N}{m T^2}\right) + C_1 \exp\left(-\frac{\varepsilon}{h} \cdot \frac{h(c_1 T - \log(N/m))}{\varepsilon}\right). \end{aligned}$ (10)

Moreover, since $h = O((N/m)^{-\gamma})$ and $T = O((N/m)^{\tau})$ with $\gamma < \tau < (1 - \gamma)/2$, for large N,

$\frac{2\varepsilon h N}{m T^2} = C \cdot \left(\frac{N}{m}\right)^{1 - \gamma - 2\tau} > 1, \quad \frac{h(c_1 T - \log(N/m))}{\varepsilon} \geq C \cdot \left(\frac{N}{m}\right)^{\tau - \gamma} > 1.$

Therefore, (10) becomes

$P(|\hat{h}_j^l(Y) - h g^l(Y) m_j^l(Y)| \geq \varepsilon) \leq C \exp\left(-\frac{\varepsilon}{h}\right).$

Similarly, we can show that for a positive constant $C_2$,

$P(|(\hat{G}_j^l(Y_i))^2 - (m_j^l(Y_i))^2| \geq \varepsilon) \leq C_2 \exp\left(-\frac{\varepsilon}{h}\right).$
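The bias bound in Step 1 rests on the second-order expansion (8): for a smooth function f (playing the role of $m_j^l g^l$) and a symmetric kernel, $\int \{f(Y - th) - f(Y)\} K(t)\,dt = \frac{h^2}{2} f''(Y) \int t^2 K(t)\,dt + o(h^2)$, which is why $h^{-1}\Delta(Y, h) = O(h^2)$. The sketch below checks this numerically, assuming a Gaussian kernel and the stand-in $f = \sin$ (both illustrative choices). In that case the integral has the closed form $\sin(Y)(e^{-h^2/2} - 1)$, so the ratio to $h^2$ should approach $-\frac{1}{2}\sin(Y)$ as h shrinks.

```python
import numpy as np

def smoothing_bias(f, y, h, lo=-12.0, hi=12.0, num=200001):
    """Riemann-sum approximation of int {f(y - t h) - f(y)} K(t) dt with a Gaussian K."""
    t = np.linspace(lo, hi, num)
    dt = t[1] - t[0]
    K = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum((f(y - t * h) - f(y)) * K) * dt)

y = 1.0
for h in [0.4, 0.2, 0.1, 0.05]:
    numeric = smoothing_bias(np.sin, y, h)
    exact = np.sin(y) * (np.exp(-h ** 2 / 2) - 1.0)  # closed form for f = sin, Gaussian K
    # numeric / h^2 should approach -0.5 * sin(1) as h shrinks
    print(f"h={h:<5} bias={numeric:.6e} exact={exact:.6e} bias/h^2={numeric / h ** 2:.4f}")
```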

Step 2: For any $\varepsilon \in (0, 1)$, we derive an upper bound for $P(\max_{1 \leq j \leq p} |\hat{w}_j - w_j| > \varepsilon)$.

Note that

$\begin{aligned} P(|\hat{w}_j^l - w_j^l| \geq \varepsilon) &= P\left(\left|\frac{m}{N} \sum_{Y_i \in D_l} (\hat{G}_j^l(Y_i))^2 - \frac{m}{N} \sum_{Y_i \in D_l} (m_j^l(Y_i))^2\right| \geq \varepsilon\right) \\ &= P\left(\left|\sum_{Y_i \in D_l} [(\hat{G}_j^l(Y_i))^2 - (m_j^l(Y_i))^2]\right| \geq \frac{N}{m}\varepsilon\right) \\ &\leq \sum_{Y_i \in D_l} P(|(\hat{G}_j^l(Y_i))^2 - (m_j^l(Y_i))^2| \geq \varepsilon) \\ &\leq \frac{N}{m} \max_{Y_i \in D_l} P(|(\hat{G}_j^l(Y_i))^2 - (m_j^l(Y_i))^2| \geq \varepsilon) \\ &\leq \frac{N}{m} C_2 \exp\left(-\frac{\varepsilon}{h}\right). \end{aligned}$ (11)

The first inequality holds because a sum of $N/m$ terms can reach $(N/m)\varepsilon$ in absolute value only if at least one term reaches $\varepsilon$. Since $\hat{w}_j - w_j = \frac{1}{m}\sum_{l=1}^{m}(\hat{w}_j^l - w_j^l)$, the event $|\hat{w}_j - w_j| \geq \varepsilon$ implies $|\hat{w}_j^l - w_j^l| \geq \varepsilon$ for some l, and a union bound over the m segments gives $P(|\hat{w}_j - w_j| \geq \varepsilon) \leq C_2 N \exp\left(-\frac{\varepsilon}{h}\right)$. Moreover,

$\begin{aligned} P(\max_{1 \leq j \leq p} |\hat{w}_j - w_j| \geq c_3 N^{-\kappa}) &\leq \sum_{j=1}^{p} P(|\hat{w}_j - w_j| \geq c_3 N^{-\kappa}) \\ &\leq C_2\, p N \exp(-c_4 N^{\gamma - \kappa}) \\ &\leq C_2 \exp(-c_4 N^{\gamma - \kappa} + 2\log p), \end{aligned}$

where the second inequality applies the previous bound with $\varepsilon = c_3 N^{-\kappa}$ and $h = O(N^{-\gamma})$, so that $\varepsilon/h \geq c_4 N^{\gamma - \kappa}$ for a positive constant $c_4$, and the last inequality uses $N \leq p$. Take $\log p = O(N^{\gamma - \kappa})$; specifically, let $\log p = \xi N^{\gamma - \kappa}$ for a positive constant $\xi$ with $c_4 - 2\xi > 0$. The bound then becomes

$P(\max_{1 \leq j \leq p} |\hat{w}_j - w_j| \geq c_3 N^{-\kappa}) \leq C_2 \exp(-(c_4 - 2\xi) N^{\gamma - \kappa}) = C_2 \exp(-c_5 N^{\gamma - \kappa}), \quad c_5 = c_4 - 2\xi.$

Step 3: Take $r_n = c_3 N^{-\kappa}$. We show that $P(M \subseteq \hat{M}) \geq 1 - C_2 s_n N \exp(-c_5 N^{\gamma - \kappa})$, where $s_n = |M|$. The first inequality below uses the signal strength $\min_{j \in M} w_j \geq 2 c_3 N^{-\kappa}$, and the last uses the single-feature bound of Step 2 together with $c_5 \leq c_4$:

$\begin{aligned} P(M \subseteq \hat{M}) &= P\left(\min_{j \in M} \hat{w}_j > c_3 N^{-\kappa}\right) = P\left(\min_{j \in M} w_j - \min_{j \in M} \hat{w}_j < \min_{j \in M} w_j - c_3 N^{-\kappa}\right) \\ &\geq P\left(\min_{j \in M} w_j - \min_{j \in M} \hat{w}_j < 2 c_3 N^{-\kappa} - c_3 N^{-\kappa}\right) \geq P\left(\max_{j \in M} |w_j - \hat{w}_j| < c_3 N^{-\kappa}\right) \\ &= 1 - P\left(\max_{j \in M} |w_j - \hat{w}_j| \geq c_3 N^{-\kappa}\right) \geq 1 - s_n \max_{j \in M} P(|w_j - \hat{w}_j| \geq c_3 N^{-\kappa}) \\ &\geq 1 - C_2 s_n N \exp(-c_5 N^{\gamma - \kappa}). \end{aligned}$

This completes the proof of Theorem 2.1.

NOTES

^*Corresponding author.

References

[1]	Fan, J. and Li, R. (2001) Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association, 96, 1348-1360. https://doi.org/10.1198/016214501753382273
[2]	Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58, 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
[3]	Fan, J. and Lv, J. (2008) Sure Independence Screening for Ultrahigh Dimensional Feature Space. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70, 849-911. https://doi.org/10.1111/j.1467-9868.2008.00674.x
[4]	Fan, J. and Song, R. (2010) Sure Independence Screening in Generalized Linear Models with Np-Dimensionality. The Annals of Statistics, 38, 3567-3604. https://doi.org/10.1214/10-aos798
[5]	Fan, J., Feng, Y. and Song, R. (2011) Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models. Journal of the American Statistical Association, 106, 544-557. https://doi.org/10.1198/jasa.2011.tm09779
[6]	Zhu, L., Li, L., Li, R. and Zhu, L. (2011) Model-Free Feature Screening for Ultrahigh-Dimensional Data. Journal of the American Statistical Association, 106, 1464-1475. https://doi.org/10.1198/jasa.2011.tm10563
[7]	Li, R., Zhong, W. and Zhu, L. (2012) Feature Screening via Distance Correlation Learning. Journal of the American Statistical Association, 107, 1129-1139. https://doi.org/10.1080/01621459.2012.695654
[8]	Li, G., Peng, H., Zhang, J. and Zhu, L. (2012) Robust Rank Correlation Based Screening. The Annals of Statistics, 40, 1846-1877. https://doi.org/10.1214/12-aos1024
[9]	Wu, Y. and Yin, G. (2015) Conditional Quantile Screening in Ultrahigh-Dimensional Heterogeneous Data. Biometrika, 102, 65-76. https://doi.org/10.1093/biomet/asu068
[10]	Zhou, Y., Liu, J., Hao, Z. and Zhu, L. (2019) Model-Free Conditional Feature Screening with Exposure Variables. Statistics and Its Interface, 12, 239-251. https://doi.org/10.4310/sii.2019.v12.n2.a5
[11]	Wang, H. and Xia, Y. (2009) Shrinkage Estimation of the Varying Coefficient Model. Journal of the American Statistical Association, 104, 747-757. https://doi.org/10.1198/jasa.2009.0138
[12]	Fan, J. and Ren, Y. (2006) Statistical Analysis of DNA Microarray Data in Cancer Research. Clinical Cancer Research, 12, 4469-4473. https://doi.org/10.1158/1078-0432.ccr-06-1033
[13]	Fan, J., Ma, Y. and Dai, W. (2014) Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Varying Coefficient Models. Journal of the American Statistical Association, 109, 1270-1284. https://doi.org/10.1080/01621459.2013.879828
[14]	Hall, P. and Miller, H. (2009) Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems. Journal of Computational and Graphical Statistics, 18, 533-550. https://doi.org/10.1198/jcgs.2009.08041
[15]	Luo, S. and Chen, Z. (2014) Sequential Lasso Cum EBIC for Feature Selection with Ultra-High Dimensional Feature Space. Journal of the American Statistical Association, 109, 1229-1240. https://doi.org/10.1080/01621459.2013.877275
[16]	Chen, X. and Xie, M. (2014) A Split-and-Conquer Approach for Analysis of. Statistica Sinica, 24, 1655-1684. https://doi.org/10.5705/ss.2013.088
[17]	Xu, C., Zhang, Y., Li, R. and Wu, X. (2016) On the Feasibility of Distributed Kernel Regression for Big Data. IEEE Transactions on Knowledge and Data Engineering, 28, 3041-3052. https://doi.org/10.1109/tkde.2016.2594060
[18]	Jordan, M.I., Lee, J.D. and Yang, Y. (2018) Communication-Efficient Distributed Statistical Inference. Journal of the American Statistical Association, 114, 668-681. https://doi.org/10.1080/01621459.2018.1429274
[19]	Gonçalves, A.R., Liu, X. and Banerjee, A. (2019) Two-Block vs. Multi-Block ADMM: An Empirical Evaluation of Convergence. arXiv: 1907.04524.
[20]	Chen, J. and Chen, Z. (2008) Extended Bayesian Information Criteria for Model Selection with Large Model Spaces. Biometrika, 95, 759-771. https://doi.org/10.1093/biomet/asn034
[21]	Lin, N. and Xi, R. (2011) Aggregated Estimating Equation Estimation. Statistics and Its Interface, 4, 73-83. https://doi.org/10.4310/sii.2011.v4.n1.a8
[22]	Wang, H. (2012) Factor Profiled Sure Independence Screening. Biometrika, 99, 15-28. https://doi.org/10.1093/biomet/asr074
[23]	Li, K. (1991) Sliced Inverse Regression for Dimension Reduction. Journal of the American Statistical Association, 86, 316-327. https://doi.org/10.2307/2290563
[24]	Ruppert, D., Sheather, S.J. and Wand, M.P. (1995) An Effective Bandwidth Selector for Local Least Squares Regression. Journal of the American Statistical Association, 90, 1257-1270. https://doi.org/10.1080/01621459.1995.10476630
[25]	Zhang, J., Zhang, R. and Lu, Z. (2015) Quantile-Adaptive Variable Screening in Ultra-High Dimensional Varying Coefficient Models. Journal of Applied Statistics, 43, 643-654. https://doi.org/10.1080/02664763.2015.1072141
[26]	Zhang, J., Zhang, R. and Zhang, J. (2017) Feature Screening for Nonparametric and Semiparametric Models with Ultrahigh-Dimensional Covariates. Journal of Systems Science and Complexity, 31, 1350-1361. https://doi.org/10.1007/s11424-017-6310-6
[27]	Hoeffding, W. (1948) A Class of Statistics with Asymptotically Normal Distribution. The Annals of Mathematical Statistics, 19, 293-325. https://doi.org/10.1214/aoms/1177730196
[28]	Wu, X. and Zhang, J. (2017) Researches on Rademacher Complexities in Statistical Learning Theory: A Survey. Acta Automatica Sinica, 43, 20-39.
[29]	Schechtman, E. and Yitzhaki, S. (1999) On the Proper Bounds of the Gini Correlation. Economics Letters, 63, 133-138. https://doi.org/10.1016/s0165-1765(99)00033-6
