Analysis of the Properties of Generalized Estimating Equations under SCAD-L2 Regularization
Abstract: Generalized estimating equations (GEE) are widely used in longitudinal data analysis because they account effectively for within-subject correlation. However, when the covariates in longitudinal data are highly correlated, traditional variable selection methods often select variables unstably. In this paper, we incorporate the SCAD-L2 regularization term into the GEE framework so that variable selection and parameter estimation are carried out simultaneously. We then propose an initial value selection strategy for longitudinal data with multicollinearity: the GEE estimate under an L2 penalty is used as the starting value for computation. Finally, we study the large-sample asymptotic properties of the SCAD-L2 penalized GEE estimator. Simulation studies show that the proposed method substantially outperforms existing approaches in both parameter estimation and variable selection for longitudinal data, providing an effective tool for modeling longitudinal data with complex correlation structures.
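To make the penalty named in the abstract concrete, the following is a minimal sketch of a SCAD-L2 penalty, assuming the standard SCAD form of Fan and Li (2001) with the conventional default a = 3.7, plus a ridge term. The function names, the parameter names `lam1` and `lam2`, and the unscaled form of the L2 term are assumptions made for illustration; the paper's exact parameterization may differ.

```python
import numpy as np


def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty p_lam(|beta|); assumes lam > 0 and a > 2."""
    t = np.abs(np.asarray(beta, dtype=float))
    linear = lam * t                                            # |beta| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))    # lam < |beta| <= a*lam
    const = lam**2 * (a + 1) / 2                                # |beta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))


def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |beta| (for |beta| > 0)."""
    t = np.abs(np.asarray(beta, dtype=float))
    tail = np.maximum(a * lam - t, 0.0) / ((a - 1) * lam)
    return lam * np.where(t <= lam, 1.0, tail)


def scad_l2_penalty(beta, lam1, lam2, a=3.7):
    """Assumed SCAD-L2 form: sum_j p_lam1(|beta_j|) + lam2 * sum_j beta_j**2."""
    beta = np.asarray(beta, dtype=float)
    return scad_penalty(beta, lam1, a).sum() + lam2 * np.sum(beta**2)
```

The SCAD part stops growing once |beta| exceeds a*lam, so large coefficients are not shrunk further, while the L2 part keeps a ridge-type shrinkage that tends to stabilize estimates when covariates are highly correlated.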
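Cite this article: Liu, Y.J. and Zhao, H.X. (2026) Analysis of the Properties of Generalized Estimating Equations under SCAD-L2 Regularization. Statistics and Application, 15(2), 58-72. https://doi.org/10.12677/sa.2026.152034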
