A Semi-Supervised Estimation Method for High-Dimensional Single-Index Models Based on the Least Squares Solution
Abstract: Under the assumption that the covariates follow an elliptical distribution, we establish that the least squares solution is directionally consistent with the single-index coefficient vector. Building on this result, we construct a semi-supervised estimator of the index coefficients directly from the least squares solution, which avoids the challenge of estimating the link function. Specifically, the distributional information about the covariates contained in the unlabeled data is used to assist precision matrix estimation. For heavy-tailed distributions, we further introduce a truncation technique that broadens the applicability of the method. We derive theoretical convergence rates under different tail conditions. The theory shows that, under certain conditions, unlabeled data can improve the estimation of high-dimensional single-index coefficients, and that the estimator attains the minimax optimal convergence rate asymptotically.
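The directional consistency described in the abstract can be illustrated with a small low-dimensional simulation. The sketch below is not the paper's estimator: it assumes Gaussian covariates (one special case of the elliptical family), uses a plain sample covariance rather than a sparse precision matrix estimator, and omits the truncation step for heavy tails. The function name `semi_supervised_direction`, the link function, and all sample sizes are hypothetical choices made for illustration only.

```python
import numpy as np


def semi_supervised_direction(X_lab, y_lab, X_unlab):
    """Unit-norm estimate of the index direction via Sigma^{-1} Cov(X, y).

    Sigma is estimated from labeled and unlabeled covariates pooled together;
    Cov(X, y) can only use the labeled pairs.
    """
    X_all = np.vstack([X_lab, X_unlab])
    mu = X_all.mean(axis=0)
    Xc_all = X_all - mu
    sigma_hat = Xc_all.T @ Xc_all / X_all.shape[0]
    cov_xy = (X_lab - mu).T @ (y_lab - y_lab.mean()) / X_lab.shape[0]
    b = np.linalg.solve(sigma_hat, cov_xy)  # least squares solution
    return b / np.linalg.norm(b)


rng = np.random.default_rng(0)
p, n_lab, n_unlab = 5, 2000, 20000  # illustrative sizes (assumption)

# True index direction, normalised to unit length.
beta = np.array([3.0, -2.0, 1.0, 0.0, 0.0])
beta /= np.linalg.norm(beta)

# Gaussian covariates with an AR(1)-type covariance matrix.
idx = np.arange(p)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
chol = np.linalg.cholesky(Sigma)
X_lab = rng.standard_normal((n_lab, p)) @ chol.T
X_unlab = rng.standard_normal((n_unlab, p)) @ chol.T

# Nonlinear monotone link; the estimator never sees or estimates it.
index = X_lab @ beta
y_lab = np.tanh(index) + 0.5 * index + 0.1 * rng.standard_normal(n_lab)

beta_hat = semi_supervised_direction(X_lab, y_lab, X_unlab)
cosine = float(beta_hat @ beta)  # close to 1 when the directions agree
print(f"cosine similarity with the true direction: {cosine:.3f}")
```

Only the direction of the coefficient vector is compared here, since the scale is absorbed by the unknown link function. In the high-dimensional setting the abstract targets, the sample covariance would not be invertible, which is where a regularized precision matrix estimate would come in.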
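Citation: Shi Zhiheng (史志恒) and Cui Wenquan (崔文泉). A Semi-Supervised Estimation Method for High-Dimensional Single-Index Models Based on the Least Squares Solution [J]. Pure Mathematics (理论数学), 2025, 15(5): 311-324. https://doi.org/10.12677/PM.2025.155180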
