Analysis of Urban Registered Unemployment Rate in Southwest China Based on GEE and PGEE Methods
Abstract: Southwest China has a large population and a large economy, but the structural unemployment triggered by economic restructuring has become increasingly prominent. Unemployment adds to residents' financial burdens, and it also undermines social stability and the efficient allocation of labor resources. As a key indicator of regional economic health, the unemployment rate bears directly on social stability, quality of life, and economic policy planning. To predict the regional unemployment rate, this study uses panel data for Southwest China from 1997 to 2023 to build a Gamma regression marginal model with a logarithmic link function. Six methods were applied in R: generalized estimating equations (GEE) and penalized generalized estimating equations (PGEE), each under AR(1), independence, and exchangeable working correlation structures. With the penalty parameter λ set to 0.13, the PGEE model with the AR(1) working correlation structure gave the best predictive performance: its test-set MSE, MAE, and MAPE were 0.207, 0.32, and 8.85%, respectively, all lower than those of the other five methods. The study provides a scientific basis for predicting the unemployment rate during a period of economic transition.
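The three working correlation structures and the three error metrics named in the abstract can be illustrated with a minimal sketch. This is not the author's code (the study used R). The function names, the correlation parameter `alpha`, and the sample numbers are hypothetical, and the metric formulas are the usual textbook definitions, which the abstract does not spell out.

```python
def working_correlation(structure, size, alpha=0.0):
    """Build a size x size working correlation matrix (nested lists).

    structure: "independence", "exchangeable", or "ar1".
    alpha: assumed within-cluster correlation parameter (hypothetical name).
    """
    if structure == "independence":
        # No correlation between repeated measurements of the same region.
        return [[1.0 if j == k else 0.0 for k in range(size)] for j in range(size)]
    if structure == "exchangeable":
        # Every pair of measurements shares the same correlation alpha.
        return [[1.0 if j == k else alpha for k in range(size)] for j in range(size)]
    if structure == "ar1":
        # Correlation decays geometrically with the time lag |j - k|.
        return [[alpha ** abs(j - k) for k in range(size)] for j in range(size)]
    raise ValueError("unknown structure: " + structure)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero in y_true)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up unemployment rates (percent), for illustration only.
observed = [4.0, 2.0]
predicted = [3.0, 3.0]
print(mse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
print(working_correlation("ar1", 3, alpha=0.5))
```

Under these definitions, a lower value of each metric on the test set indicates better prediction, which is the criterion the abstract uses to compare the six methods.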
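Cite this article: Chen Siyang. Analysis of Urban Registered Unemployment Rate in Southwest China Based on GEE and PGEE Methods [J]. Statistics and Application, 2025, 14(3): 225-236. https://doi.org/10.12677/sa.2025.143074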
