Online Monitoring Method of High-Dimensional Data Streams Based on False Discovery Rate
Abstract: Most existing methods for monitoring multiple data streams assume that the streams are independent. From the perspective of statistical process control, this paper gives a general framework for online monitoring of high-dimensional data streams. Because the data distributions may be diverse, a symmetrized data aggregation method is used to construct robust monitoring statistics. The asymptotic symmetry of these statistics is used to select a data-driven threshold, and correlated non-normal data streams are then monitored online with the false discovery rate as the error criterion. An AR(1) model is used to describe the correlation between data streams, and the false discovery rate and power of the proposed method are studied by Monte Carlo simulation. The numerical results indicate that the proposed method performs reasonably well.
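The idea of choosing a threshold from the symmetry of the statistics can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a simplified statistic (the product of two standardized half-sample means, so that in-control streams give values symmetric about zero), normal rather than non-normal data, and AR(1) correlation across streams with known unit variance. The function names and default parameters are hypothetical.

```python
import random


def symmetric_threshold(w, alpha):
    """Smallest t > 0 with (1 + #{w_j <= -t}) / max(#{w_j >= t}, 1) <= alpha.

    The count of large negative statistics estimates the number of false
    alarms among the large positive ones, relying on symmetry about zero.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        n_neg = sum(1 for x in w if x <= -t)
        n_pos = sum(1 for x in w if x >= t)
        if (1 + n_neg) / max(n_pos, 1) <= alpha:
            return t
    return float("inf")  # no threshold meets the target level: flag nothing


def ar1_vector(p, rho, rng):
    """One observation of p streams with correlation rho**|i-j| between streams."""
    scale = (1.0 - rho * rho) ** 0.5
    z = [rng.gauss(0.0, 1.0)]
    for _ in range(p - 1):
        z.append(rho * z[-1] + scale * rng.gauss(0.0, 1.0))
    return z


def simulate_once(p=200, n_shifted=20, delta=1.0, n=40, rho=0.5,
                  alpha=0.2, seed=0):
    """One Monte Carlo replicate; returns (false discovery proportion, power)."""
    rng = random.Random(seed)
    # Assumption: the first n_shifted streams have a mean shift of size delta.
    shift = [delta if j < n_shifted else 0.0 for j in range(p)]
    half = n // 2
    sums = [[0.0] * p, [0.0] * p]
    for i in range(2 * half):
        z = ar1_vector(p, rho, rng)
        part = sums[0] if i < half else sums[1]
        for j in range(p):
            part[j] += z[j] + shift[j]
    root = half ** 0.5
    # Product of the two standardized half-sample means (unit variance assumed).
    w = [(sums[0][j] / root) * (sums[1][j] / root) for j in range(p)]
    t = symmetric_threshold(w, alpha)
    flagged = [j for j in range(p) if w[j] >= t]
    false_hits = sum(1 for j in flagged if j >= n_shifted)
    fdp = false_hits / max(len(flagged), 1)
    power = (len(flagged) - false_hits) / n_shifted
    return fdp, power
```

Averaging the first returned value over many seeds gives a Monte Carlo estimate of the false discovery rate, and averaging the second gives an estimate of the power, in the spirit of the simulation study summarized above.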
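Cite this article: Liang, N. and Qi, D. (2024) Online Monitoring Method of High-Dimensional Data Streams Based on False Discovery Rate. Statistics and Application, 13(2), 307-314. https://doi.org/10.12677/sa.2024.132031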
