Research on the Theory and Application of Bayesian Nonparametric Methods in Big Data
Abstract: As artificial intelligence develops rapidly, machine learning has taken on a central role. Machine learning is rooted in analyzing and learning from massive data sets, and therefore depends on statistical model building and inference. Bayesian methods, a major and mature class of statistical modelling methods, combine the information in the sample with prior information about the parameters, which accounts for parameter uncertainty and makes model inference more reasonable. Nonparametric methods in the Bayesian framework extend this uncertainty further: the prior is placed on a space of distributions rather than on a finite-dimensional parameter space and is represented by a stochastic process, so the prior space is infinite-dimensional. Bayesian nonparametric modelling has attracted wide attention for its great flexibility and robustness, and with the rapid development of artificial intelligence, machine learning researchers have studied these methods in depth and obtained many notable results. This paper examines part of the basic theory of Bayesian nonparametrics, studies its practical application in the big-data setting, and offers an outlook on future work.
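Cite this article: Xu Rui (许蕊), Lu Zhiyi (卢志义). Research on the Theory and Application of Bayesian Nonparametric Methods in Big Data [J]. Statistics and Application (统计学与应用), 2023, 12(2): 283-292. https://doi.org/10.12677/SA.2023.122030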
