A Comparative Study of Compositional Data Transformation Methods Based on Spatial Equivalence
DOI: 10.12677/SA.2018.72032. Supported by the National Natural Science Foundation of China.
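Authors: Lijuan Guo (郭丽娟)*: School of Economics, Beijing Technology and Business University, Beijing; Rong Guan (关蓉): School of Statistics and Mathematics, Central University of Finance and Economics, Beijing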
Keywords: Compositional Data; Simplex Space; Euclidean Space; Orthogonal Transformation; Fisher Discriminant Analysis
Abstract: The constant-sum constraint of the simplex space makes traditional statistical methods designed for Euclidean space unsuitable for compositional data. A common solution is to first transform the compositional data from the simplex into Euclidean space and then perform statistical analysis on the transformed data. This paper compares three commonly used transformation methods: the additive logratio transformation (alr), the centered logratio transformation (clr), and the isometric logratio transformation (ilr). Based on Aitchison's algebra for compositional data, the comparison examines whether each transformation achieves an equivalent mapping from the simplex to Euclidean space, that is, whether it satisfies the properties of linearity and orthogonality, thereby providing a theoretical basis for choosing among transformation techniques. A real dataset on rock classification is then used to verify the comparison: each of the three transformations is applied to the original compositional data to remove the unit-sum constraint, a discriminant model is built on the transformed data, and the reliability of the discrimination results is compared. The theoretical and empirical results indicate that ilr is superior to the other two methods in two respects. First, ilr preserves the geometric concepts of inner product and distance, which alr alters. Second, ilr removes the unit-sum constraint while avoiding the multicollinearity that clr introduces and that affects multivariate analysis methods. The ilr transformation thus removes the constant-sum constraint while keeping the shape of the sample space unchanged, and is a reasonable transformation method.
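Cite this article: Guo, L. and Guan, R. (2018) A Comparative Study of Compositional Data Transformation Methods Based on Spatial Equivalence. Statistics and Application (统计学与应用), 7(2), 271-279. https://doi.org/10.12677/SA.2018.72032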
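The three transformations compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the Helmert-type orthonormal basis used for ilr, and the two sample compositions are all assumptions chosen for illustration, and the paper may use a different ilr basis. The sketch checks the two properties the abstract highlights: ilr preserves the Aitchison distance while alr does not, and clr coordinates always sum to zero, which is the source of the collinearity.

```python
import math

def closure(parts):
    """Rescale positive parts so they sum to 1 (a point in the simplex)."""
    total = sum(parts)
    return [p / total for p in parts]

def alr(x):
    """Additive logratio: log of each part over the last part (D-1 values)."""
    return [math.log(p / x[-1]) for p in x[:-1]]

def clr(x):
    """Centered logratio: log parts minus their mean (D values, sum is 0)."""
    logs = [math.log(p) for p in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def helmert_basis(d):
    """Assumed basis: d-1 orthonormal rows, each orthogonal to (1, ..., 1)."""
    rows = []
    for i in range(1, d):
        scale = math.sqrt(i / (i + 1))
        rows.append([scale / i] * i + [-scale] + [0.0] * (d - i - 1))
    return rows

def ilr(x):
    """Isometric logratio: clr coordinates projected onto the basis (D-1 values)."""
    c = clr(x)
    return [sum(v * ci for v, ci in zip(row, c)) for row in helmert_basis(len(x))]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def aitchison_distance(x, y):
    """Distance in the simplex, computed as the Euclidean distance of clr vectors."""
    return euclidean(clr(x), clr(y))

# Two illustrative three-part compositions (made-up values, not the rock data).
x = closure([0.6, 0.3, 0.1])
y = closure([0.2, 0.5, 0.3])

d_simplex = aitchison_distance(x, y)
d_ilr = euclidean(ilr(x), ilr(y))
d_alr = euclidean(alr(x), alr(y))
clr_sum = sum(clr(x))

print("ilr preserves distance:", math.isclose(d_ilr, d_simplex))
print("alr preserves distance:", math.isclose(d_alr, d_simplex))
print("clr coordinates sum to zero:", math.isclose(clr_sum, 0.0, abs_tol=1e-12))
```

Because every clr vector sums to zero, the covariance matrix of clr-transformed data is singular, which is a problem for methods such as Fisher discriminant analysis that invert it. The ilr coordinates drop that redundant dimension without distorting distances.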
