Proceedings of the 3rd Inter- national Conference on Similarity Search and Applications, Is- tanbul
Dimension reduce- tion for distance-based indexing
作者:
R. Mao, W. L. Miranker and D. P. Miranker.
关键词:
Similarity query ; Metric space ; Dimension reduction ; Intrinsic dimension ; Pivot selection ; Pivot space model
摘要:
Distance-based indexing exploits only the triangle inequality to answer similarity queries in metric spaces. Lacking of coordinate structure, mathematical tools in Rn can only be applied indirectly, making it difficult for theoretical study in metric space indexing. Toward solving this problem, we formalize a "pivot space model" where data is mapped from metric space to Rn, preserving all the pair wise distances under Linfin;. With this model, it can be shown that the indexing problem in metric space can be equivalently studied in Rn. Further, we show the necessity of dimension reduction for Rn and that the only effective form of dimension reduction is to select existing dimensions, i.e. pivot selection. The coordinate structure of Rn makes the application of many mathematical tools possible. In particular, Principle Component Analysis (PCA) is incorporated into a heuristic method for pivot selection and shown to be effective over a large range of workloads. We also show that PCA can be used to reliably measure the intrinsic dimension of a metric-space.
在线下载