Proceedings of the 3rd Inter- national Conference on Similarity Search and Applications, Is- tanbul

Dimension reduce- tion for distance-based indexing

作者:
R. Mao W. L. Miranker and D. P. Miranker.

关键词:
Similarity query Metric space Dimension reduction Intrinsic dimension Pivot selection Pivot space model

摘要:
Distance-based indexing exploits only the triangle inequality to answer similarity queries in metric spaces. Lacking of coordinate structure, mathematical tools in Rn can only be applied indirectly, making it difficult for theoretical study in metric space indexing. Toward solving this problem, we formalize a "pivot space model" where data is mapped from metric space to Rn, preserving all the pair wise distances under Linfin;. With this model, it can be shown that the indexing problem in metric space can be equivalently studied in Rn. Further, we show the necessity of dimension reduction for Rn and that the only effective form of dimension reduction is to select existing dimensions, i.e. pivot selection. The coordinate structure of Rn makes the application of many mathematical tools possible. In particular, Principle Component Analysis (PCA) is incorporated into a heuristic method for pivot selection and shown to be effective over a large range of workloads. We also show that PCA can be used to reliably measure the intrinsic dimension of a metric-space.

在线下载

相关文章:
在线客服:
对外合作:
联系方式:400-6379-560
投诉建议:feedback@hanspub.org
客服号

人工客服,优惠资讯,稿件咨询
公众号

科技前沿与学术知识分享