International Journal on Artificial Intelligence Tools

An assessment of a metric space database index to support sequence homology

Authors:
R. Mao, W. J. Xu, N. Singh, and D. P. Miranker

Keywords:
biology computing; database management systems; microorganisms; proteins; trees (mathematics); bi-directional bulk load; bottom-up clustering; existing M-tree initialization algorithms; fixed length substrings; hierarchical bulk-load algorithm

Abstract:
Hierarchical metric-space clustering methods have been commonly used to organize proteomes into taxonomies. Consequently, it is often anticipated that hierarchical clustering can be leveraged as a basis for scalable database index structures capable of managing the hyper-exponential growth of sequence data. The M-tree is one such data structure, specialized for the management of large data sets on disk. We explore the application of M-trees to the storage and retrieval of peptide sequence data. Exploiting a technique first suggested by Myers, we organize the database as records of fixed length substrings. Empirical results are promising. However, metric-space indexes are subject to "the curse of dimensionality", and the ultimate performance of an index is sensitive to the quality of the initial construction of the index. We introduce a new hierarchical bulk-load algorithm that alternates between top-down and bottom-up clustering to initialize the index. Using the yeast proteomes, the bi-directional bulk load produces a more effective index than the existing M-tree initialization algorithms.
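The abstract's two core ideas, storing sequences as records of fixed length substrings and searching them with a metric-space index, can be illustrated with a minimal sketch. It is not the authors' implementation: the window length, the use of overlapping windows, the Hamming distance, the single-pivot filter standing in for an M-tree node, and all function names are assumptions made only for illustration.

```python
from typing import Iterator, List, Tuple

Record = Tuple[int, str]  # (offset in the source sequence, substring)


def fixed_length_substrings(seq: str, q: int) -> Iterator[Record]:
    """Yield every overlapping window of length q (window scheme is assumed)."""
    for i in range(len(seq) - q + 1):
        yield i, seq[i:i + q]


def hamming(a: str, b: str) -> int:
    """Hamming distance, a true metric on equal-length strings (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def range_query(records: List[Record], pivot: str, pivot_dists: List[int],
                query: str, radius: int) -> List[Record]:
    """Return records within `radius` of `query`.

    The triangle inequality gives |d(q, p) - d(s, p)| <= d(q, s), so a record
    whose precomputed pivot distance differs from d(q, p) by more than
    `radius` cannot match and is skipped without computing d(q, s).
    """
    d_qp = hamming(query, pivot)
    hits = []
    for record, d_sp in zip(records, pivot_dists):
        if abs(d_qp - d_sp) > radius:
            continue  # pruned by the triangle inequality
        if hamming(query, record[1]) <= radius:
            hits.append(record)
    return hits


# Hypothetical peptide fragment, used only to exercise the functions.
records = list(fixed_length_substrings("MKTAYIAKQR", 5))
pivot = records[0][1]
pivot_dists = [hamming(sub, pivot) for _, sub in records]
hits = range_query(records, pivot, pivot_dists, "KTAYI", 1)
```

An M-tree applies the same pruning rule hierarchically, with a routing object and covering radius at each node, which is why the quality of the initial clustering affects how many subtrees a query can discard.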
