IEEE Conference on Applications of Digital Information and Web Technologies
Comparison between document-based, term-based and hybrid partitioning
Authors:
A. Abusukhon, M. P. Oakes, M. Talib and A. M. Abdalla.
Keywords:
query processing; document handling; information retrieval systems
Abstract:
Information retrieval (IR) systems for large-scale data collections must build an index in order to provide efficient retrieval that meets the user's needs. In distributed IR systems, query response time is affected by the way in which the data collection is partitioned across nodes. There are three types of collection partitioning: document-based partitioning (called the local index), term-based partitioning (called the global index) and hybrid partitioning. In this paper, we compare the three types of partitioning in terms of average query response time for a system with one broker and six other nodes. Our results showed that, within our distributed IR system, document-based and hybrid partitioning outperformed term-based partitioning. However, unlike Xi et al. [14], we did not find that hybrid partitioning was any better than document-based partitioning in terms of average query response time.
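To make the distinction between the local and global index concrete, the following is a minimal Python sketch rather than the authors' system. The toy documents, the round-robin assignment of documents and terms to nodes, and the function names (`build_index`, `document_partition`, `term_partition`, `lookup`) are all assumptions introduced for illustration. Hybrid partitioning is left out because the abstract does not describe how it is constructed.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def document_partition(docs, num_nodes):
    """Local index: each node indexes only its own share of the documents.

    Round-robin assignment is an assumption made for this sketch.
    """
    shares = [{} for _ in range(num_nodes)]
    for i, doc_id in enumerate(sorted(docs)):
        shares[i % num_nodes][doc_id] = docs[doc_id]
    return [build_index(share) for share in shares]


def term_partition(docs, num_nodes):
    """Global index: each node holds the complete posting lists of some terms.

    Round-robin assignment over sorted terms is an assumption made for this sketch.
    """
    global_index = build_index(docs)
    nodes = [{} for _ in range(num_nodes)]
    for i, term in enumerate(sorted(global_index)):
        nodes[i % num_nodes][term] = global_index[term]
    return nodes


def lookup(nodes, term):
    """Broker side: collect postings for a term from every node that has it.

    Returns the merged posting list and the number of nodes that contributed.
    """
    hits = [node[term] for node in nodes if term in node]
    merged = sorted(set().union(*hits)) if hits else []
    return merged, len(hits)


# Hypothetical toy collection, not data from the paper.
docs = {
    1: "distributed index retrieval",
    2: "query index",
    3: "distributed query",
    4: "index partition",
}

local_nodes = document_partition(docs, 2)
global_nodes = term_partition(docs, 2)
print(lookup(local_nodes, "index"))
print(lookup(global_nodes, "index"))
```

In this toy setup both schemes return the same documents for a term, but under document-based partitioning the postings may be spread over several nodes, while under term-based partitioning a single node holds the whole posting list. The sketch does not model network cost or load balance, so it says nothing about which scheme responds faster.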