Abstract: Society generates ever more data in production and daily life, and an information retrieval system (IRS, for example Baidu or Google) is an indispensable tool for finding useful information in this massive volume of data. An IRS, especially one built on a large-scale data set, can meet users' retrieval needs only if it builds an index, and the quality of that index directly determines whether the system succeeds or fails. For decades, research on how to construct indexes for IRSs has continued without interruption, focusing mainly on comparing and discussing global indexing, local indexing, and hybrid indexing structures. This paper describes the architectures of these index types and their advantages and disadvantages, reviews the related research results, and analyzes practical application systems. Finally, we present our own views and solutions.