Li Feng, Li Fang. Semantic similarity computation for Chinese words based on HowNet 2000 [J]. Journal of Chinese Information Processing, 2007, 21(3): 99-105.


  • Title: Large Scale Web Page Classification Algorithm Based on Spectral Hashing

    Author: Tian Dandan (田郸郸)

    Keywords: Web Page Classification, Large Scale, Spectral Hashing, KNN

    Journal: Software Engineering and Applications, Vol. 5 No. 1, 2016-02-25

    Abstract: Online information now reaches every aspect of daily life, but as the web has grown, information overload has become increasingly prominent, making it hard to locate the information we need. Classifying web pages can effectively improve search efficiency and help users find the pages they want. Current web page classification algorithms can handle small collections, but their efficiency on large-scale collections is unsatisfactory. Distributed web page classification methods have recently been proposed; although they improve classification efficiency, they do not improve the classification algorithm itself. This paper therefore proposes a method based on hashing and KNN, and designs a classification algorithm suited to large-scale web page classification.