AIRR  >> Vol. 5 No. 2 (May 2016)

    Image Clustering Method Based on Ensemble Locality Sensitive Clustering

  • Full-Text Download: PDF (917KB), HTML, XML   PP. 23-34   DOI: 10.12677/AIRR.2016.52003
  • Supported by national science and technology funding

Authors:

Tianqiang Peng: Department of Computer Engineering and Science, Henan Institute of Engineering, Zhengzhou, Henan;
Haolin Gao: School of Information System Engineering, Information Engineering University, Zhengzhou, Henan

Keywords:
Exact Euclidean Locality Sensitive Hashing, Random Projection, Image Clustering, Cluster Ensemble

Abstract:

Common image clustering methods, and visual image clustering methods in particular, are slow and do not support incremental clustering. To address these limitations, we propose an Ensemble Locality Sensitive Clustering method. The method first estimates a suitable number of clusters using a clustering validity index, then generates multiple hash functions and applies them to map each data point to multiple bucket labels. Clustering the bucket labels of the dataset yields multiple base partitions, which are finally combined by ensemble into the final partition. Experimental results show that, in terms of accuracy, Ensemble Locality Sensitive Clustering matches k-means combined with cluster ensemble on synthetic data and comes close to it on image datasets, while clustering much faster and supporting incremental clustering. The method is therefore well suited to clustering high-dimensional image features.

To overcome the weakness of k-means in image clustering, especially visual image clustering, we propose an Ensemble Locality Sensitive Clustering method. It first determines the number of clusters in the dataset, then generates multiple clustering resolutions based on the Exact Euclidean Locality Sensitive Hashing (E2LSH) algorithm, and finally applies cluster ensemble methods to obtain the final partition. Experiments on a synthetic dataset and an image dataset show that the new method reaches the same clustering accuracy as k-means combined with cluster ensemble on the synthetic dataset, and slightly lower accuracy on the image dataset. The advantage of the new method is that it clusters faster than k-means and is suitable for incremental clustering. Ensemble Locality Sensitive Clustering is therefore a promising clustering method for high-dimensional image data.
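The pipeline the abstract describes (per-table E2LSH bucket labels acting as base partitions, then a cluster ensemble over them) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the validity-index estimation of the cluster count is omitted, the hash is the standard p-stable E2LSH projection h(v) = floor((a·v + b) / w), and the ensemble step is replaced by a simple evidence-accumulation heuristic (threshold the co-association matrix and take connected components). All function names and parameters here are our own.

```python
import numpy as np
from collections import deque

def e2lsh_bucket_labels(X, n_tables=10, w=4.0, seed=0):
    """One bucket label per hash table, using the p-stable E2LSH hash
    h(v) = floor((a . v + b) / w) with Gaussian a and b ~ U[0, w)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = np.empty((n, n_tables), dtype=np.int64)
    for t in range(n_tables):
        a = rng.standard_normal(d)           # random projection direction
        b = rng.uniform(0.0, w)              # random offset within one bucket width
        labels[:, t] = np.floor((X @ a + b) / w).astype(np.int64)
    return labels

def consensus_partition(labels, threshold=0.5):
    """Evidence-accumulation consensus over the base partitions: points
    that share a bucket in at least `threshold` of the tables are linked,
    and connected components of that graph form the final clusters."""
    n, n_tables = labels.shape
    co = np.zeros((n, n))                    # co-association matrix
    for t in range(n_tables):
        co += labels[:, t][:, None] == labels[:, t][None, :]
    adj = (co / n_tables) >= threshold
    part = -np.ones(n, dtype=int)
    cid = 0
    for i in range(n):                       # BFS over the thresholded graph
        if part[i] >= 0:
            continue
        queue = deque([i])
        part[i] = cid
        while queue:
            j = queue.popleft()
            for k in np.flatnonzero(adj[j] & (part < 0)):
                part[k] = cid
                queue.append(k)
        cid += 1
    return part
```

Because each new point only needs to be hashed and matched against existing buckets, a scheme like this naturally extends to incremental clustering, which is the advantage claimed in the abstract.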

Citation:
Peng, T.Q., Gao, H.L. Image Clustering Method Based on Ensemble Locality Sensitive Clustering [J]. Artificial Intelligence and Robotics Research, 2016, 5(2): 23-34. http://dx.doi.org/10.12677/AIRR.2016.52003

References

[1] Cao, Y. and Wu, J. (2002) Projective ART for Clustering Data Sets in High Dimensional Spaces. Neural Networks, 15, 105-120.
[2] Agrawal, R., Gehrke, J., Gunopulos, D. and Raghavan, P. (1998) Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. Proceedings of the ACM SIGMOD International Conference on Management of Data, 94-105. http://dx.doi.org/10.1145/276304.276314
[3] Schclar, A., Rokach, L. and Amit, A. (2013) Ensembles of Classifiers Based on Dimensionality Reduction. Pattern Analysis and Applications, 5, 1305-4345.
[4] Dasgupta, S. and Sinha, K. (2013) Randomized Partition Trees for Exact Nearest Neighbor Search. JMLR Workshop and Conference Proceedings, 30, 1-21.
[5] Baraniuk, R.G., Davenport, M., DeVore, R. and Wakin, M.B. (2008) A Simple Proof of the Restricted Isometry Property for Random Matrices. Constructive Approximation, 28, 253-263.
[6] Fowler, J.E. and Du, Q. (2012) Anomaly Detection and Reconstruction from Random Projections. IEEE Transactions on Image Processing, 21, 184-195.
[7] Schulman, L.J. (2000) Clustering for Edge-Cost Minimization. Proceedings of the Annual ACM Symposium on Theory of Computing, 547-555.
[8] Balcan, M.-F., Blum, A. and Vempala, S. (2006) Kernels as Features: On Kernels, Margins, and Low-Dimensional Mappings. Machine Learning, 65, 79-94.
[9] Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A.J. and Vishwanathan, S.V.N. (2009) Hash Kernels for Structured Data. Journal of Machine Learning Research, 10, 2615-2637.
[10] Andoni, A. and Indyk, P. (2008) Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Communications of the ACM, 51, 117-122.
[11] Jegou, H., Douze, M. and Schmid, C. (2010) Improving Bag-of-Features for Large Scale Image Search. International Journal of Computer Vision, 87, 316-336. http://dx.doi.org/10.1007/s11263-009-0285-2
[12] Liu, Z., Liu, T. and David, G. (2010) Effective and Scalable Video Copy Detection. Proceedings of the ACM SIGMM International Conference on Multimedia Information Retrieval, ACM, New York, 119-128.
[13] Ravichandran, D., Pantel, P. and Hovy, E. (2005) Randomized Algorithms and NLP: Using Locality Sensitive Hash Function for High Speed Noun Clustering. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Stroudsburg, 622-629. http://dx.doi.org/10.3115/1219840.1219917
[14] Blum, A. (2006) Random Projection, Margins, Kernels, and Feature Selection. Proceedings of the 2005 International Conference on Subspace, Latent Structure and Feature Selection, LNCS, 52-68.
[15] Shi, Q.F., Shen, C.H., Hill, R. and van den Hengel, A. (2012) Is Margin Preserved after Random Projection? Proceedings of the International Conference on Machine Learning, Edinburgh.
[16] Vega-Pons, S. and Ruiz-Shulcloper, J. (2011) A Survey of Clustering Ensemble Algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25, 337-372. http://dx.doi.org/10.1142/S0218001411008683
[17] Topchy, A.P., Jain, A.K. and Punch, W.F. (2005) Clustering Ensembles: Models of Consensus and Weak Partitions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1866-1881. http://dx.doi.org/10.1109/TPAMI.2005.237
[18] Topchy, A., Minaei-Bidgoli, B., Jain, A.K. and Punch, W.F. (2004) Adaptive Clustering Ensembles. Proceedings of the 17th International Conference on Pattern Recognition, Washington DC, 272-275. http://dx.doi.org/10.1109/icpr.2004.1334105
[19] Strehl, A. and Ghosh, J. (2002) Cluster Ensembles: A Knowledge Reuse Framework for Combining Multiple Partitions. Journal of Machine Learning Research, 3, 583-617.