Constrained Concept Factorization Based on Sparseness Constraints and Dual Graph Regularization for Data Representation
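DOI: 10.12677/CSA.2022.124106. Supported by the National Natural Science Foundation of China.
Authors: Zonghui Weng (翁宗慧), Congzhe You (由从哲): School of Computer Engineering, Jiangsu University of Technology, Changzhou, Jiangsu
Keywords: Concept Factorization; Label Information; Dual Graph Regularization; Lp Smooth Norm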
Abstract: Concept factorization (CF) is a classical data representation method that has been widely used in machine vision, pattern recognition, and related fields. Basic CF is an unsupervised learning algorithm: it cannot exploit prior knowledge present in the data, it ignores the geometric structure of the data-space and feature-space manifolds, and its factorization results are not sparse. To address these shortcomings, this paper proposes a constrained concept factorization algorithm based on sparseness constraints and dual graph regularization (DCCFS). The algorithm preserves the intrinsic geometric structure of both the sample data space and the feature space, which lets it extract features more effectively and strengthens its data representation ability; it uses the class label information naturally available in the data to improve its discriminative ability; and it adds an Lp smooth norm that increases sparsity and makes the factorization results more accurate and smooth. Clustering experiments on the COIL20 image dataset, the PIE face dataset, and the TDT2 text dataset show that DCCFS outperforms other comparable algorithms in clustering performance.
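For orientation, the unconstrained baseline that DCCFS builds on can be written out. The sketch below gives the basic concept factorization objective and its multiplicative updates as they are commonly stated in the literature (following Xu and Gong, 2004). The symbols X, W, V, K and the rank k are notation assumed here, not taken from this page, and the simple update form assumes the data, and hence K, is non-negative. The DCCFS-specific terms (label constraints, the two graph regularizers, and the Lp smooth norm) are not reproduced, because the abstract does not state them.

```latex
\begin{aligned}
&\min_{W \ge 0,\; V \ge 0}\; \lVert X - X W V^{\top} \rVert_F^2,
  \qquad X \in \mathbb{R}^{m \times n},\quad W, V \in \mathbb{R}^{n \times k}, \\
&K = X^{\top} X, \\
&w_{jk} \leftarrow w_{jk}\,\frac{(K V)_{jk}}{(K W V^{\top} V)_{jk}},
  \qquad
  v_{jk} \leftarrow v_{jk}\,\frac{(K W)_{jk}}{(V W^{\top} K W)_{jk}}.
\end{aligned}
```

Each column of XW is a "concept" expressed as a non-negative combination of the samples, and each row of V gives one sample's representation over those concepts. Because the updates depend on the data only through K, the same scheme extends to kernel matrices.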
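Cite this article: Weng, Z. and You, C. (2022) Constrained Concept Factorization Based on Sparseness Constraints and Dual Graph Regularization for Data Representation. Computer Science and Application, 12(4), 1031-1042. https://doi.org/10.12677/CSA.2022.124106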
