Clustering of G Protein-Coupled Receptor Sequences Based on Factor Model
DOI: 10.12677/HJCB.2017.73004. Supported by research project funding.
Authors: Hua Wang*, Fenglan Bai, Liwei Liu: School of Science, Dalian Jiaotong University, Dalian, Liaoning
Keywords: Protein Sequence, Feature Vector, Factor Model, Cluster Analysis
Abstract: G protein-coupled receptors (GPCRs) are a family of peptide membrane proteins, and clustering GPCR sequences is of considerable theoretical and practical value. In this paper, a feature-vector representation of protein sequences is constructed from the classification of amino acids and their physicochemical properties. Factor analysis is then applied to reduce the dimension of these feature vectors, which yields a factor model. The factor model is used to analyze the similarity of 40 GPCR sequences and to cluster them. The results are reasonably good and offer a new means of analyzing and comparing GPCR sequences.
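The first step described in the abstract, turning each sequence into a numeric feature vector based on amino-acid classes, can be illustrated with a minimal sketch. The four-class grouping, the function names `class_composition` and `euclidean`, and the toy sequences are assumptions made for illustration only and are not the paper's actual feature definition. The factor-analysis and clustering steps are not shown.

```python
from collections import Counter
from math import sqrt

# Hypothetical four-class grouping of the 20 standard amino acids by
# side-chain chemistry. The paper's own classification is not given here.
AA_CLASSES = {
    "nonpolar": "AVLIPFWMG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
CLASS_ORDER = ("nonpolar", "polar", "acidic", "basic")
RESIDUE_TO_CLASS = {
    aa: name for name, members in AA_CLASSES.items() for aa in members
}


def class_composition(sequence):
    """Return the fraction of residues in each class, in CLASS_ORDER.

    Characters that are not standard amino-acid letters are ignored.
    """
    counts = Counter(
        RESIDUE_TO_CLASS[aa] for aa in sequence.upper() if aa in RESIDUE_TO_CLASS
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[name] / total for name in CLASS_ORDER]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy sequences, not real GPCRs.
    seq_a = "MKTLLVAGDE"
    seq_b = "MKSLLVAGDK"
    vec_a = class_composition(seq_a)
    vec_b = class_composition(seq_b)
    print(vec_a, vec_b, euclidean(vec_a, vec_b))
```

In a fuller pipeline, vectors like these (typically with many more components) would be stacked into a matrix, reduced with factor analysis, and then passed to a clustering routine; smaller distances between the reduced vectors would indicate more similar sequences.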
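Cite this article: Wang, H., Bai, F.L. and Liu, L.W. (2017) Clustering of G Protein-Coupled Receptor Sequences Based on Factor Model. Hans Journal of Computational Biology, 7(3), 31-38. https://doi.org/10.12677/HJCB.2017.73004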
