DNA Clustering Distribution Measured with Probability under Nonlinear Function
DOI: 10.12677/SEA.2014.33006; supported by the National Natural Science Foundation of China
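Authors: Du Lei, Department of Information Security, School of Software, Yunnan University, Kunming; Zheng Zhijie, Yunnan Provincial Key Laboratory of Software Engineering, Kunming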
Keywords: DNA Clustering; Probability Value; Nonlinear Function; Genome Sequence
Abstract: Typical clustering analysis groups highly similar data fragments together according to their measured numerical features and uses their spatial distribution to show which fragments in a sequence are identical and which differ. For DNA sequences from different sources, this paper computes statistical features of grouped probability values, applies three nonlinear functions to obtain projection measures of these values, and uses the resulting genetic measurement features to form visualized clustering distributions. The comparison shows that results for genes of the same class follow the same stratification trend, that the distribution plots of gene subsequences exhibit complementary structure at higher levels, and that the distributions of different types of gene sequences differ markedly.
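A minimal Python sketch of the pipeline summarized in the abstract, under stated assumptions. The abstract does not name the grouping rule, the probability statistic, or the three nonlinear functions, so here each group is assumed to be a non-overlapping window of `group_len` bases, a group's probability value is assumed to be the fraction of one chosen base, and `square`, `sqrt`, and `sine` are hypothetical stand-ins for the paper's functions. The clustering distribution is approximated by counting projected values in equal-width bins over [0, 1]. All names (`group_probabilities`, `cluster_distribution`, `measure`, `NONLINEAR`) are illustrative and are not the authors' code.

```python
import math
from typing import Callable, Dict, List

# HYPOTHETICAL nonlinear functions: placeholders, not the paper's three
# functions. Each maps a probability in [0, 1] back into [0, 1].
NONLINEAR: Dict[str, Callable[[float], float]] = {
    "square": lambda p: p * p,
    "sqrt": math.sqrt,
    "sine": lambda p: math.sin(math.pi * p / 2),
}


def group_probabilities(seq: str, group_len: int, base: str = "G") -> List[float]:
    """Split seq into non-overlapping groups of group_len bases and, for each
    complete group, return the fraction of positions equal to `base`.
    Assumed statistic; a trailing partial group is discarded."""
    seq = seq.upper()
    return [
        seq[i:i + group_len].count(base) / group_len
        for i in range(0, len(seq) - group_len + 1, group_len)
    ]


def cluster_distribution(values: List[float], n_bins: int) -> List[int]:
    """Count how many projected values fall into each of n_bins equal-width
    bins over [0, 1]; a value of exactly 1.0 goes into the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


def measure(seq: str, group_len: int, n_bins: int, base: str = "G") -> Dict[str, List[int]]:
    """Project every group probability through each nonlinear function and
    return one clustering distribution (bin counts) per function."""
    probs = group_probabilities(seq, group_len, base)
    return {
        name: cluster_distribution([f(p) for p in probs], n_bins)
        for name, f in NONLINEAR.items()
    }


if __name__ == "__main__":
    demo = "GGGGAAAACCGGTTTT"  # toy sequence, not real genomic data
    print(group_probabilities(demo, 4))
    print(measure(demo, 4, 4))
```

Swapping in the paper's actual nonlinear functions would only require changing the `NONLINEAR` dictionary. Comparing the bin counts produced for two different sequences gives a crude, one-dimensional view of the kind of distribution differences between gene types that the abstract reports.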
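Citation: Du, L. and Zheng, Z.J. (2014) DNA Clustering Distribution Measured with Probability under Nonlinear Function. Software Engineering and Applications, 3(3), 41-49. http://dx.doi.org/10.12677/SEA.2014.33006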
