Visual Analysis of Coding and Non-Coding DNA Sequences
Abstract:
DNA sequences carry complex genetic information, and their characteristic features appear in both coding and non-coding sequences. In higher organisms, non-coding sequences make up most of the genome. The ENCODE project reports evidence that 98% of the human genome is non-coding and that 80% of this is functional, so the study of coding and non-coding regions has become an active research topic. This paper presents a model and experimental results that use graphical representation to distinguish coding from non-coding sequences. The model applies segment-wise probability measurements to DNA sequences from coding and non-coding regions separately, so that the resulting feature distributions can be compared.
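As a rough illustration of what a segment-wise probability measurement on a DNA sequence might look like, the following minimal sketch splits a sequence into fixed-length segments and computes the relative frequency of each base in every segment. The function name `segment_probabilities`, the fixed segment length, and the choice of single-base frequencies as the measured quantity are all assumptions made for illustration; the abstract does not specify the measure the paper actually uses.

```python
from collections import Counter

BASES = "ACGT"  # assumed alphabet; other symbols (e.g. N) are not reported


def segment_probabilities(sequence, segment_length):
    """Return per-segment relative base frequencies for a DNA sequence.

    Hypothetical helper: the segment length and the use of single-base
    frequencies are illustrative assumptions, not the paper's stated method.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    sequence = sequence.upper()
    results = []
    # Only full segments are measured; a trailing partial segment is dropped.
    for start in range(0, len(sequence) - segment_length + 1, segment_length):
        segment = sequence[start:start + segment_length]
        counts = Counter(segment)
        # Frequencies are relative to the full segment length, so segments
        # containing non-ACGT symbols will sum to less than 1.
        results.append({base: counts[base] / segment_length for base in BASES})
    return results


# Example: three segments of length 4 from a short made-up sequence.
profile = segment_probabilities("ATGGCCATTGTA", 4)
```

Computing such profiles separately for sequences taken from coding and from non-coding regions would give two sets of per-segment distributions that could then be plotted and compared.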