DNA Sequence Evolution Analysis Based on Prefix Position and Allowable Mismatch
DOI: 10.12677/AAM.2021.106204
Author: Shengnan Gao (高胜男), Liaoning Normal University, Dalian, Liaoning
Keywords: Alignment-Free Method, Prefix Set, Mismatch, Phylogenetic Tree
Abstract: Comparing and analyzing DNA sequence similarity is an important problem in bioinformatics. Because multiple sequence alignment (MSA) is time-consuming, alignment-free methods have become popular. Mutations in DNA sequences have a non-negligible effect on sequence comparison: a mutation can cause positions that should have matched to be lost. Building on the construction of a ring prefix tree and the prefix set obtained from it, this paper considers two matching rules, perfect match and mismatch, extracts the best matching positions in each sequence, and proposes a new alignment-free method based on position differences. The method compares multiple sequences pairwise and uses the Neighbor-Joining method to construct a phylogenetic tree, from which effective evolutionary relationships are obtained.
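The abstract does not spell out the matching procedure, so the following is only a minimal sketch of the two matching rules it names, perfect match and mismatch, applied to a sequence treated as circular. The function names, the Hamming-distance criterion, and the `max_mismatch` parameter are illustrative assumptions, not the paper's definitions.

```python
def circular_windows(seq, k):
    """Yield (position, window) pairs of length k, wrapping past the end of seq."""
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("window length k must satisfy 1 <= k <= len(seq)")
    # Appending the first k-1 characters lets windows wrap around the end.
    doubled = seq + seq[:k - 1]
    for i in range(n):
        yield i, doubled[i:i + k]


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_positions(seq, word, max_mismatch=0):
    """Return start positions where word occurs in circular seq.

    max_mismatch=0 is the perfect-match rule; a larger value tolerates
    that many substitutions (an assumed reading of 'allowable mismatch').
    """
    k = len(word)
    return [i for i, w in circular_windows(seq, k)
            if hamming(w, word) <= max_mismatch]


if __name__ == "__main__":
    seq = "ACGTAC"
    print(match_positions(seq, "ACG"))                  # perfect match only
    print(match_positions(seq, "ACG", max_mismatch=1))  # one substitution allowed
```

Under the perfect-match rule the word `ACG` is found only at position 0 of `ACGTAC`. Allowing one mismatch also recovers position 4, where the wrapped window `ACA` differs from `ACG` by a single substitution, which is the kind of position a point mutation would otherwise hide.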
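Cite this article: Gao, S.N. (2021) DNA Sequence Evolution Analysis Based on Prefix Position and Allowable Mismatch. Advances in Applied Mathematics, 10(6), 1937-1944. https://doi.org/10.12677/AAM.2021.106204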
