Abstract:
Over the past two decades, genome sequencing technology has advanced rapidly. With the advent of next-generation sequencing (NGS), characterized by high throughput, short reads, and low cost, the time and cost of sequencing the whole genome of a species have dropped sharply. Whole-genome assembly algorithms and software based on NGS data have been developed in succession, and roughly twenty reasonably mature assembly algorithms now exist. Owing to the complexity of the genome assembly problem itself, no survey has yet examined the specific design steps, operating environments, and range of application of the different assembly algorithms. To address this, we briefly survey twelve representative genome assembly algorithms and systematically analyze the design steps, underlying principles, operating environment, and applications of each. This survey offers some guidance on how to design a genome assembly algorithm and on how to choose a suitable assembly algorithm or software for different genome sequencing data.