High-Throughput Sequencing Methods and Applications of Read Mapping Algorithms
Abstract: Chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-Seq) has become a new experimental tool for studying interactions between proteins and genomic DNA and for studying histone modifications. It also creates a need for effective computational methods to map short DNA reads to the genome and to compare the resulting mapping profiles, both of which are crucial for uncovering biological mechanisms. In this article, we introduce the principles of next-generation sequencing and the characteristics and formats of its data, then give an overview of the basic principles of current read-mapping and peak-identification methods and the corresponding software. Finally, we illustrate the application of this approach to histone modification analysis and transcription factor binding site identification.
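Cite this article: Li Huili, He Feng, Yang Hang, Zheng Yan, Wu Xiaoming. High-Throughput Sequencing Methods and Applications of Read Mapping Algorithms [J]. 生物医学 (Biomedicine), 2011, 1(1): 1-5. http://dx.doi.org/10.12677/hjbm.2011.11001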
