Prediction of Nucleosome Positioning Sequence for Yeast Genome

doi:10.12677/BIPHY.2018.61001

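Authors: Hu Shisai (胡世赛), Chen Yuxiang (陈宇翔), Zhang Ying (张颖), Lv Jun (吕军)*; College of Science, Inner Mongolia University of Technology, Hohhot, Inner Mongolia
Keywords: Nucleosome Positioning Sequence; Increment of Diversity; Feature Selection Technology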

Abstract: The nucleosome is the basic unit of chromatin structure, and its positioning along the DNA sequence plays a key role in regulating gene expression in eukaryotes. Predicting nucleosome positioning with machine learning methods has become an active research topic in recent years. Taking the 6-mer composition of DNA sequences as the parameters, we applied our previously proposed increment of diversity feature selection technique and selected eight 6-mers as classification features. With a support vector machine, the overall accuracy in 10-fold cross-validation reached 98.2%. The results indicate that the distinct distributions of k-mer composition in nucleosome positioning sequences and linker sequences are the main factor affecting nucleosome positioning in yeast.
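To make the quantities in the abstract concrete, here is a minimal Python sketch of 6-mer counting and of the increment of diversity between two count vectors. It assumes the commonly used definitions D(X) = N log N - sum(n_i log n_i) and ID(X, Y) = D(X + Y) - D(X) - D(Y). The abstract does not describe the authors' feature selection procedure or their SVM settings, so neither is reproduced here. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

K = 6
# All 4**6 = 4096 possible 6-mers, in a fixed order so count vectors align.
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_counts(seqs, k=K):
    """Count overlapping k-mers over a collection of DNA sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def count_vector(counts, kmers=ALL_KMERS):
    """Turn a Counter into a list aligned with a fixed k-mer order."""
    return [counts.get(kmer, 0) for kmer in kmers]


def diversity(values):
    """D = N*log(N) - sum(n_i*log(n_i)), taking 0*log(0) as 0."""
    positive = [n for n in values if n > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in positive)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two aligned count vectors."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


if __name__ == "__main__":
    # Toy sequences invented for illustration only; not data from the paper.
    nucleosomal_toy = ["ACGTACGTGGCC"]
    linker_toy = ["AAAAAATTTTTT"]
    x = count_vector(kmer_counts(nucleosomal_toy))
    y = count_vector(kmer_counts(linker_toy))
    print(increment_of_diversity(x, y))
```

Under these definitions, ID is zero when the two count vectors are proportional and grows as their k-mer compositions diverge, which is why it can serve as a measure of how differently a k-mer set is distributed between two sequence classes.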

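Cite this article: Hu Shisai, Chen Yuxiang, Zhang Ying, Lv Jun. Prediction of Nucleosome Positioning Sequence for Yeast Genome [J]. Biophysics (生物物理学), 2018, 6(1): 1-6. https://doi.org/10.12677/BIPHY.2018.61001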
