Computational Biology (计算生物学), Vol. 1 No. 2 (December 2011)
Research of Protein Named Entity Recognition Based on SVMs (基于支持向量机的蛋白质命名实体识别的研究)
DOI: 10.12677/hjcb.2011.12002
Supported by the National Natural Science Foundation of China
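Authors:
Gong Lejun (龚乐君): School of Biological Science and Medical Engineering, Southeast University, Nanjing; School of Computer Engineering, Huaiyin Institute of Technology, Huai'an
Fu Yaxing (付亚星), Sun Xiao (孙啸), Xie Jianming (谢建明), Yu Shuangxin (于双鑫): School of Biological Science and Medical Engineering, Southeast University, Nanjing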
Keywords: Support Vector Machines (SVMs); Protein Entity Recognition; Feature Selection
Abstract:
This paper presents a method for recognizing protein named entities with support vector machines (SVMs) and evaluates four groups of features on a protein corpus. The experiments show that context features yield only a small gain over the baseline system, whereas the combination of the current word's part of speech (POS) and word shape (word type) achieves the best performance of all experiments, reaching 78.43% accuracy. These results indicate that POS and word-shape features play an important role in protein entity recognition.
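The feature groups named in the abstract (context words, POS, word shape, and the POS plus word-shape combination) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `word_shape` and `token_features`, the shape alphabet (`A`, `a`, `0`), the one-token context window, and the `<PAD>` marker are all assumptions chosen for illustration, and the abstract does not specify how the paper encodes its features.

```python
import re


def word_shape(token: str) -> str:
    """Collapse a token into a coarse shape string (hypothetical encoding).

    Runs of uppercase letters become 'A', runs of lowercase letters 'a',
    runs of digits '0'; all other characters are kept unchanged.
    """
    shape = re.sub(r"[A-Z]+", "A", token)
    shape = re.sub(r"[a-z]+", "a", shape)
    shape = re.sub(r"[0-9]+", "0", shape)
    return shape


def token_features(tokens, pos_tags, i, window=1):
    """Build a feature dict for token i (illustrative feature names)."""
    shape = word_shape(tokens[i])
    feats = {
        "pos": pos_tags[i],                    # part of speech of the current word
        "shape": shape,                        # word shape of the current word
        "pos+shape": pos_tags[i] + "|" + shape,  # combined feature
    }
    # Context features: lowercased neighbours within the window,
    # padded at sentence boundaries.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if inside else "<PAD>"
    return feats


if __name__ == "__main__":
    sentence = ["The", "IL-2", "receptor"]
    tags = ["DT", "NN", "NN"]  # POS tags assumed to come from an external tagger
    print(token_features(sentence, tags, 1))
```

Feature dictionaries of this kind would typically be converted to sparse vectors and passed to an SVM classifier that assigns a chunk label to each token; which SVM toolkit and labeling scheme the paper used is not stated in the abstract.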
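Cite this article:
Gong Lejun, Fu Yaxing, Sun Xiao, Xie Jianming, Yu Shuangxin. Research of Protein Named Entity Recognition Based on SVMs [J]. Computational Biology (计算生物学), 2011, 1(2): 5-10.
http://dx.doi.org/10.12677/hjcb.2011.12002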