Tutorial on Evaluation of Bio-Textmining Systems
(October 24, 2 p.m. - 5 p.m.)
Jin-Dong Kim (Database Center for Life Science, Tokyo)
Martin Krallinger (Spanish National Cancer Research Center - CNIO, Madrid)
“Evaluation” is a fundamental skill of science, establishing
the direction and goals of research and development, and providing
a means to estimate the “state of the art”. In the bio-textmining
field, there have been several efforts to facilitate community-wide
evaluation of textmining systems at various levels, e.g. the TREC Genomics
Track, LLL, BioCreative, and the BioNLP Shared Task. Recent trends show
that the information needs of bio-textmining are becoming more
complex, e.g. PPIs, bio-events, SDA, which in turn leads to a need for
more elaborately designed evaluation methods. The community-wide
evaluation efforts are intended to address such needs.
However, without a proper understanding of the setting and purpose of
the evaluation methods, evaluation results are often misleading. For
example, BioCreative II.5 reported a best performance on the PPI task
of around 30% in F-score, whereas the BioNLP’09 Shared Task reported
a best performance on the bio-event extraction task of around 60%.
Without an understanding of the settings of the two tasks, these
scores are difficult to interpret, raising questions such as why
the seemingly simpler PPI task achieved a lower best performance
than the event extraction task. It is even more
problematic if one develops a textmining system simply to get a
better evaluation score, without properly understanding what the score means.
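To illustrate how the evaluation setting alone can change a score, the following is a minimal sketch, assuming a balanced F-score (the harmonic mean of precision and recall) computed over sets of extracted items. The function name, the toy protein pairs, and the two matching criteria are hypothetical and are not the actual criteria used by BioCreative or the BioNLP Shared Task.

```python
def f_score(gold, predicted):
    """Balanced F-score of a predicted set against a gold-standard set."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: interacting protein pairs.
gold_pairs = [("A", "B"), ("C", "D"), ("E", "F")]
predicted_pairs = [("B", "A"), ("C", "D"), ("X", "Y")]

# Setting 1 (assumed): a pair counts only if the order matches exactly.
strict = f_score(gold_pairs, predicted_pairs)

# Setting 2 (assumed): the order within a pair is ignored.
relaxed = f_score(
    [frozenset(p) for p in gold_pairs],
    [frozenset(p) for p in predicted_pairs],
)

print(f"strict:  {strict:.2f}")
print(f"relaxed: {relaxed:.2f}")
```

In this toy case the same system output scores twice as high under the relaxed criterion, so a raw F-score is only comparable across tasks when the matching criteria are known.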
In this half-day tutorial, various evaluation settings and methods
for bio-textmining will be introduced and explained in detail, with a
special emphasis on their practical meaning. Observations on and
analyses of the results of previous evaluation events will be
provided to deepen insight into the real problems. At the end of the
tutorial, the settings of the upcoming evaluation events, BioCreative and
BioNLP, will be explained. Upon completion of the tutorial,
participants are expected to better understand the principles of
various evaluation methods, metrics, settings, etc., and to be better
prepared for developing bio-textmining systems in more useful ways.