Participation in Benchmarking Evaluations
Over the years, my research team has participated in
various international benchmarking evaluations. Some achievements include:
First in the HOO (Helping Our Own) 2012 shared task on error detection and
correction of determiners and prepositions: the best team among 14 participating
teams, and the best system among 85 systems submitted to the shared task.
Second in the HOO (Helping Our Own) 2011 pilot shared task, out of 6 participating teams.
In WMT 2011, our machine translation evaluation metric, TESLA-M, achieved the highest average system-level correlation for translating from English into 3 European languages (out of 15 metrics), and TESLA-F achieved the highest average system-level correlation for translating into English (out of 12 metrics).
In WMT 2010, our machine translation evaluation metric, TESLA-M, achieved the highest average system-level correlation on the WMT10 test set (out of 18 metrics) when translating from English into 4 European languages, and TESLA achieved the highest average system-level correlation when translating into English (out of 28 metrics).
Second in the BTEC task (translating Chinese to English), out of 12 participating teams at IWSLT 2009 (International Workshop on Spoken Language Translation).
First in the coarse-grained English all-words task (out of 14 systems), and second
in the fine-grained English all-words task (out of 13 systems), in SemEval 2007,
organized by ACL SIGLEX.
First on 3 test corpora and second on the 4th test corpus (out of 18 teams) in the
open track of the Second International Chinese Word Segmentation Bakeoff,
organized by ACL SIGHAN, 2005.
Third among 47 systems in the English lexical sample task, Senseval-3, organized
by ACL SIGLEX, 2004.
First among 8 systems in the translation subtask, and first among 6 systems in the
translation and sense subtask, of the multilingual lexical sample task,
Senseval-3, organized by ACL SIGLEX, 2004.
Second among 16 systems in the English named entity recognition task at the
CoNLL 2003 shared task.
Top 2 scores (among 6 groups submitting 11 runs) in the routing subtask of the
filtering track, TREC-8, 1999.