IWSLT proposes challenging research tasks and an open experimental infrastructure for the scientific community working on spoken and written language translation. The IWSLT 2011 Evaluation Campaign will focus on the translation of TED Talks, a collection of public speeches on a variety of topics. Four tracks are proposed addressing different research tasks:
ASR track: automatic transcription of talks from audio to text (in English)
SLT track: speech translation of talks from audio (or ASR output) to text (from English to French)
MT track: text translation of talks (from English to French, Arabic to English, and Chinese to English)
SC track: system combination of ASR outputs (in English) and MT outputs (in English and French)
Main challenges of the proposed tracks are:
Open domain ASR, clean transcription of spontaneous speech, detection and removal of non-words, and talk style and topic adaptation.
Open domain SLT, translation of speech or ASR output into true-case punctuated text, and talk style and topic adaptation.
Open domain MT between distant languages, and talk style and topic adaptation.
Training of MT systems and of language models for ASR is constrained to the data supplied by the organizers. Since no training data are distributed for ASR acoustic modeling, participants may use any publicly available data recorded before 31 December 2010.
Input format: pre-segmented SPHERE files or supplied ASR outputs (transcript, SLF lattice)
Output format: NIST XML format, true case with punctuation
Coding: UTF-8
Evaluation: BLEU and subjective ranking
Development data:
Test data 2:
ASR-SC: input and output formats compliant with the ASR track output format
MT-SC: input-output format compliant with MT tracks output format
Evaluation: WER for ASR-SC, BLEU and subjective ranking for MT-SC
Training data: supplied monolingual and parallel texts
Test data:
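For readers unfamiliar with the WER metric named above, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The campaign's actual scoring tools and any text normalization they apply are not described here, so this should not be read as the official scoring procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / number of reference words.

    Assumption: tokens are obtained by whitespace splitting, with no case
    folding or punctuation handling. Official scoring may normalize text
    differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word ("the") is missing from the hypothesis.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.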
16 Jun 2011 - Release of training data
24 Jun 2011 - Release of dev data (all but SC track)
22 Aug 2011 - Submission of ASR/MT runs on dev data
25 Aug 2011 - Release of SC dev data
01 Sep 2011 - Release of ASR test data 1
14 Sep 2011 - Release of ASR test data 2
18 Sep 2011 - Submission of ASR runs (closing time 23:59:59 PDT) POSTPONED
19 Sep 2011 - Release of ASR-SC test data
22 Sep 2011 - Submission of ASR-SC runs (closing time 23:59:59 PDT)
20 Sep 2011 - Release of SLT and MT test data
25 Sep 2011 - Submission of SLT and MT runs (closing time 23:59:59 PDT)
27 Sep 2011 - Release of MT-SC test data
30 Sep 2011 - Submission of MT-SC runs (closing time 23:59:59 PDT)
15 Oct 2011 - Submission of system papers
31 Oct 2011 - Notification of acceptance
11 Nov 2011 - Camera ready version of system paper
8-9 Dec 2011 - IWSLT 2011 Workshop
Marcello Federico (FBK, Evaluation Chair)
Luisa Bentivogli (FBK/CELCT, evaluation & analysis)
Mauro Cettolo (FBK, MT/SC-MT data processing)
Sebastian Stueker (KIT, ASR/SLT/SC-ASR evaluation and analysis)
Teresa Hermann (KIT, ASR/SLT/SC-ASR data processing)
Giovanni Moretti (CELCT, crowdsourcing)
Jan Niehues (KIT, ASR/SLT/SC-ASR data processing)
Michael Paul (NICT, evaluation & analysis)