DATA OVERVIEW



This page describes the data to be used in the four tracks. As mentioned elsewhere, in contrast with other MT evaluation campaigns, the input for translation is not written text but transcriptions of speech. The input includes both manually created transcripts and the output of automatic speech recognition (ASR) systems. This year, ASR output will be provided as n-best lists and word graphs.

Highlights:

  • Data Conditions:
    This year there will be one data condition for all tasks, the open data condition. All publicly available data will be allowed.
  • Linguistic Resources:
    Links to some additional linguistic resources (LRs) will be provided to the participants. In addition, a month before the submission date, participants will be asked to submit links to the linguistic resources they plan to use in their submitted systems. This is to encourage the sharing of useful LRs such as dictionaries, name lists, etc. Any further processing of an LR after it has been shared is left to the individual participant.
  • Encodings:
    All data provided by IWSLT will be in UTF-8 where appropriate. Specific encodings for Japanese, Chinese, and Arabic are detailed below.
  • ASR Outputs:
    All ASR outputs will come in two formats: n-best lists and word lattices (in HTK Standard Lattice Format, SLF).
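
For participants who have not worked with SLF before, the following is a minimal sketch of reading a lattice in Python. It assumes the common textual SLF layout, in which every line is a sequence of FIELD=value pairs, node lines begin with I= and link lines begin with J=. The function name parse_slf, the sample lattice, and the choice of fields (W for the word, a for the acoustic score, l for the language model score) are illustrative assumptions; the files actually distributed may carry additional fields, quoted values, or words attached to nodes rather than links, none of which this sketch handles.

```python
def parse_slf(text):
    """Parse a small subset of HTK SLF text into (header, nodes, links)."""
    header, nodes, links = {}, {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Each token is FIELD=value; tokens without "=" are ignored.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "I" in fields:
            # Node definition, e.g. "I=1 t=0.35"
            nodes[int(fields.pop("I"))] = fields
        elif "J" in fields:
            # Link definition, e.g. "J=0 S=0 E=1 W=hello a=-120.5 l=-2.1"
            links.append({
                "id": int(fields["J"]),
                "start": int(fields["S"]),
                "end": int(fields["E"]),
                "word": fields.get("W"),
                "acoustic": float(fields.get("a", 0.0)),
                "lm": float(fields.get("l", 0.0)),
            })
        else:
            # Anything else is treated as header information.
            header.update(fields)
    return header, nodes, links


# Hypothetical three-node lattice, not taken from the IWSLT data.
SAMPLE = """\
VERSION=1.0
UTTERANCE=sample
N=3 L=3
I=0 t=0.00
I=1 t=0.35
I=2 t=0.80
J=0 S=0 E=1 W=hello a=-120.5 l=-2.1
J=1 S=0 E=1 W=hollow a=-125.0 l=-4.7
J=2 S=1 E=2 W=world a=-210.3 l=-3.0
"""

header, nodes, links = parse_slf(SAMPLE)
for link in links:
    print(link["start"], "->", link["end"], link["word"])
```

When reading real files, opening them with an explicit encoding (for example open(path, encoding="utf-8")) avoids depending on the platform default.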

Data Description

Challenge Task - Chinese-English

Challenge Task - Italian-English

Challenge Task - Japanese-English

Challenge Task - Arabic-English