
A major theme of this year's edition of IWSLT is the sharing of linguistic resources and tools among participants, to make the evaluation more collaborative and fair. To this end, we ask each participant to send us information about the non-proprietary resources used in developing this year's submission so that other groups can also use these resources for the various tasks. The submission deadline (see the schedule page) is intended to give other participants time to use the resources if they wish. Additional resources not included in the data sets provided by IWSLT partners will be placed on this page.

Note, however, that participants do not have to provide resources directly. Nor are participants required to provide resources that they have acquired elsewhere and then modified in some way (e.g., cleaned, corrected, or enhanced). In the latter case, a group would instead provide a reference and/or link to the original provider or creator.

Acceptable Resources. Some examples of resources that can be used include:
  • Publicly available aligned or monolingual corpora such as EuroParl or LDC data (see below). Some of these resources may carry licensing fees, but the fees should be "reasonable" and affordable for most research groups.
  • Publicly available annotated treebanks.

Some examples of resources that can NOT be used include:
  • Privately developed linguistic resources and/or corpora
  • NIST or LDC data which require participation in an evaluation campaign. Examples include data available for GALE or TREC (i.e., resources with LDC catalog codes such as "LDCyyyyExx" or "LDCyyyyGxx").
  • Publicly available linguistic resources which require high licensing fees.

If you have resources that you would like to share with other participants, or questions concerning resources, please send them to Cam Fordyce at info AT celct DOT it and put "[IWSLT07 Resources]" in the subject line.

The following list includes resources provided by the organizers and participants. A few of the links point to files. The web links have not been verified. If you find broken links, please let us know and we will do our best to resolve the problem. The list below was updated on 26 June, 2007.
[Linguistic Resources] [Software Resources]

Linguistic Resources:


Software Resources: