Center for Language Engineering







  Testing Corpus for Machine Translation System  

The Center for Language Engineering (CLE) is pleased to release a testing corpus for English-to-Urdu machine translation systems. We strongly recommend that this corpus not be used to train machine translation systems, so that later evaluation remains unbiased. The corpus contains 400 English sentences collected from various newspapers, including Pakistani, English and American dailies, e.g. Nation, News, Pakistan Times, Dawn, BBC, CNN, NYT, Washington Post, Times, NewsWeek, National Geographic, Economist, etc. Each collected sentence was then translated into Urdu by three translators working independently.

This work was supported by the International Development Research Center (IDRC) of Canada, through the PAN Localization project (

  Download (access count tracked since 23 December 2010)  
  Testing Corpus Data License