Center for Language Engineering







  Urdu WordNet 1.0 Wordlist  
  Release Notes  

This wordlist comprises 5,000 content words. Approximately 3,000 of them were extracted from the three sources listed below. A further 2,000 words were added during the development of Urdu WordNet, based on the initial 3,000 words.

1. An 18-million-word corpus crawled from online news websites, covering a wide range of domains including sports, news, finance, culture, etc. (For details see:

2. CLE12T001 CLE Urdu Digest Corpus 100K, covering domains such as education, health, politics, international affairs, sports, business, humor and literature. (For details see:

3. Urdu Verb List extracted from Urdu Lughat. (For details see:

Words were selected for this wordlist according to the following criteria:

1. Lexemes are included; their inflectional forms are not.
2. Closed-form compound words are included.
3. Where a word has multiple correct spellings, all of them are included.
4. Foreign words are included if they are listed in the Urdu Lughat (available at or if they occur at least 20 times in the CLE Urdu Digest one-million-word corpus (available at
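Criterion 4 amounts to a simple two-way filter: a foreign word is kept if the dictionary lists it *or* if it reaches the corpus frequency threshold. The sketch below is a hypothetical illustration, not CLE's actual selection code. It assumes that candidate foreign words have already been identified (the notes do not say how), that Urdu Lughat headwords are available as a set of strings, and that the corpus is already tokenized. The function names and the romanized toy words are invented for illustration.

```python
from collections import Counter

# Threshold from criterion 4: at least 20 occurrences in the
# CLE Urdu Digest one-million-word corpus.
MIN_CORPUS_FREQ = 20


def count_tokens(corpus_tokens):
    """Return a Counter mapping each token to its frequency in the corpus."""
    return Counter(corpus_tokens)


def keep_foreign_word(word, lughat_entries, corpus_counts, min_freq=MIN_CORPUS_FREQ):
    """Hypothetical check: keep a foreign word if Urdu Lughat lists it
    OR it occurs at least `min_freq` times in the corpus.

    Counter returns 0 for unseen tokens, so words absent from the corpus
    simply fail the frequency test instead of raising KeyError.
    """
    return word in lughat_entries or corpus_counts[word] >= min_freq


def select_foreign_words(candidates, lughat_entries, corpus_tokens):
    """Filter (already identified) foreign-word candidates with criterion 4."""
    counts = count_tokens(corpus_tokens)
    return [w for w in candidates if keep_foreign_word(w, lughat_entries, counts)]


if __name__ == "__main__":
    # Toy data with romanized placeholder words, invented for illustration only.
    lughat = {"iskool"}                          # listed in the dictionary
    tokens = ["kampyutar"] * 25 + ["web"] * 3    # tiny stand-in corpus
    candidates = ["iskool", "kampyutar", "web"]
    # "iskool" passes via the dictionary, "kampyutar" via frequency (25 >= 20),
    # "web" fails both tests (not listed, and 3 < 20).
    print(select_foreign_words(candidates, lughat, tokens))
```

Because the two conditions are joined by "or", a dictionary-listed word is kept even if it never appears in the corpus.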

This wordlist forms the basis of Urdu WordNet 1.0, developed by the Center for Language Engineering, KICS, UET, Lahore. The work was carried out under the project grant for Essential Urdu Linguistic Resources ( in collaboration with the University of Konstanz (, Germany, and was funded by the German Academic Exchange Service, DAAD (, Germany.

  Download (This file has been accessed: times, since 27 May 2013)  
