Center for Language Engineering

 
 



 

 


 
   
  Urdu WordNet 1.0 Wordlist  
     
  Release Notes  
 

This wordlist comprises 5000 content words. Approximately 3000 of them were extracted from the following three sources, and a further 2000 were added during the development of Urdu WordNet, based on the initial 3000 words.

1. An 18-million-word corpus crawled from online news websites covering a wide range of domains, including sports, news, finance and culture. (For details see: http://www.cle.org.pk/Publication/papers/2007/corpus_based_urdu_lexicon_development.pdf)

2. CLE12T001 CLE Urdu Digest Corpus 100K, covering domains such as education, health, politics, international affairs, sports, business, humor and literature. (For details see: http://www.cle.org.pk/clestore/urdudigestcorpus100k.htm)

3. Urdu Verb List extracted from Urdu Lughat. (For details see: http://www.cle.org.pk/software/ling_resources/urduverblist.htm)

Words were selected for this wordlist according to the following criteria:

1. Lexemes are included; their inflectional forms are not.
2. Closed-form compound words are included.
3. Where a word has multiple correct spellings, all of them are included.
4. Foreign words are included if they are listed in the Urdu Lughat (available at http://182.180.102.251:8081/oud/default.aspx) or if they occur at least 20 times in the CLE Urdu Digest one-million-word corpus (available at http://www.cle.org.pk/clestore/urdudigestcorpus1M.htm).
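
The foreign-word rule in item 4 can be illustrated with a minimal Python sketch. It is not the procedure the developers used. The function name, the parameter names and the romanized toy data are all hypothetical, and it assumes the Lughat headwords and the corpus tokens are already available in memory as a set and a list.

```python
from collections import Counter

# Threshold stated in item 4: at least 20 occurrences in the corpus.
MIN_CORPUS_FREQUENCY = 20


def select_foreign_words(candidates, lughat_headwords, corpus_tokens,
                         min_frequency=MIN_CORPUS_FREQUENCY):
    """Keep a candidate if it is a Lughat headword OR frequent enough.

    candidates       -- iterable of candidate foreign words (hypothetical input)
    lughat_headwords -- set of headwords assumed to come from the Urdu Lughat
    corpus_tokens    -- list of tokens assumed to come from the corpus
    """
    frequencies = Counter(corpus_tokens)  # missing words count as 0
    return [
        word for word in candidates
        if word in lughat_headwords or frequencies[word] >= min_frequency
    ]


# Toy data: romanized placeholders, not taken from the actual resources.
lughat_headwords = {"daktar"}
corpus_tokens = ["kampyutar"] * 20 + ["intarnet"] * 19
candidates = ["daktar", "kampyutar", "intarnet"]

selected = select_foreign_words(candidates, lughat_headwords, corpus_tokens)
print(selected)
```

In this toy run "daktar" is kept because it is a dictionary headword and "kampyutar" because it reaches the threshold of 20, while "intarnet" is dropped because it occurs only 19 times and is not a headword.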

This wordlist forms the basis of Urdu WordNet 1.0, developed by the Center for Language Engineering, KICS, UET, Lahore. The work was carried out under the project grant for Essential Urdu Linguistic Resources (www.cle.org.pk/eulr) in collaboration with the University of Konstanz (http://www.uni-konstanz.de/), Germany, and was funded by the German Academic Exchange Service, DAAD (https://www.daad.org/), Germany.

 
     
  Download (access counter running since 27 May 2013)  
 

Urdu WordNet 1.0 Wordlist

   
     
 

webmaster@cle.org.pk