Center for Language Engineering







  Urdu Parts of Speech Tagset  
  Release Notes  

Part-of-speech (POS) tagging is a fundamental component of most natural language processing systems. Developing a tagset for a language is the first step toward this task. The current Urdu tagset consists of syntactic categories and improves upon earlier versions (Muaz et al., 2009; Sajjad, 2007; Sajjad and Schmid, 2009). This POS tagset has been used to build the CLE Urdu Digest POS Tagged Corpus 100K, developed by the Center for Language Engineering, KICS, UET, Lahore.

This work was developed under the project grant for Essential Urdu Linguistic Resources, in collaboration with the University of Konstanz, Germany, and funded by the German Academic Exchange Service (DAAD), Germany.

  Download  

Urdu Parts of Speech Tagset


Urdu Parts of Speech Tagset (Old)