Urdu Parts of Speech Tagset
Part-of-speech (POS) tagging is a fundamental component of most natural language processing systems, and developing a tagset for a language is the first step toward this task. The current Urdu tagset consists of syntactic categories and improves on earlier versions (Muaz et al., 2009; Sajjad, 2007; Sajjad and Schmid, 2009). This POS tagset has been used to develop the CLE Urdu Digest POS Tagged Corpus 100K (available at: ), developed by the Center for Language Engineering, KICS, UET, Lahore.

This work was developed under the project grant for Essential Urdu Linguistic Resources, in collaboration with the University of Konstanz, Germany, and was funded by the German Academic Exchange Service (DAAD), Germany.

