Center for Language Engineering








CLE makes these linguistic resources available without cost to support academic, non-commercial research. The processing fees charged are used to maintain these resources. Please contact CLE directly about discounts (available only to selected public organizations in Pakistan) or about commercial licensing options.

  CLE Urdu Word Segmentation System
CLE Catalog #: CLE14A001
Release Date: 12 November 2014
Language(s): Urdu
Application Type: API
Platform: Java
Distribution: Web Download
Processing Fee (Pakistan): 15000 PKR
Processing Fee (International): 250 USD
License: Yes
  When Urdu text is typed, words are not consistently separated by spaces, so typed Urdu text may contain both space insertion and space deletion errors. The CLE Urdu Word Segmentation System takes typed Urdu text as input and generates a space-separated sequence of words with 97.9% accuracy. The system is statistically trained on a 37-million-word corpus. For more details, see:
  1. Word Segmentation for Urdu OCR
  2. Urdu Word Segmentation
  3. CLE Urdu Books N-grams
  The minimum hardware requirements for this application are a Pentium-compatible 2.8 GHz CPU and 8 GB of RAM. The application requires Windows XP, Windows Vista, or Windows 7 with Java Runtime Environment 7.0.
  The CLE Urdu Word Segmentation System package contains:
  1. CLE Urdu Word Segmentation API
  2. CLE Urdu Word Segmentation API - Release Notes
  Input:

   د نیا کا ہر فر د کا میا بی کا آ ر ز و مند ہے۔ نا کا می سے سب گھبر ا تے ہیں۔ عز ت، دو لت، ر ا حت ا و ر عا فیت کی ز ندگی کے سبھی شید ا ئی ہیں۔

  Output:

   دنیا کا ہر فرد کا میا بی کا آرزو مند ہے۔ نا کامی سے سب گھبرا تے ہیں۔ عزت، دو لت، ر احت اور عافیت کی زندگی کے سبھی شیدائی ہیں۔
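
  The sample above shows typed text with misplaced spaces and its re-segmented form. To illustrate the general idea of statistical word segmentation, here is a minimal Java sketch. It is not the CLE API and does not reproduce the CLE model: the class name `ToySegmenter`, the method `segment`, and the four-word lexicon with its counts are all hypothetical, chosen only for illustration. The sketch discards the typed spaces and then uses dynamic programming to pick the split with the highest total unigram log-probability.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: NOT the CLE API and NOT the CLE model.
class ToySegmenter {
    // Hypothetical unigram counts; a real system would estimate them from a large corpus.
    private static final Map<String, Integer> COUNTS = new HashMap<>();
    private static final int TOTAL;
    private static final int MAX_WORD_LEN;

    static {
        COUNTS.put("دنیا", 40);
        COUNTS.put("کا", 300);
        COUNTS.put("ہر", 80);
        COUNTS.put("فرد", 20);
        int total = 0;
        int maxLen = 1;
        for (Map.Entry<String, Integer> e : COUNTS.entrySet()) {
            total += e.getValue();
            maxLen = Math.max(maxLen, e.getKey().length());
        }
        TOTAL = total;
        MAX_WORD_LEN = maxLen;
    }

    // Log-probability of a candidate word. Unknown single characters get a
    // heavy fixed penalty so that every input can still be segmented.
    private static double score(String word) {
        Integer count = COUNTS.get(word);
        if (count != null) {
            return Math.log((double) count / TOTAL);
        }
        return word.length() == 1 ? -20.0 : Double.NEGATIVE_INFINITY;
    }

    // Removes the typed whitespace, then finds the best-scoring split.
    static String segment(String typed) {
        String text = typed.replaceAll("\\s+", "");
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: best score for text[0..i)
        int[] back = new int[n + 1];       // back[i]: start index of the last word
        for (int i = 1; i <= n; i++) {
            best[i] = Double.NEGATIVE_INFINITY;
            for (int j = Math.max(0, i - MAX_WORD_LEN); j < i; j++) {
                double s = best[j] + score(text.substring(j, i));
                if (s > best[i]) {
                    best[i] = s;
                    back[i] = j;
                }
            }
        }
        // Walk the back-pointers from the end to recover the words.
        List<String> words = new ArrayList<>();
        for (int i = n; i > 0; i = back[i]) {
            words.add(text.substring(back[i], i));
        }
        Collections.reverse(words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // First few words of the sample input above, with their stray spaces.
        System.out.println(segment("د نیا کا ہر فر د"));
    }
}
```

  With this toy lexicon, `segment` rejoins the fragments of the first four sample words into دنیا کا ہر فرد. Throwing away every typed space is a simplification made to keep the sketch short. A practical system would likely treat the existing spaces as evidence and use a much richer language model.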

  Online Urdu Word Segmentation System