Center for Language Engineering

 
 



 

 

KICS-UET


 
 


 
   
  Urdu Nastalique Optical Character Recognition System  
   
 
Project Details
Start date of project: 18 October 2010
Duration of project: 1 year
Funding agency: PAN Localization
Principal investigator: Dr. Sarmad Hussain
Project status: In progress
Objectives
  1. To develop and mature algorithms for analyzing and recognizing Urdu text images using segmentation-based and ligature-based methods.
  2. To develop automatic scaling algorithms for Urdu ligatures so that the system is independent of font size.
  3. To develop the Urdu OCR for the Nastalique style of writing.
  4. To develop post-processing algorithms in computational linguistics for output generation and error correction of Urdu OCR.
Scope of work
  1. The Urdu OCR will recognize Urdu text written in the Noori Nastalique writing style. Text written in any other writing style will not be processed.
  2. Text from books set in font sizes from 14 to 24 points will be recognized. Smaller or larger font sizes will not be processed.
  3. The application will process plain text only. It will not process advanced formatting such as italic, bold, or underline.
  4. The application will not process figures or multi-column text.
(Anticipated) Deliverables
  • Ligature-based recognizer at 14-point size
  • Segmentation-based recognizer at 14-point size
  • Font-size-independent system for 14 to 24 point sizes
  • Ligature-to-word mapping system
 
     
     
 
