State of the art in Nastaleeq Script Recognition
Harmohan Sharma, Dharam Veer Sharma "State of the art in Nastaleeq Script Recognition". International Journal of Computer Trends and Technology (IJCTT) V39(1):40-46, September 2016. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract -
OCR of the Nastaleeq script has gained considerable importance in the recent past, owing to the need to preserve historic manuscripts and make them searchable, besides other applications of OCR. Nastaleeq, being a complex script, has so far remained largely untouched by automation, and the little work that has been done has proved insufficient to meet these needs. Developing OCR for Urdu script based languages is more complex than for scripts such as Latin and Chinese because of the complexities of Urdu script: the cursive nature of Urdu writing, context-sensitive character shapes, overlapping between ligatures, use of joiners, formation of ligatures within words, and spacing between ligatures. This paper examines the Urdu language, the characteristics of the Nastaleeq script, and the complexities involved in developing an Urdu OCR.
Keywords
Optical Character Recognition, Nastaleeq, Ligature recognition.