A Literature Review: Stemming Algorithms for Indian Languages

M.Thangarasu; Dr.R.Manavalan

doi:10.14445/22312803/IJCTT-V4I8P134

Volume 4 | Issue 8 | Year 2013 | Article Id. IJCTT-V4I8P134 | DOI : https://doi.org/10.14445/22312803/IJCTT-V4I8P134

Citation:

M.Thangarasu, Dr.R.Manavalan, "A Literature Review: Stemming Algorithms for Indian Languages," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 8, pp. 2582-2584, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I8P134

Abstract

Stemming is the process of extracting the root word from a given inflected word. It plays a significant role in numerous applications of Natural Language Processing (NLP). The stemming problem has been addressed in many contexts and by researchers in many disciplines. This expository paper surveys some of the latest developments in stemming algorithms in data mining and presents some of the solutions proposed for stemming in various Indian languages, along with their results.

Keywords

Tamil morphology, Tamil stemmer, Light stemmer, Improved stemmer, Natural Language Processing.
