Detection and Normalisation of the Temporal Expression in Hindi Text
International Journal of Computer Trends and Technology (IJCTT)
© 2017 by IJCTT Journal
Volume 46, Number 2
Year of Publication: 2017
Authors: Charvee
DOI: 10.14445/22312803/IJCTT-V46P115
Charvee, "Detection and Normalisation of the Temporal Expression in Hindi Text", International Journal of Computer Trends and Technology (IJCTT) V46(2):73-79, April 2017. ISSN: 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract
Temporal expressions are expressions that convey temporal information, that is, information related to time. They can denote a point in time, such as “tomorrow 12 p.m.”, or a period of time, such as “for the first 7 months”. Recognising temporal expressions in a piece of text means both finding them and interpreting them, so the task consists of two subtasks: detection of the temporal expressions and their normalisation (interpretation). Normalisation makes the expressions usable by computer algorithms. For Hindi, detection has been achieved to some extent, but research on interpreting the detected temporal expressions is still in progress. The proposed work performs both detection and normalisation of temporal expressions in Hindi text with approximately 78% accuracy. Both stages rely extensively on a rule-based approach to detect and interpret the temporal entities in text taken from newspaper articles.
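The abstract does not list the rules themselves, so the following is only a minimal, hypothetical sketch of what rule-based detection plus normalisation can look like for Hindi. Each regular expression detects a surface pattern, and the matching rule converts it into a TIMEX3-style value: an ISO 8601 date such as 2017-04-16 for a point in time, or an ISO 8601 duration such as P7M for a period. The class name `HindiTimexSketch`, the three patterns, the assumed document date and the tense heuristic used to resolve the ambiguous word कल ("yesterday" or "tomorrow") are all illustrative assumptions, not the author's rules. The sketch uses only the standard `java.util.regex` and `java.time` APIs and needs Java 16 or later for the `record`.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule-based detector/normaliser for a few Hindi temporal expressions. */
public class HindiTimexSketch {

    /** One detected expression: surface text, TIMEX3-style type and normalised value. */
    public record Timex(String text, String type, String value) {}

    // Rule A (assumed): the deictic day words आज ("today") and कल ("yesterday"/"tomorrow"),
    // matched only as whole words (no adjacent letter or vowel sign), so कलम and आजकल are skipped.
    private static final Pattern DAY_WORD =
            Pattern.compile("(?<![\\p{L}\\p{M}])(आज|कल)(?![\\p{L}\\p{M}])");

    // Rule B (assumed): an ASCII number followed by महीने/महीनों ("months") is a duration.
    private static final Pattern MONTHS = Pattern.compile("(\\d+)\\s*(महीनों|महीने)");

    // Crude tense cue (assumed): the sentence ends in a future-tense suffix -गा/-गी/-गे.
    private static final Pattern FUTURE = Pattern.compile("(गा|गी|गे)[\\s।]*$");

    /** Detects expressions in one sentence and normalises them against the document date. */
    public static List<Timex> extract(String sentence, LocalDate docDate) {
        List<Timex> found = new ArrayList<>();

        Matcher day = DAY_WORD.matcher(sentence);
        while (day.find()) {
            LocalDate date;
            if (day.group(1).equals("आज")) {
                date = docDate;                 // today = the document date
            } else if (FUTURE.matcher(sentence).find()) {
                date = docDate.plusDays(1);     // कल with a future verb: tomorrow
            } else {
                date = docDate.minusDays(1);    // otherwise assume yesterday
            }
            found.add(new Timex(day.group(1), "DATE", date.toString()));
        }

        Matcher months = MONTHS.matcher(sentence);
        while (months.find()) {
            // ISO 8601 duration, e.g. "P7M" for seven months
            found.add(new Timex(months.group(), "DURATION", "P" + months.group(1) + "M"));
        }
        return found;
    }

    public static void main(String[] args) {
        LocalDate docDate = LocalDate.of(2017, 4, 15); // assumed publication date of the article
        String[] sentences = {"वह कल दिल्ली आएगा।", "पहले 7 महीनों में बिक्री बढ़ी।"};
        for (String s : sentences) {
            System.out.println(s + " -> " + extract(s, docDate));
        }
    }
}
```

With the assumed document date of 15 April 2017, कल in "वह कल दिल्ली आएगा।" ("he will come to Delhi tomorrow") is normalised to the DATE 2017-04-16, and the span "7 महीनों" is normalised to the DURATION P7M. Separating the detecting pattern from the interpreting rule is what lets a normaliser of this kind resolve context-dependent words like कल. The file should be compiled as UTF-8 (for example `javac -encoding UTF-8`) because the patterns contain Devanagari literals.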
Keywords
Temporal Expressions, Java-XML Binding, Natural Language Processing.