Semi Supervised Document Classification Model Using Artificial Neural Networks
International Journal of Computer Trends and Technology (IJCTT)
© 2016 by IJCTT Journal
Volume-34 Number-1
Year of Publication : 2016
Authors : Dr.M.Karthikeyan
DOI : 10.14445/22312803/IJCTT-V34P109
Dr.M.Karthikeyan "Semi Supervised Document Classification Model Using Artificial Neural Networks". International Journal of Computer Trends and Technology (IJCTT) V34(1):52-58, April 2016. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract -
Automatic document classification is of paramount importance to knowledge management in the information age. Document classification is a text data mining and organization technique that automatically groups related documents into clusters. Most common document classification techniques are based on the statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of a term within the document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. To address this problem, the proposed system concentrates on an interactive text clustering methodology: a semi-supervised document classification method using neural networks. The proposed method has two main phases: a pre-processing phase and a classification phase. In the pre-processing phase, distinct words are identified and their frequency of occurrence in the document corpus is calculated. These distinct words, together with their frequencies of occurrence, form a document vector. In the classification phase, the back propagation algorithm is used to classify documents from the feature vector of distinct words. The efficiency of the system is evaluated by implementing the method and testing the clustering results with the DBSCAN and K-means clustering algorithms. Experiments show that the proposed document clustering method achieves an average efficiency of 92% across various document categories.
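The pre-processing phase described in the abstract (identifying distinct words and counting their occurrences to form a document vector) can be illustrated with a minimal sketch. The function names `tokenize`, `build_vocabulary`, and `document_vector`, the regular-expression tokenizer, and the two-sentence corpus are all assumptions made for illustration; the abstract does not specify the paper's actual tokenization or filtering rules.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Assumption: a simple regex tokenizer stands in for the paper's
    tokenization step, which the abstract does not detail.
    """
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Return the sorted list of distinct words across the whole corpus."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def document_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical toy corpus, used only to show the shape of the output.
corpus = [
    "neural networks classify documents",
    "documents contain words and words repeat",
]
vocabulary = build_vocabulary(corpus)
vectors = [document_vector(doc, vocabulary) for doc in corpus]
```

Each entry of `vectors` is a fixed-length list aligned with `vocabulary`, which is the kind of feature vector a back propagation network could take as input.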
Keywords
Artificial Neural Network (ANN), Self Organizing Map (SOM), Back Propagation Networks (BPN), Term frequency, Tokenization, Structural filtering.