Dynamic Dispatch Cluster Ensemble Approach for Mixed Attributes Dataset
Waale Angela Gboraloo, Chidiebere Ugwu "Dynamic Dispatch Cluster Ensemble Approach for Mixed Attributes Dataset". International Journal of Computer Trends and Technology (IJCTT) V48(2):96-102, June 2017. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract
In recent times, data has been growing rapidly in almost all organizations, such as schools, hospitals and banks, and this data usually has mixed attributes of numerical or categorical data type. Several clustering systems built on various clustering algorithms have been proposed to discover useful patterns in such datasets. All of them adopt the same approach: the dataset is split into two fragmented files (numerical and categorical), which are stored on a storage device before the clustering algorithms are applied. This approach slows down clustering when the dataset is large. This paper presents a new dynamic dispatch cluster ensemble approach to clustering mixed attribute datasets. The approach is based on the ensemble technique, and the data type of each attribute is detected automatically at run time instead of the dataset being split into two subsets before clustering. The system uses the k-means and Squeezer algorithms to cluster the datasets. Object-oriented design and the Java programming language were used to develop and implement the system. The system was evaluated on a real-life dataset obtained from the UCI Machine Learning Repository, and the results obtained differed significantly from those of existing clustering systems. Processing time was shorter than that of the earlier systems because attribute types are handled implicitly at run time rather than through an explicit split of the dataset.
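The abstract describes the pipeline only at a high level. The Java sketch below shows one way to read it, under stated assumptions rather than as the authors' implementation. A column is treated as numeric when every value parses as a double. Each detected type is a subclass whose overridden `route` method (dynamic dispatch) sends the column to the numeric or the categorical group, so no split files are written. Numeric columns are clustered with k-means and categorical columns with Squeezer, using Squeezer's usual similarity (the fraction of cluster members sharing the record's value, summed over attributes). The ensemble step re-clusters each record's pair of labels with Squeezer. All names (`DynamicDispatchEnsemble`, `Attr`, `detect`, `route`, `cluster`), the parsing rule, the first-k-rows seeding, and the normalized threshold `s` are hypothetical choices.

```java
import java.util.*;

// Hypothetical sketch, not the authors' code. Assumes rows are in-memory String[] records and a column
// is numeric when every value parses as a double. Numeric columns go to k-means, categorical ones to
// Squeezer, and a second Squeezer pass over the two label vectors is the ensemble step.
public class DynamicDispatchEnsemble {
    // Base attribute type; the concrete subclass is chosen at run time from the data itself.
    static abstract class Attr {
        final int col;
        Attr(int col) { this.col = col; }
        abstract void route(List<Integer> numeric, List<Integer> categorical); // resolved per subclass
    }
    static final class NumericAttr extends Attr {
        NumericAttr(int col) { super(col); }
        @Override void route(List<Integer> num, List<Integer> cat) { num.add(col); }
    }
    static final class CategoricalAttr extends Attr {
        CategoricalAttr(int col) { super(col); }
        @Override void route(List<Integer> num, List<Integer> cat) { cat.add(col); }
    }

    // Run-time type detection: the dataset is never split into separate files on disk.
    static Attr detect(String[][] rows, int col) {
        for (String[] r : rows)
            try { Double.parseDouble(r[col]); } catch (NumberFormatException e) { return new CategoricalAttr(col); }
        return new NumericAttr(col);
    }

    // Lloyd-style k-means seeded with the first k rows; assumes rows.length >= k.
    static int[] kMeans(String[][] rows, List<Integer> cols, int k, int iters) {
        int n = rows.length, d = cols.size();
        double[][] x = new double[n][d], c = new double[k][];
        for (int i = 0; i < n; i++)
            for (int t = 0; t < d; t++) x[i][t] = Double.parseDouble(rows[i][cols.get(t)]);
        for (int j = 0; j < k; j++) c[j] = x[j].clone();
        int[] label = new int[n];
        for (int it = 0; it < iters; it++) {
            for (int i = 0; i < n; i++) {               // assignment: nearest centroid (squared distance)
                double best = Double.MAX_VALUE;
                for (int j = 0; j < k; j++) {
                    double dist = 0;
                    for (int t = 0; t < d; t++) dist += (x[i][t] - c[j][t]) * (x[i][t] - c[j][t]);
                    if (dist < best) { best = dist; label[i] = j; }
                }
            }
            double[][] sum = new double[k][d];
            int[] cnt = new int[k];
            for (int i = 0; i < n; i++) { cnt[label[i]]++; for (int t = 0; t < d; t++) sum[label[i]][t] += x[i][t]; }
            for (int j = 0; j < k; j++)                 // update: move centroids to cluster means
                if (cnt[j] > 0) for (int t = 0; t < d; t++) c[j][t] = sum[j][t] / cnt[j];
        }
        return label;
    }

    // One-pass Squeezer: join the most similar cluster if similarity >= s, else open a new one.
    // Dividing by the attribute count is our own choice so that s always lies in [0, 1].
    static int[] squeezer(String[][] rows, List<Integer> cols, double s) {
        List<List<Map<String, Integer>>> counts = new ArrayList<>(); // value counts per cluster and attribute
        List<Integer> sizes = new ArrayList<>();
        int[] label = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            int best = -1; double bestSim = -1;
            for (int c = 0; c < counts.size(); c++) {
                double sim = 0;
                for (int a = 0; a < cols.size(); a++)
                    sim += counts.get(c).get(a).getOrDefault(rows[i][cols.get(a)], 0) / sizes.get(c).doubleValue();
                sim /= cols.size();
                if (sim > bestSim) { bestSim = sim; best = c; }
            }
            if (best < 0 || bestSim < s) {
                best = counts.size();
                List<Map<String, Integer>> summary = new ArrayList<>();
                for (int a = 0; a < cols.size(); a++) summary.add(new HashMap<>());
                counts.add(summary); sizes.add(0);
            }
            for (int a = 0; a < cols.size(); a++) counts.get(best).get(a).merge(rows[i][cols.get(a)], 1, Integer::sum);
            sizes.set(best, sizes.get(best) + 1);
            label[i] = best;
        }
        return label;
    }
    static int[] cluster(String[][] rows, int k, double s) {
        List<Integer> numeric = new ArrayList<>(), categorical = new ArrayList<>();
        for (int c = 0; c < rows[0].length; c++) detect(rows, c).route(numeric, categorical); // dynamic dispatch
        int[] a = numeric.isEmpty() ? new int[rows.length] : kMeans(rows, numeric, k, 10);
        int[] b = categorical.isEmpty() ? new int[rows.length] : squeezer(rows, categorical, s);
        String[][] pairs = new String[rows.length][];   // each row becomes (k-means label, Squeezer label)
        for (int i = 0; i < rows.length; i++) pairs[i] = new String[] { "n" + a[i], "c" + b[i] };
        return squeezer(pairs, List.of(0, 1), s);      // ensemble step: cluster the label pairs
    }

    public static void main(String[] args) {
        String[][] data = { {"25", "50000", "student"}, {"27", "52000", "student"},
                            {"60", "90000", "retired"}, {"62", "88000", "retired"} };
        System.out.println(Arrays.toString(cluster(data, 2, 0.5))); // [0, 0, 1, 1]
    }
}
```

On the four toy records in `main`, the first two columns are detected as numeric and the third as categorical. Both base clusterings separate the first two records from the last two, so the ensemble pass prints `[0, 0, 1, 1]`. In this sketch, routing columns through overridden methods while they stay in memory is what replaces the step of writing two fragmented files before clustering.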
Keywords
Mixed Attributes Dataset, Clustering, Data Mining, Dynamic Dispatch and Cluster Ensemble.