
     Research Journal of Applied Sciences, Engineering and Technology


Performance Comparison of Clustering Techniques

Sambourou Massinanke and Lu Zhimao
College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
Research Journal of Applied Sciences, Engineering and Technology  2014  5:963-969
http://dx.doi.org/10.19026/rjaset.7.342  |  © The Author(s) 2014
Received: January 29, 2013  |  Accepted: March 14, 2013  |  Published: February 05, 2014

Abstract

Data mining consists of extracting, or "mining", information from large quantities of data. Clustering is one of the most significant research areas in data mining. Clustering means grouping objects according to their features, so that objects in the same group are similar and objects in different groups are dissimilar. This study reviews two representative clustering algorithms: k-modes and k-medoids. The two algorithms are tested and evaluated on partitioning Y-STR data. They are compared according to the following factors: a given number of runs, precision and recall. The overall results show that k-modes clustering performs better than k-medoids in clustering Y-STR data.
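
The difference between the two algorithms named above lies mainly in how a cluster centre is updated. The following minimal Python sketch illustrates that difference; it is not the authors' implementation. It assumes simple matching (Hamming) dissimilarity, a plain alternating assign-and-update loop for both methods (rather than, for example, a swap-based k-medoids search), and fixed initial centres. The function names and the toy marker profiles are hypothetical and invented for illustration only.

```python
from collections import Counter

def hamming(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(cluster):
    """k-modes centre: most frequent value per attribute (need not be a real object)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))

def medoid_of(cluster):
    """k-medoids centre: the real object with the smallest total distance to the rest."""
    return min(cluster, key=lambda c: sum(hamming(c, o) for o in cluster))

def partition(data, centres, update, max_iter=100):
    """Alternate assignment and centre update until the centres stop changing."""
    centres = list(centres)
    labels = []
    for _ in range(max_iter):
        groups = [[] for _ in centres]
        labels = []
        for obj in data:
            # Assign each object to its nearest centre.
            j = min(range(len(centres)), key=lambda i: hamming(obj, centres[i]))
            groups[j].append(obj)
            labels.append(j)
        # Recompute centres; an empty cluster keeps its previous centre.
        new = [update(g) if g else c for g, c in zip(groups, centres)]
        if new == centres:
            break
        centres = new
    return labels, centres

# Invented toy profiles: repeat counts at three markers, treated as categories.
data = [
    (14, 12, 29), (14, 12, 30), (14, 13, 29),
    (16, 10, 25), (16, 10, 24), (15, 10, 25),
]
start = [data[0], data[3]]  # assumed fixed initial centres

modes_labels, modes = partition(data, start, mode_of)
medoid_labels, medoids = partition(data, start, medoid_of)
print(modes_labels, modes)
print(medoid_labels, medoids)
```

A mode is built attribute by attribute and may not coincide with any observed object, whereas a medoid is always one of the objects in the cluster. On this small example both updates recover the same two groups.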

Keywords:

Data clustering, k-medoids clustering, k-modes clustering, Y-STR data



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved