     Research Journal of Applied Sciences, Engineering and Technology


Privacy Preserving Probabilistic Possibilistic Fuzzy C Means Clustering

1V.S. Thiyagarajan and 2Venkatachalapathy
1Annamalai University, Chidhambaram, India
2Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Chidhambaram, India
Research Journal of Applied Sciences, Engineering and Technology  2015  1:27-39
http://dx.doi.org/10.19026/rjaset.11.1672  |  © The Author(s) 2015
Received: December 14, 2014  |  Accepted: April 13, 2015  |  Published: September 05, 2015

Abstract

As data volumes grow uncontrollably, clustering plays a major role in partitioning data into small sets so that the relevant processing can be done within each set. Privacy and security become especially important when the data is large and is distributed to other parties for various purposes, so privacy preservation should be applied before the data is distributed. In this study, the proposed algorithm meets both requirements: clustering accuracy and privacy preservation of the data. First, the whole dataset is divided into small segments. Next, the best combinations of attributes are found through an attribute weighting process, which achieves privacy preservation through vertical partitioning. The proposed Probabilistic Possibilistic Fuzzy C-Means (PPFCM) clustering algorithm is then applied to each segment, producing a number of clusters per segment. PPFCM is then applied again to the centroids of those clusters. Finally, the data tuples corresponding to the grouped centroids are joined to obtain the final clustering result. The implementation is done in Java, and the performance of the proposed PPFCM algorithm is compared with possibilistic FCM and a probabilistic clustering algorithm on benchmark datasets.
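The segment-then-merge flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the PPFCM update rules, so standard fuzzy c-means is swapped in as a stand-in, and the attribute weighting and vertical partitioning steps are omitted because their details are not given here. Python is used for brevity, although the paper reports a Java implementation. All function names, the farthest-first initialisation, and the parameters (`segment_size`, `local_c`, `final_c`, `m`, `iters`) are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centroids, m):
    """Fuzzy c-means membership matrix: one row per point, each row sums to 1."""
    rows = []
    for p in points:
        # Clamp the distance to avoid dividing by zero when a point sits on a centroid.
        inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centroids]
        total = sum(inv)
        rows.append([x / total for x in inv])
    return rows


def fuzzy_c_means(points, c, m=2.0, iters=30):
    """Standard fuzzy c-means, used here only as a stand-in for PPFCM."""
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    centroids = [list(points[0])]
    while len(centroids) < c:
        far = max(points, key=lambda p: min(sq_dist(p, v) for v in centroids))
        centroids.append(list(far))
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centroids, m)
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centroids[j] = [
                sum(wi * p[k] for wi, p in zip(w, points)) / total
                for k in range(dim)
            ]
    # Hard labels: each point goes to the cluster with the highest membership.
    u = memberships(points, centroids, m)
    labels = [max(range(c), key=lambda j: row[j]) for row in u]
    return centroids, labels


def two_stage_cluster(points, segment_size, local_c, final_c):
    """Cluster each segment, cluster the resulting centroids, then join tuples."""
    all_centroids = []
    owner = []  # for each input point, the index of its local centroid
    for start in range(0, len(points), segment_size):
        segment = points[start:start + segment_size]
        cents, labels = fuzzy_c_means(segment, local_c)
        base = len(all_centroids)
        all_centroids.extend(cents)
        owner.extend(base + label for label in labels)
    # Second stage: group the local centroids into the final clusters.
    _, centroid_labels = fuzzy_c_means(all_centroids, final_c)
    # Each tuple inherits the final cluster of the centroid it belonged to.
    return [centroid_labels[o] for o in owner]
```

A call such as `two_stage_cluster(data, segment_size=4, local_c=2, final_c=2)` returns one final cluster index per input tuple, in input order. Only the local centroids reach the second stage, which reflects the two-level structure the abstract describes.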

Keywords:

Clustering, possibilistic fuzzy C means clustering, privacy preserving, probabilistic clustering



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved