     Research Journal of Applied Sciences, Engineering and Technology


Unsupervised Discretization: An Analysis of Classification Approaches for Clinical Datasets

1M. Shanmugapriya, 1H. Khanna Nehemiah, 1R.S. Bhuvaneswaran, 2Kannan Arputharaj and 1J. Jabez Christopher
1Ramanujan Computing Centre
2Department of Information Science and Technology, Anna University, Chennai-600025, India
Research Journal of Applied Sciences, Engineering and Technology  2017  2:67-72
http://dx.doi.org/10.19026/rjaset.14.3991  |  © The Author(s) 2017
Received: June 28, 2016  |  Accepted: August 9, 2016  |  Published: February 15, 2017

Abstract

Discretization is a frequently used data preprocessing technique for improving the performance of data mining tasks in knowledge discovery from clinical data. It transforms real-world quantitative data into qualitative data. This study presents an experimental analysis of how the performance of two simple unsupervised discretization methods varies across different classification approaches. Equal width discretization and equal frequency discretization were applied to four benchmark clinical datasets obtained from the University of California, Irvine, machine learning repository. Both methods were used to transform quantitative attributes into qualitative attributes with three, five, seven and ten intervals. Six classification approaches were evaluated using four evaluation measures. The results show that the performance of the classification algorithms varies: classification accuracy depends both on the discretization method used and on the number of discretization intervals. Moreover, it can be inferred that different classification approaches require different discretization methods. No single method can be deemed the best suited for all applications; hence the choice of an appropriate discretization method depends on data distribution, data interpretability, correlation, classification performance and domain of application.
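As an illustration of the two unsupervised methods named in the abstract, the following is a minimal sketch and not the authors' implementation. The function names, the tie-handling rule and the sample readings are assumptions made for this example only; the interval counts (three, five, seven and ten) are those stated in the abstract.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Return the k-1 interior cut points that split [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Return k-1 interior cut points so that each interval holds roughly len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def discretize(values, edges):
    """Map each value to an interval index in 0..len(edges).

    Assumption: a value equal to a cut point falls into the upper interval.
    """
    return [bisect_right(edges, v) for v in values]


def interval_counts(labels, k):
    """Count how many values landed in each of the k intervals."""
    counts = [0] * k
    for label in labels:
        counts[label] += 1
    return counts


# Hypothetical attribute values with one outlier (not taken from the study's datasets).
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]

for k in (3, 5, 7, 10):  # interval counts used in the study
    ew = interval_counts(discretize(readings, equal_width_edges(readings, k)), k)
    ef = interval_counts(discretize(readings, equal_frequency_edges(readings, k)), k)
    print(k, "equal width:", ew, "equal frequency:", ef)
```

With these made-up readings and three intervals, equal width places eleven of the twelve values in the first interval because the outlier stretches the range, whereas equal frequency places four values in each interval. This kind of difference is one reason the choice of method can interact with the data distribution. Note that with heavily tied values the equal-frequency cut points in this sketch may coincide, leaving some intervals empty.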

Keywords:

Classification, clinical knowledge-mining, equal frequency discretization, equal width discretization, qualitative data, quantitative data


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved