
     Research Journal of Applied Sciences, Engineering and Technology


A Review of Outlier Prediction Techniques in Data Mining

1S. Kannan and 2K. Somasundaram
1Department of Computer Science and Engineering, Karpagam University, Coimbatore, Tamilnadu 641021, India
2Department of CSE, Vel Tech High Tech Dr RR and Dr SR Engineering College, Avadi, Chennai-60062, India
Research Journal of Applied Sciences, Engineering and Technology   2015  9:1021-1028
http://dx.doi.org/10.19026/rjaset.10.1869  |  © The Author(s) 2015
Received: February 27, 2015  |  Accepted: March 25, 2015  |  Published: July 25, 2015

Abstract

The main objective of this review is to examine techniques for predicting outliers in data mining. Data mining is the process of applying various techniques to extract useful patterns or models from available data, and it plays a vital role in selecting, exploring and modeling high-dimensional data. Outlier detection is a substantial research problem in data mining that aims to uncover objects that are significantly different from, exceptional to, or inconsistent with the rest of the data. Potential sources of outliers include noise and errors, events, and malicious attacks on the network. The main challenge in outlier detection for datasets of high complexity, large size and varied type is how to catch similar outliers as a group using a clustering-based approach. The outliers or noise present in the clustered data are then removed, yielding cleaner high-dimensional data. Classification and clustering techniques for outlier prediction are now applied in various fields, such as bioinformatics, natural language processing, military applications and geographical domains. This study surveys various data classification and data clustering techniques in order to identify the techniques that provide the best outlier prediction. Moreover, a comparison of the various classification and clustering techniques for outlier prediction is presented.
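As an illustration of the clustering-based approach mentioned above, the following is a minimal sketch and not the method of any surveyed paper. It assumes plain k-means with caller-supplied initial centroids, and it flags a point as an outlier when its distance to its cluster centroid exceeds a multiple of the cluster's median distance. The function names, the `factor` parameter and the sample data are all hypothetical choices made for this example.

```python
import math
from statistics import median


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's k-means on 2-D points (assumed algorithm, fixed iteration count)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (ties go to the lower index).
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def find_outliers(points, initial_centroids, factor=3.0):
    """Flag points lying more than `factor` times the median distance
    from their own cluster centroid (hypothetical rule for illustration)."""
    centroids, clusters = kmeans(points, list(initial_centroids))
    outliers = []
    for centroid, cluster in zip(centroids, clusters):
        if len(cluster) < 2:
            continue  # too few points to judge what is typical
        dists = [math.dist(p, centroid) for p in cluster]
        threshold = factor * median(dists)
        outliers.extend(p for p, d in zip(cluster, dists) if d > threshold)
    return outliers


# Hypothetical sample data: two tight groups plus one stray point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (4.0, 4.0),
]
flagged = find_outliers(points, [(0.0, 0.0), (10.0, 10.0)])
```

A median-based threshold is used here because a stray point inflates the mean and standard deviation of a small cluster, which can hide the very point that should be flagged. In practice the threshold rule and the number of clusters would need tuning to the dataset.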

Keywords:

Data classification, data clustering, data mining, high dimensional data, outlier detection



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved