
     Research Journal of Applied Sciences, Engineering and Technology


A Robust Scalable Model Using Hybrid Approach for the Detection of the Projected Outliers

1H. Sadawarti, 2G.S. Kalra and 3Kamal Malik
1RIMTIET, Punjab Technical University
2Lovely Professional University, Punjab
3MMICT and BM, MMU, Mullana, Haryana, India
Research Journal of Applied Sciences, Engineering and Technology  2016  6:642-649
http://dx.doi.org/10.19026/rjaset.12.2712  |  © The Author(s) 2016
Received: July 7, 2015  |  Accepted: August 22, 2015  |  Published: March 15, 2016

Abstract

Abnormal and anomalous observations, even in a technologically advanced era, can severely disrupt the industries they affect. To reduce and eliminate outliers from massive data streams, they must first be identified accurately in high-dimensional data, which is itself very challenging. In this study, a scalable outlier detection model is proposed that is robust enough to detect projected outliers, that is, outliers lying in lower-dimensional subspaces. The model addresses the curse of dimensionality, which arises frequently in large data streams and massive datasets. Rapid distance-based and density-based approaches are applied first, and the probability density is then measured with a Gaussian Mixture Model. Bayes' probability is applied to the final observations to confirm them as projected outliers.
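The last two stages described above (mixture-model density, then a Bayesian confirmation) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the mixture parameters for each one-dimensional subspace are already known (for example from an EM fit), that outliers follow a uniform background density, and that a fixed prior and threshold are acceptable. All names (`flagged_subspaces`, `OUTLIER_PRIOR`, `MODELS`, and so on) and all numeric values are hypothetical, and the distance-based and density-based pre-filtering is omitted.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_density(x, components):
    """Density of a 1-D Gaussian mixture; components are (weight, mean, std)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

def outlier_posterior(x, components, outlier_prior, outlier_density):
    """Bayes' rule: P(outlier | x), with the mixture as the inlier model."""
    p_in = (1.0 - outlier_prior) * gmm_density(x, components)
    p_out = outlier_prior * outlier_density
    return p_out / (p_in + p_out)

def flagged_subspaces(point, models, outlier_prior, outlier_density, threshold):
    """Return the indices of the 1-D subspaces in which the point looks like an outlier."""
    return [
        dim
        for dim, (value, components) in enumerate(zip(point, models))
        if outlier_posterior(value, components, outlier_prior, outlier_density) > threshold
    ]

# Hypothetical inlier models, one mixture per attribute (assumed, not fitted here).
MODELS = [
    [(1.0, 0.0, 1.0)],                     # attribute 0: single Gaussian
    [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)],   # attribute 1: two-component mixture
]
OUTLIER_PRIOR = 0.01      # assumed prior probability of an outlier
OUTLIER_DENSITY = 0.05    # assumed uniform density over the range [-10, 10]
THRESHOLD = 0.5           # assumed decision threshold on the posterior

suspect = flagged_subspaces((0.1, 9.0), MODELS, OUTLIER_PRIOR, OUTLIER_DENSITY, THRESHOLD)
normal = flagged_subspaces((0.1, 2.0), MODELS, OUTLIER_PRIOR, OUTLIER_DENSITY, THRESHOLD)
print(suspect, normal)
```

In this sketch the point (0.1, 9.0) looks ordinary in attribute 0 but falls far from both mixture components in attribute 1, so it is flagged only in that subspace, which is the sense in which an outlier is "projected". A full model would also search higher-order subspaces and fit the mixtures from data.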

Keywords:

GMM, KDD, projected outliers



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Copyright

© The Author(s) 2016.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved