
     Research Journal of Applied Sciences, Engineering and Technology


Designing A Method for Alcohol Consumption Prediction Based on Clustering and Support Vector Machines

Mendoza-Palechor Fabio, De la Hoz-Manotas Alexis, Morales-Ortega Roberto, Martinez-Palacio Ubaldo, Diaz-Martinez Jorge and Combita-Nino Harold
Department of Computer Science and Electronics, Universidad de la Costa, Barranquilla, Colombia
Research Journal of Applied Sciences, Engineering and Technology  2017  4:146-154
http://dx.doi.org/10.19026/rjaset.14.4158  |  © The Author(s) 2017
Received: November 21, 2016  |  Accepted: February 14, 2017  |  Published: April 15, 2017

Abstract

This study presents an implementation of several data mining techniques, including decision trees, Support Vector Machines (SVM), Bayesian networks and K-Nearest Neighbors (KNN), and compares them using evaluation metrics such as True Positive Rate (TpRate), False Positive Rate (FpRate) and Recall. The comparison uses the “STUDENT ALCOHOL CONSUMPTION” dataset, which provides information on alcohol consumption among teenagers in Portugal. High alcohol consumption among teenagers, both high school and college students, has become a social problem: alarming data show that they start consuming alcohol between the ages of 10 and 14, which has a strong impact on their behavior, especially in situations such as binge drinking. The results show that SVM achieves a better accuracy rate than the other techniques used and corroborate that the proposed method is efficient and precise for detecting students who consume alcohol, improving on the results obtained in previous similar studies.
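
As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes the True Positive Rate and False Positive Rate for a binary classifier from its predictions. The function name `binary_rates` and the label vectors are hypothetical and are not taken from the study; the sketch only assumes that one class (here, high alcohol consumption) is treated as positive. Note that Recall is the same quantity as the True Positive Rate.

```python
from collections import Counter


def binary_rates(y_true, y_pred, positive=1):
    """Return TP rate, FP rate and recall, treating `positive` as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            # Actual positives are either correctly found (tp) or missed (fn).
            counts["tp" if pred == positive else "fn"] += 1
        else:
            # Actual negatives are either wrongly flagged (fp) or correctly rejected (tn).
            counts["fp" if pred == positive else "tn"] += 1

    positives = counts["tp"] + counts["fn"]
    negatives = counts["fp"] + counts["tn"]
    tp_rate = counts["tp"] / positives if positives else 0.0
    fp_rate = counts["fp"] / negatives if negatives else 0.0
    # Recall is defined identically to the TP rate: TP / (TP + FN).
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "recall": tp_rate}


# Hypothetical labels (not data from the study): 1 = high consumption, 0 = low.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
rates = binary_rates(y_true, y_pred)
print(rates)
```

Applying the same function to the predictions of each classifier on a common test set would give directly comparable rates, which is the kind of comparison the abstract describes.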

Keywords:

Alcohol consumption, Bayesian networks, data mining, Decision Trees (DT), K-Nearest Neighbors (KNN), Support Vector Machines (SVM)



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved