
     Research Journal of Applied Sciences, Engineering and Technology


Web Page Classification Using SVM and FURIA

1P. Madhubala and 2K. Murugesan
1Department of Computer Science and Engineering, Tagore Institute of Engineering and Technology, Tamilnadu, India
2Department of Electronics and Communication Engineering, Sree Sastha Institute of Engineering and Technology, Tamilnadu, India
Research Journal of Applied Sciences, Engineering and Technology  2015  7:512-518
http://dx.doi.org/10.19026/rjaset.9.1434  |  © The Author(s) 2015
Received: September 24, 2014  |  Accepted: October 24, 2014  |  Published: March 05, 2015

Abstract

Text classification assigns a document to a predefined category. Automatic text classification has been an important research topic since the inception of digital documents. In this study, hypernyms (superordinate words) are identified on the web and combined with entailment rule acquisition. A tree of the hyponym words available in the document is created and used together with a dependency tree. Feature extraction is performed with weighted Term Frequency-Inverse Document Frequency (TF-IDF), where the weight of a word is computed from the number of hyponyms present in the radix tree. Performance is evaluated using a Support Vector Machine (SVM) classifier and a Fuzzy Unordered Rule Induction Algorithm (FURIA) classifier.
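
The hyponym-weighted TF-IDF idea can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting formula, so the multiplier used here (1 plus the number of a term's hyponyms found in the same document) is an assumption, as are the `HYPONYMS` map, the function names, and the toy documents. A plain set lookup stands in for the radix tree.

```python
import math
from collections import Counter

# Hypothetical hypernym -> hyponyms map (illustration only; the study
# identifies these relations from the web and stores them in a radix tree).
HYPONYMS = {
    "animal": {"dog", "cat"},
    "vehicle": {"car", "bus"},
}


def hyponym_weight(term, doc_terms):
    """Assumed weight: 1 + number of the term's hyponyms present in the document."""
    return 1 + len(HYPONYMS.get(term, set()) & set(doc_terms))


def weighted_tfidf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc))                   # term frequency
                  * math.log(n_docs / doc_freq[term])  # inverse document frequency
                  * hyponym_weight(term, doc)          # assumed hyponym multiplier
            for term, count in counts.items()
        })
    return scores


docs = [
    ["animal", "dog", "cat", "park"],
    ["vehicle", "car", "road"],
    ["park", "road", "bench"],
]
scores = weighted_tfidf(docs)
```

In this toy corpus, "animal" and "dog" have the same plain TF-IDF value in the first document, but "animal" scores three times higher because two of its hyponyms occur alongside it. The resulting vectors could then be passed to any classifier, such as SVM or FURIA.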

Keywords:

Fuzzy Unordered Rule Induction Algorithm (FURIA), hypernym, hyponym, radix tree, Support Vector Machine (SVM), Term Frequency-Inverse Document Frequency (TF-IDF)


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved